Integration of process knowledge and statistical learning for the Dow data challenge problem

S. Joe QIN*, Siyi GUO, Zheyu LI, Leo H. CHIANG, Ivan CASTILLO, Birgit BRAUN, Zhenyu WANG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

21 Citations (Scopus)


In this paper, we propose a statistical learning procedure that integrates process knowledge to address the Dow data challenge problem presented in Braun et al. (2020). The task is to build an accurate inferential sensor model that predicts the impurity in the product stream in the presence of apparent drifts. The proposed method consists of i) exploratory analysis of the process data, ii) a method for variable selection, iii) a method for modeling a non-negative physical property using a softplus function, and iv) a method for online bias updating based on available measured data. We use process operation knowledge in every step of the data analytics, including exploratory analysis and feature selection. We also report the detection of equipment-switching operations in the process data and of interpolated values in the impurity data. Partial least squares (PLS) and least angle regression (LARS) are adopted to model the data, which exhibit strong collinearity. The pros and cons of LARS and PLS are discussed, along with their practical implications.
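The abstract does not give the exact formulation of steps iii) and iv). The sketch below is only a minimal illustration under stated assumptions: it assumes the softplus is used as an output link (the model is built on the inverse-softplus of the impurity and mapped back with softplus, so predictions stay non-negative), and it assumes the online bias is an exponentially weighted average of residuals applied in that latent space. The class `BiasUpdatedSensor`, its `forgetting` parameter, and the plain linear latent model are hypothetical stand-ins; in practice the coefficients would come from a PLS or LARS fit on the selected variables.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z)); the result is always > 0."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def inv_softplus(y: float) -> float:
    """Inverse softplus for y > 0: log(exp(y) - 1), written as y + log(1 - exp(-y))."""
    if y <= 0:
        raise ValueError("inverse softplus requires y > 0")
    return y + math.log(-math.expm1(-y))


class BiasUpdatedSensor:
    """Hypothetical inferential sensor: a fixed linear model in the latent
    (inverse-softplus) space plus an online bias that is updated whenever a
    new impurity measurement arrives (assumed EWMA update)."""

    def __init__(self, coef, intercept, forgetting=0.3):
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting must be in (0, 1]")
        self.coef = list(coef)        # e.g. from a PLS or LARS fit (assumption)
        self.intercept = intercept
        self.forgetting = forgetting  # weight given to the newest residual
        self.bias = 0.0

    def latent(self, x):
        """Uncorrected model output in the inverse-softplus space."""
        return self.intercept + sum(c * xi for c, xi in zip(self.coef, x))

    def predict(self, x):
        """Bias-corrected prediction; softplus keeps it non-negative."""
        return softplus(self.latent(x) + self.bias)

    def update(self, x, y_measured):
        """Blend the latest latent-space residual into the bias."""
        residual = inv_softplus(y_measured) - self.latent(x)
        self.bias = (1.0 - self.forgetting) * self.bias + self.forgetting * residual


sensor = BiasUpdatedSensor(coef=[0.5, -0.2], intercept=-1.0)
print(sensor.predict([1.0, 2.0]))   # non-negative impurity estimate
sensor.update([1.0, 2.0], 0.8)      # a new lab measurement arrives
print(sensor.predict([1.0, 2.0]))   # estimate moves toward 0.8
```

Applying the bias inside the softplus, rather than adding it to the final prediction, is a design choice made here so that the corrected estimate cannot become negative. A larger `forgetting` tracks drifts faster but also passes more measurement noise into the bias.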
Original language: English
Article number: 107451
Journal: Computers and Chemical Engineering
Publication status: Published, Oct 2021
Externally published: Yes

Bibliographical note

The first author acknowledges financial support for this work from the City University of Hong Kong under Project 9380123, "Bridging between Systems Theory and Dynamic Data Learning towards Industrial Intelligence and Industry 4.0", and from the NSF China Grant U20A201398, "Big data-driven abnormal situation intelligent diagnosis and self-healing control for process industries".


Keywords

  • Least angle regression
  • Partial least squares
  • Process knowledge
  • Statistical machine learning
  • Variable selection

