Knowledge-informed Sparse Learning for Relevant Feature Selection and Optimal Quality Prediction

Yiren LIU, S. Joe QIN

Research output: Journal Publications › Journal Article (refereed) › peer-review



Industrial data are usually collinear, which can cause purely data-driven sparse learning to deselect physically relevant variables and select collinear surrogates instead. In this article, a novel two-step learning approach that retains knowledge-informed variables (KIVs) is proposed to build inferential models. The first step is an improved knowledge-informed Lasso (KILasso) algorithm that removes the penalty on the KIVs, producing a series of candidate subsets guaranteed to retain the KIVs. The candidate subsets are then used to run the KILasso or ridge regression again to select the best sets of variables and estimate the final model. Two new algorithms are proposed and applied to datasets from an industrial boiler process and the Dow Chemical challenge problem. It is demonstrated that some important physically relevant variables are deselected by purely data-driven sparse methods but are retained by the proposed knowledge-informed methods, with superior prediction performance.
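The core idea of the first step, removing the Lasso penalty from the KIVs so they can never be deselected, can be sketched in a few lines. The following is a hypothetical illustration, not the authors' implementation: it exploits the fact that a Lasso problem with unpenalized columns is equivalent to projecting the response and the penalized columns onto the orthogonal complement of the KIV columns, running a standard Lasso there, and then recovering the KIV coefficients by least squares. The function name `kilasso` and all parameters are assumptions for this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

def kilasso(X, y, kiv_idx, alpha=0.1):
    """Lasso with zero penalty on knowledge-informed variables (KIVs).

    Solves min_b ||y - X b||^2 / (2n) + alpha * ||b_penalized||_1,
    where the coefficients indexed by kiv_idx carry no L1 penalty and
    therefore are always retained. Hypothetical sketch, not the paper's code.
    """
    kiv_idx = np.asarray(kiv_idx)
    pen_idx = np.setdiff1d(np.arange(X.shape[1]), kiv_idx)
    Xk, Xp = X[:, kiv_idx], X[:, pen_idx]

    # Project y and the penalized columns onto the orthogonal
    # complement of the KIV columns (P is the projector onto span(Xk)).
    P = Xk @ np.linalg.pinv(Xk)
    y_res = y - P @ y
    Xp_res = Xp - P @ Xp

    # Standard Lasso on the residualized problem selects among the
    # penalized variables only.
    lasso = Lasso(alpha=alpha, fit_intercept=False).fit(Xp_res, y_res)
    b_p = lasso.coef_

    # Recover the (unpenalized) KIV coefficients by least squares,
    # given the fitted penalized part.
    b_k = np.linalg.pinv(Xk) @ (y - Xp @ b_p)

    b = np.zeros(X.shape[1])
    b[kiv_idx], b[pen_idx] = b_k, b_p
    return b
```

Because the KIV coefficients are estimated by unpenalized least squares, they are neither shrunk nor zeroed out, which is exactly the retention guarantee the abstract describes; sweeping `alpha` then yields the series of candidate subsets used in the second step.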

Original language: English
Pages (from-to): 11499-11507
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Issue number: 12
Early online date: 22 Feb 2023
Publication status: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023 IEEE.


Keywords

  • Industrial applications
  • online trend adaption
  • physically relevant variables
  • sparse learning
  • variable selection


