Knowledge-informed Sparse Learning for Relevant Feature Selection and Optimal Quality Prediction

Yiren LIU, S. Joe QIN

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Industrial data are usually collinear, which can cause purely data-driven sparse learning to deselect physically relevant variables and select collinear surrogates instead. In this paper, a novel two-step learning approach that retains knowledge-informed variables (KIVs) is proposed for building inferential models. In the first step, an improved knowledge-informed Lasso (KILasso) algorithm removes the penalty on the KIVs, producing a series of candidate subsets that guarantee the retention of the KIVs. In the second step, the candidate subsets are used to run KILasso or ridge regression again to select the best set of variables and estimate the final model. Two new algorithms are proposed and applied to datasets from an industrial boiler process and the Dow Chemical challenge problem. It is demonstrated that some important, physically relevant variables are deselected by purely data-driven sparse methods but are retained by the proposed knowledge-informed methods, which achieve superior prediction performance.
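The core idea of the first step, an L1-penalized regression in which the knowledge-informed variables carry no penalty, can be illustrated with a minimal sketch. The code below is not the paper's algorithm; it is a hypothetical implementation of a partially penalized Lasso, where the unpenalized KIV columns are concentrated out by projection (a Frisch-Waugh-Lovell-style reduction) and only the remaining columns face the L1 penalty, so the KIVs are always retained. All function names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=500):
    """Plain Lasso via cyclic coordinate descent for the objective
    (1/2n)||y - Xb||^2 + alpha * ||b||_1 (illustrative, unoptimized)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding column j's current contribution
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            # soft-thresholding update zeroes weakly correlated columns
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return b

def kilasso_sketch(X_kiv, X_pen, y, alpha):
    """Sketch of a knowledge-informed Lasso: KIV columns are unpenalized
    and therefore guaranteed to stay in the model; only X_pen columns
    are subject to L1 selection."""
    # project y and the penalized columns onto the orthogonal
    # complement of the KIV column space
    Q, _ = np.linalg.qr(X_kiv)
    y_res = y - Q @ (Q.T @ y)
    X_res = X_pen - Q @ (Q.T @ X_pen)
    b_pen = lasso_cd(X_res, y_res, alpha)
    # KIV coefficients from ordinary least squares on what remains
    b_kiv, *_ = np.linalg.lstsq(X_kiv, y - X_pen @ b_pen, rcond=None)
    return b_kiv, b_pen
```

On synthetic data, the KIV coefficients come back via unpenalized least squares (never shrunk to zero), while irrelevant penalized columns are zeroed by the soft-thresholding step, mirroring the retention guarantee described in the abstract.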
Original language: English
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
DOIs
Publication status: E-pub ahead of print - 22 Feb 2023

Bibliographical note

Publisher Copyright: IEEE

Keywords

  • Input variables
  • Learning systems
  • Prediction algorithms
  • Predictive models
  • Sensors
  • Sparse learning
  • Training
  • Training data
  • industrial applications
  • online trend adaption
  • physically relevant variables
  • variable selection

