Abstract
We propose a stabilization strategy that allows cross-validation (CV) to be used for structure learning with the lasso. CV is known to often prefer a very small λ, which selects an excessively large number of variables and lies in a less stable region of λ. To address this, we reduce the heterogeneity of the model structures during the CV step. We first build a series of models on all the data over a grid of λ values. The models for each CV fold are then fitted with a revised lasso objective that penalizes deviations from the structure of the model built on all the data. Further, we propose a stable selection criterion that uses CV prediction errors jointly with a stability measure to select the most stable model with near-minimum CV error. The proposed strategy is demonstrated on data from an industrial boiler process to predict NOx emissions.
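The abstract does not state the revised objective or the stability measure, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes (a) the fold-level penalty is a weighted ℓ1 penalty that charges a hypothetical factor `gamma` more for variables outside the all-data support, (b) instability is the mean fraction of variables whose inclusion differs between a fold model and the all-data model, and (c) "near minimum" means within a hypothetical tolerance `tol_frac` of the lowest CV error. All function and parameter names are illustrative.

```python
import random

def soft(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0

def weighted_lasso(X, y, lam, w, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft(rho, lam * w[j]) / col_sq[j]
            d = new - b[j]
            if d != 0.0:
                for i in range(n):
                    r[i] -= d * X[i][j]
                b[j] = new
    return b

def support(b, tol=1e-8):
    """Model structure: the set of indices with nonzero coefficients."""
    return frozenset(j for j, v in enumerate(b) if abs(v) > tol)

def stable_cv(X, y, lams, k=5, gamma=3.0, tol_frac=0.05):
    n, p = len(X), len(X[0])
    folds = [list(range(f, n, k)) for f in range(k)]
    results = []
    for lam in lams:
        # Step 1: ordinary lasso on all data gives the reference structure.
        full = support(weighted_lasso(X, y, lam, [1.0] * p))
        # ASSUMPTION: deviations are discouraged by a heavier l1 weight
        # on variables that the all-data model excluded.
        w = [1.0 if j in full else gamma for j in range(p)]
        err, instab = 0.0, 0.0
        for test in folds:
            held = set(test)
            tr = [i for i in range(n) if i not in held]
            b = weighted_lasso([X[i] for i in tr], [y[i] for i in tr], lam, w)
            err += sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
                       for i in test)
            # ASSUMPTION: instability = share of variables whose inclusion
            # differs from the all-data structure.
            instab += len(support(b) ^ full) / p
        results.append((lam, err / n, instab / k, full))
    # Step 3: among near-minimum CV errors, take the most stable model,
    # breaking ties toward the larger (sparser) lambda.
    best = min(r[1] for r in results)
    near = [r for r in results if r[1] <= (1 + tol_frac) * best]
    return min(near, key=lambda r: (r[2], -r[0])), results

# Synthetic data (not from the paper): only variables 0 and 1 matter.
random.seed(0)
n, p = 60, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]
lams = [0.3, 0.1, 0.03, 0.01]
chosen, results = stable_cv(X, y, lams)
print("lambda:", chosen[0], "CV MSE:", round(chosen[1], 4),
      "instability:", round(chosen[2], 4), "support:", sorted(chosen[3]))
```

Setting `gamma=1.0` recovers ordinary K-fold CV for the lasso, which gives a simple baseline for comparing how much the fold structures vary with and without the stabilizing weights.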
| Original language | English |
|---|---|
| Pages (from-to) | 228-233 |
| Number of pages | 6 |
| Journal | IFAC-PapersOnLine |
| Volume | 54 |
| Issue number | 7 |
| Early online date | 15 Sept 2021 |
| Publication status | Published - 2021 |
| Externally published | Yes |
| Event | 19th IFAC Symposium on System Identification (SYSID 2021), Padova, Italy. Duration: 13 Jul 2021 → 16 Jul 2021 |
Funding
Financial support for this work from the City University of Hong Kong under Project 9380123: Bridging between Systems Theory and Dynamic Data Learning towards Industrial Intelligence and Industry 4.0 and an NSF-China Regional Joint Key Project for Innovations and Development (U20A20189) is gratefully acknowledged.
Keywords
- Inferential sensors
- Stable cross-validation
- Stable lasso
- Statistical machine learning