Abstract
In a streaming environment, the characteristics of the data and their relationship with the labels may change over time. Most drift detection methods for supervised data streams are performance-based; that is, they detect changes only after classification accuracy deteriorates. This may not be sufficient in the many application areas where the reason behind a drift is also important. A second category comprises data distribution-based detectors. Although these can detect some drifts within the input space, they cannot identify changes that affect only the labelling mechanism. Furthermore, little work is available on drift detection for high-dimensional data streams. In this paper we propose an advanced Hierarchical Reduced-space Drift Detection (HRDD) framework for supervised data streams, which captures drifts regardless of their effects on classification performance. The framework monitors both marginal and class-conditional distributions within a lower-dimensional space specifically relevant to the assigned classification task. Experimental comparisons demonstrate that HRDD not only achieves high-quality performance on high-dimensional data streams, but also outperforms its competitors in terms of detection recall, precision and F-measure across a wide range of concept drift types, including subtle drifts. © 1989-2012 IEEE.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 2628-2640 |
Number of pages | 13 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 35 |
Issue number | 3 |
Publication status | Published - 2021 |
Externally published | Yes |
Keywords
- Concept drift
- data stream mining
- drift detection
- online learning