Abstract
Mainstream research in concept drift detection for on-line classification focuses on monitoring a measure of the learner's performance, thus assuming that a reduction in its predictive ability implies a change in the relation between the input variables and the class labels. This approach makes the detector highly dependent on the learner, which can be problematic in some situations, for example, when the learner includes adaptability mechanisms, when it is unable to converge, or when it exhibits high overfitting. Ultimately, concept drift is something that happens in the data, not in the learner, so detecting drifts indirectly through a learner's performance introduces bias into the result. Besides, it makes the process highly inefficient. This paper proposes a new mechanism to detect concept drifts without supervising the learning process. The data distribution is summarized in univariate histograms, from which a straightforward on-line prediction is made. This provides a measure that is monitored over time to detect changes in the data. The method is extremely efficient, as the time and space complexity of processing each sample is linear in the number of input variables multiplied by the number of classes. A thorough analysis on synthetic and real-world data streams shows that the proposed method provides reliable (low false alarm ratio) and effective (high true detection ratio) drift detection very efficiently, outperforming other well-known methods.
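To illustrate the general idea of a learner-independent, histogram-based drift monitor, the following is a minimal sketch. The statistic (mean total-variation distance between a reference window and a sliding current window), the window size, the bin count, and the threshold are all illustrative assumptions, not the exact procedure described in the paper.

```python
import numpy as np

class HistogramDriftMonitor:
    """Learner-independent drift monitor (illustrative sketch).

    Summarizes each input variable with a univariate histogram over a
    reference window and a sliding current window, then monitors the
    mean total-variation distance between the two sets of histograms.
    """

    def __init__(self, n_bins=10, window=500, threshold=0.25):
        self.n_bins = n_bins
        self.window = window
        self.threshold = threshold
        self.reference = []   # samples defining the currently accepted concept
        self.current = []     # most recent samples

    def _histograms(self, data):
        data = np.asarray(data)
        # One normalized histogram per input variable (features assumed in [0, 1]).
        return [np.histogram(data[:, j], bins=self.n_bins, range=(0.0, 1.0))[0]
                / max(len(data), 1)
                for j in range(data.shape[1])]

    def add_sample(self, x):
        """Process one sample; return True if a drift is signalled."""
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        self.current.append(x)
        if len(self.current) < self.window:
            return False
        ref_h = self._histograms(self.reference)
        cur_h = self._histograms(self.current)
        # Mean total-variation distance across input variables.
        dist = np.mean([0.5 * np.abs(r - c).sum() for r, c in zip(ref_h, cur_h)])
        if dist > self.threshold:
            # Drift: the current window becomes the new reference.
            self.reference, self.current = self.current, []
            return True
        self.current.pop(0)   # slide the current window forward
        return False


# Usage on a toy stream whose distribution shifts halfway through.
rng = np.random.default_rng(0)
stream = np.vstack([rng.beta(2, 5, size=(2000, 3)),   # concept A
                    rng.beta(5, 2, size=(2000, 3))])  # concept B
monitor = HistogramDriftMonitor()
for t, x in enumerate(stream):
    if monitor.add_sample(x):
        print(f"drift signalled at sample {t}")
```

Because the histograms are updated per sample and compared per variable, the cost of this kind of scheme grows linearly with the number of input variables, which is consistent with the complexity claim in the abstract.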
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 |
| Editors | Hanghang TONG, Zhenhui Jessie LI, Feida ZHU, Jeffrey YU |
| Publisher | IEEE Computer Society |
| Pages | 878-885 |
| Number of pages | 8 |
| ISBN (Print) | 9781538692882 |
| DOIs | |
| Publication status | Published - Nov 2018 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2018 IEEE.