Abstract
Traditional sample selection methods have high computational complexity and are time-consuming when used to compress large data sets. To avoid these shortcomings, we propose a new sample selection method based on non-stable cut points. Because the extreme values of a convex function occur at the endpoints of intervals, the method measures the extent to which a sample is an endpoint by labeling non-stable cut points. Samples with a higher endpoint extent are then selected, which avoids calculating the distances between samples. The method aims to compress data sets and improve computational efficiency without affecting classification accuracy. Experiments show that the proposed algorithm performs very well when compressing data sets with a higher degree of imbalance. The method is also experimentally confirmed to be strongly resistant to noise.
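The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define a non-stable cut point or the endpoint extent, so the code assumes the usual decision-tree reading, in which a cut between two adjacent sorted attribute values is non-stable when the neighbouring samples belong to different classes, and it assumes the extent of a sample is the number of such cuts it borders across all attributes. The function names `endpoint_extent` and `select_samples` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def endpoint_extent(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Count, per sample, how many non-stable cut points it borders.

    Assumption (not stated in the abstract): a cut point between two
    adjacent sorted values of an attribute is non-stable when the two
    neighbouring samples have different class labels.
    """
    n = len(X)
    extent = [0] * n
    if n == 0:
        return extent
    for j in range(len(X[0])):
        # Sort sample indices by the j-th attribute; no pairwise distances needed.
        order = sorted(range(n), key=lambda i: X[i][j])
        for a, b in zip(order, order[1:]):
            # Simplification: equal attribute values define no cut point.
            if X[a][j] != X[b][j] and y[a] != y[b]:
                extent[a] += 1
                extent[b] += 1
    return extent


def select_samples(X: Sequence[Sequence[float]], y: Sequence[int],
                   threshold: int = 1) -> List[int]:
    """Return indices of samples whose endpoint extent reaches the threshold."""
    extent = endpoint_extent(X, y)
    return [i for i, e in enumerate(extent) if e >= threshold]


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
    y = [0, 0, 1, 1]
    print(endpoint_extent(X, y))
    print(select_samples(X, y, threshold=2))
```

Under these assumptions the cost is dominated by one sort per attribute, which is where the saving over distance-based selection would come from. How ties, the threshold, and class imbalance are handled in the published method is not recoverable from the abstract.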
Original language | English |
---|---|
Title of host publication | Proceedings of 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 : Conference Proceedings |
Publisher | IEEE |
Pages | 2928-2935 |
Number of pages | 8 |
ISBN (Electronic) | 9781509018970 |
ISBN (Print) | 9781509018987 |
DOIs | |
Publication status | Published - 6 Feb 2017 |
Externally published | Yes |
Event | 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Budapest, Hungary; Duration: 9 Oct 2016 → 12 Oct 2016 |
Conference
Conference | 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 |
---|---|
Country/Territory | Hungary |
City | Budapest |
Period | 9/10/16 → 12/10/16 |
Keywords
- Big data classification
- Decision tree
- Non-stable cut points
- Sample selection