An approach to sample selection from big data for classification

Sheng XING, Yulin HE, Hong ZHU, Xizhao WANG

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Refereed Conference Paper › peer-review

1 Citation (Scopus)

Abstract

Traditional sample selection methods have very high computational complexity and are time consuming when used to compress large data sets. To avoid these shortcomings, we propose a new sample selection method based on non-stable cut points. Because the extreme values of a convex function occur at the endpoints of intervals, the method labels non-stable cut points to measure the degree to which each sample is an endpoint. Samples with a higher endpoint degree are then selected, which avoids computing the distances between samples. The method aims to compress data sets and improve computational efficiency without affecting classification accuracy. Experiments show that the proposed algorithm performs very well when compressing data sets with a higher degree of imbalance. The experiments also confirm that the method is strongly resistant to noise.
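The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define a non-stable cut point or the endpoint measure, so the code assumes that a cut point between two samples adjacent in the sorted order of an attribute is non-stable when their class labels differ, and that a sample's endpoint degree is the number of such cut points it borders. The function names `endpoint_scores` and `select_samples` and the `min_score` threshold are hypothetical.

```python
def endpoint_scores(X, y):
    """Count, for each sample, the non-stable cut points it borders.

    Assumption (not stated in the abstract): along each attribute, the cut
    point between two samples adjacent in sorted order is non-stable when
    the two samples carry different class labels.
    """
    n = len(X)
    scores = [0] * n
    if n == 0:
        return scores
    for j in range(len(X[0])):
        # Sort sample indices by the value of attribute j.
        order = sorted(range(n), key=lambda i: X[i][j])
        for a, b in zip(order, order[1:]):
            if y[a] != y[b]:
                # Both neighbours of a non-stable cut point act as endpoints.
                scores[a] += 1
                scores[b] += 1
    return scores


def select_samples(X, y, min_score=1):
    """Return indices of samples whose endpoint score reaches min_score."""
    scores = endpoint_scores(X, y)
    return [i for i, s in enumerate(scores) if s >= min_score]


# Six one-dimensional samples; the class changes between index 2 and 3.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
print(select_samples(X, y))  # [2, 3]
```

Only sorting and label comparisons are used, so no pairwise distances are computed, which matches the efficiency argument in the abstract. How ties in attribute values and the selection threshold are handled in the paper is not known from this record.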

Original language: English
Title of host publication: Proceedings of 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 : Conference Proceedings
Publisher: IEEE
Pages: 2928-2935
Number of pages: 8
ISBN (Electronic): 9781509018970
ISBN (Print): 9781509018987
DOIs
Publication status: Published - 6 Feb 2017
Externally published: Yes
Event: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Budapest, Hungary
Duration: 9 Oct 2016 to 12 Oct 2016

Conference

Conference: 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016
Country/Territory: Hungary
City: Budapest
Period: 9/10/16 to 12/10/16

Keywords

  • Big data classification
  • Decision tree
  • Non-stable cut points
  • Sample selection
