A new samples selecting method based on K nearest neighbors

Yang KAI, Yi CAI, Zhiwei CAI, Xingwei TAN, Haoran XIE, Tak Lam WONG, Wai Hong CHAN

Research output: Book Chapters | Papers in Conference Proceedings; Conference paper (refereed); Research; peer-review

1 Citation (Scopus)

Abstract

Short text classification is a supervised learning task that needs a large amount of labeled training data, and producing those labels consumes considerable human effort. In traditional supervised learning problems, active learning can reduce the number of samples that must be labeled manually. It does so by selecting the most representative samples to stand in for the whole training set. Uncertainty sampling is the most popular active learning strategy, but it performs poorly in the presence of outliers. In this paper, we propose a new sampling method for training sets of short texts, denoted Top-K Representative (TKR). The optimization problem behind TKR, however, is NP-hard. To address this, we propose a new algorithm, based on the greedy algorithm, that obtains approximate results. The experiments show that the proposed sampling method performs better than the state-of-the-art methods.
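To make the idea of greedy, neighbor-based sample selection concrete, here is a minimal sketch. It is not the TKR algorithm from the paper: the abstract does not define the TKR objective, so the sketch assumes a simple coverage objective (pick samples whose k-nearest-neighbor sets cover as many still-uncovered samples as possible). The function names `knn_sets` and `greedy_representatives`, the Euclidean distance, and the toy 2-D points are all illustrative assumptions.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the index set of itself plus its k nearest neighbors."""
    sets = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        sets.append({i, *nearest})
    return sets


def greedy_representatives(points, k, budget):
    """Greedily pick up to `budget` points to send to a human labeler.

    Assumed objective (not taken from the paper): maximize the number of
    points that fall in the k-NN neighborhood of at least one chosen point.
    Exact maximization of such coverage is NP-hard in general, so each
    round simply takes the point with the largest marginal gain.
    """
    neighborhoods = knn_sets(points, k)
    covered, chosen = set(), []
    for _ in range(min(budget, len(points))):
        candidates = (i for i in range(len(points)) if i not in chosen)
        best = max(candidates, key=lambda i: len(neighborhoods[i] - covered))
        if not neighborhoods[best] - covered:
            break  # nothing new can be covered; stop early
        chosen.append(best)
        covered |= neighborhoods[best]
    return chosen, covered


# Two tight clusters plus one far-away outlier (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
chosen, covered = greedy_representatives(points, k=2, budget=2)
print(chosen, sorted(covered))
```

With a budget of two, the sketch picks one point from each cluster and leaves the outlier unselected, which illustrates why a representativeness criterion can be less sensitive to outliers than a purely uncertainty-based one.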

Original language: English
Title of host publication: 2017 IEEE International Conference on Big Data and Smart Computing Proceedings
Place of Publication: Korea
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 457-462
Number of pages: 6
ISBN (Electronic): 9781509030156
Publication status: Published - 2017
Externally published: Yes
Event: 2017 IEEE International Conference on Big Data and Smart Computing - MAISON GLAD JEJU Hotel, Jeju Island, Korea, Republic of
Duration: 13 Feb 2017 to 16 Feb 2017
http://www.bigcomputing.org/conf2017/

Publication series

Name: 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017

Conference

Conference: 2017 IEEE International Conference on Big Data and Smart Computing
Abbreviated title: BigComp 2017
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 13/02/17 to 16/02/17

Funding

This work is supported by the National Natural Science Foundation of China (project no. 61300137), the Science and Technology Planning Project of Guangdong Province, China (No. 2013B010406004), the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633), and the Science and Technology Planning Major Project of Guangdong Province (No. 2015A070711001).
