A noise-detection based AdaBoost algorithm for mislabeled data

Jingjing CAO, Sam KWONG, Ran WANG

Research output: Journal Publications › Journal Article (refereed) › peer-review

106 Citations (Scopus)

Abstract

Noise sensitivity is a well-known weakness of the AdaBoost algorithm. Previous work has shown that AdaBoost is prone to overfitting on noisy data sets because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers by emphasizing the training of misclassified noisy instances and correctly classified non-noisy instances. Specifically, the algorithm integrates a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. Two evaluation criteria, one based on k-nearest-neighbor (k-NN) and one based on expectation maximization (EM), are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm, which provides theoretical support. Finally, experiments on selected binary UCI benchmark data sets demonstrate that the proposed algorithm is more robust than standard AdaBoost and other AdaBoost variants on noisy data sets. © 2012 Elsevier Ltd.
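The idea of k-NN based noise detection mentioned in the abstract can be illustrated with a generic neighborhood-disagreement heuristic. The sketch below is not the paper's evaluation criterion, which the abstract does not specify. It assumes that an instance is suspicious when most of its k nearest neighbors carry a different label. The function names `knn_noise_scores` and `flag_noisy`, the Euclidean distance, and the defaults `k=5` and `threshold=0.5` are all hypothetical choices made for illustration.

```python
import math
import random


def knn_noise_scores(X, y, k=5):
    """For each instance, return the fraction of its k nearest neighbours
    (Euclidean distance, excluding itself) whose label differs from its own.
    This is an assumed, generic heuristic, not the criterion from the paper."""
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from instance i to every other instance.
        others = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        nearest = sorted(others)[:k]
        disagreeing = sum(1 for _, j in nearest if y[j] != yi)
        scores.append(disagreeing / len(nearest))
    return scores


def flag_noisy(X, y, k=5, threshold=0.5):
    """Flag an instance as suspected noise when its score exceeds the
    (hypothetical) threshold."""
    return [score > threshold for score in knn_noise_scores(X, y, k)]


# Toy binary data: two well-separated Gaussian clusters in the plane.
rng = random.Random(0)
X = [(rng.gauss(-3.0, 0.5), rng.gauss(-3.0, 0.5)) for _ in range(20)]
X += [(rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)) for _ in range(20)]
y = [-1] * 20 + [1] * 20

# Simulate mislabeled data by flipping the labels of two instances.
y_noisy = list(y)
for i in (3, 27):
    y_noisy[i] = -y_noisy[i]

suspects = [i for i, flagged in enumerate(flag_noisy(X, y_noisy)) if flagged]
print(suspects)
```

In a boosting setting, flags of this kind could inform how instance weights are adjusted at each iteration. How ND-AdaBoost does this through its loss function is described in the paper itself and is not reproduced here.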
Original language: English
Pages (from-to): 4451-4465
Journal: Pattern Recognition
Volume: 45
Issue number: 12
Publication status: Published - Dec 2012
Externally published: Yes

Bibliographical note

This work is supported by City University Grant 9610025 and City University Strategic Grant 7002680.

Keywords

  • AdaBoost
  • EM
  • Ensemble learning
  • k-NN
  • Pattern recognition
