Hashing-Based Undersampling Ensemble for Imbalanced Pattern Classification Problems

Wing W. Y. NG, Shichao XU, Jianjun ZHANG, Xing TIAN, Tongwen RONG, Sam KWONG

Research output: Journal Publications › Journal Article (refereed) › peer-review

38 Citations (Scopus)

Abstract

Undersampling is a popular method for solving imbalanced classification problems. However, it may remove too many majority-class samples, which can lead to the loss of informative samples. In this article, the hashing-based undersampling ensemble (HUE) is proposed to deal with this problem by constructing diversified training subspaces for undersampling. Samples in the majority class are divided into many subspaces by a hashing method. Each subspace corresponds to a training subset that consists of most of the samples from this subspace and a few samples from surrounding subspaces. These training subsets, each combined with all minority-class samples, are used to train an ensemble of classification and regression tree (CART) classifiers. The proposed method is tested on 25 UCI datasets against state-of-the-art methods. Experimental results show that HUE outperforms the other methods and yields good results on highly imbalanced datasets.
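
The subset-construction step summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the hashing method, so random-hyperplane (sign-of-projection) hashing is used here as a stand-in; "surrounding subspaces" is interpreted as buckets whose binary codes differ in exactly one bit; each subset keeps its whole bucket rather than "most" of it; and `hue_training_subsets`, `n_bits`, and `neighbor_ratio` are hypothetical names and parameters.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def hash_code(x: Vector, planes: List[List[float]]) -> int:
    """Bit i of the code is 1 when x lies on the positive side of hyperplane i."""
    code = 0
    for i, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0:
            code |= 1 << i
    return code


def hue_training_subsets(
    majority: List[Vector],
    minority: List[Vector],
    n_bits: int = 3,
    neighbor_ratio: float = 0.2,
    seed: int = 0,
) -> List[Tuple[List[Vector], List[int]]]:
    """Build one training subset per non-empty hash bucket of the majority class.

    Hypothetical sketch: random-hyperplane hashing stands in for the paper's
    hashing method, and neighbors are buckets at Hamming distance 1.
    """
    rng = random.Random(seed)
    dim = len(majority[0])
    # Center the majority class so hyperplanes through the origin split it.
    mean = [sum(col) / len(majority) for col in zip(*majority)]
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Partition majority samples into buckets ("subspaces") by binary code.
    buckets: Dict[int, List[Vector]] = {}
    for x in majority:
        centered = [xi - mi for xi, mi in zip(x, mean)]
        buckets.setdefault(hash_code(centered, planes), []).append(x)

    subsets = []
    for code, members in sorted(buckets.items()):
        # Pool samples from buckets whose code differs in exactly one bit.
        neighbors: List[Vector] = []
        for bit in range(n_bits):
            neighbors.extend(buckets.get(code ^ (1 << bit), []))
        # Borrow only a few neighbors relative to the bucket's own size.
        n_borrow = min(len(neighbors), max(1, int(neighbor_ratio * len(members))))
        borrowed = rng.sample(neighbors, n_borrow)

        # Majority part (label 0): whole bucket plus borrowed neighbors.
        # Minority part (label 1): every minority sample, in every subset.
        X = list(members) + borrowed + list(minority)
        y = [0] * (len(members) + len(borrowed)) + [1] * len(minority)
        subsets.append((X, y))
    return subsets


if __name__ == "__main__":
    demo_rng = random.Random(42)
    majority = [[demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)] for _ in range(200)]
    minority = [[demo_rng.gauss(2, 0.5), demo_rng.gauss(2, 0.5)] for _ in range(20)]
    for X, y in hue_training_subsets(majority, minority, n_bits=3):
        print(f"subset size={len(X)}, majority={y.count(0)}, minority={y.count(1)}")
```

Each returned (X, y) pair would then train one tree-based base learner (for example, scikit-learn's `DecisionTreeClassifier`, which implements an optimized CART variant), and the members' predictions could be combined by voting. Both the one-tree-per-subset mapping and the voting rule are assumptions of this sketch rather than details stated in the abstract.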

Original language: English
Pages (from-to): 1269-1279
Number of pages: 11
Journal: IEEE Transactions on Cybernetics
Volume: 52
Issue number: 2
Early online date: 29 Jun 2020
Publication status: Published - Feb 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61876066, Grant 61572201, and Grant 61672443; in part by the Guangdong Province Science and Technology Plan Project (Collaborative Innovation and Platform Environment Construction) under Grant 2019A050510006; and in part by the Hong Kong RGC General Research Funds under Grant 9042038 (CityU 11205314) and Grant 9042322 (CityU 11200116).

Keywords

  • Bagging
  • hashing
  • imbalanced classification problems
  • undersampling
