Abstract
Class imbalance problems have drawn growing interest because imbalanced class distributions make classification difficult. In particular, many ensemble methods have been proposed to deal with such imbalance. However, most efforts so far have focused only on two-class imbalance problems. Multiclass imbalance problems, which arise in real-world applications, still raise unsolved issues. This paper studies the challenges posed by multiclass imbalance problems and investigates the generalization ability of some ensemble solutions, including our recently proposed algorithm AdaBoost.NC, with the aim of handling multiclass and imbalance effectively and directly. We first study the impact of "multiminority" and "multimajority" on the performance of two basic resampling techniques. Both present strong negative effects, and "multimajority" tends to be more harmful to the generalization performance. Motivated by the results, we then apply AdaBoost.NC to several real-world multiclass imbalance tasks and compare it to other popular ensemble methods. AdaBoost.NC is shown to be better at recognizing minority class examples and balancing the performance among classes in terms of G-mean without using any class decomposition. © 2012 IEEE.
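The abstract reports results in terms of G-mean. As a minimal sketch, assuming the common multiclass definition (the geometric mean of per-class recall), the measure could be computed as below. The function name `g_mean` and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multiclass G-mean).

    Hypothetical helper for illustration only.
    """
    totals = Counter(y_true)  # number of examples in each true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy data: one majority class (0) and two minority classes (1, 2).
y_true = [0] * 8 + [1] * 2 + [2] * 2

# A classifier that always predicts the majority class is right on
# 8 of 12 examples, yet its G-mean is 0 because two classes have zero recall.
always_majority = [0] * 12
print(g_mean(y_true, always_majority))
```

Because the per-class recalls are multiplied, a single ignored class drives the score to zero, which is why the measure rewards balanced performance among classes rather than overall accuracy.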
Original language | English |
---|---|
Article number | 6170916 |
Pages (from-to) | 1119-1130 |
Number of pages | 12 |
Journal | IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics |
Volume | 42 |
Issue number | 4 |
Early online date | 19 Mar 2012 |
Publication status | Published - Aug 2012 |
Externally published | Yes |
Funding
This work was supported by an ORS Award, an EPSRC Grant (No. EP/D052785/1), and a European FP7 Grant (No. 270428). This paper was recommended by Associate Editor N. Chawla.
Keywords
- Boosting
- diversity
- ensemble learning
- multiclass imbalance problems
- negative correlation learning