Relationships between diversity of classification ensembles and single-class performance measures

Shuo WANG, Xin YAO

Research output: Journal Publications / Journal Article (refereed) / peer-review

106 Citations (Scopus)

Abstract

In class imbalance learning problems, how to better recognize examples from the minority class is the key focus, since it is usually more important and expensive than the majority class. Quite a few ensemble solutions have been proposed in the literature with varying degrees of success. It is generally believed that diversity in an ensemble could help to improve the performance of class imbalance learning. However, no study has actually investigated diversity in depth in terms of its definitions and effects in the context of class imbalance learning. It is unclear whether diversity will have a similar or different impact on the performance of minority and majority classes. In this paper, we aim to gain a deeper understanding of whether and when ensemble diversity has a positive impact on the classification of imbalanced data sets. First, we explain when and why diversity measured by the Q-statistic can bring improved overall accuracy, based on two classification patterns proposed by Kuncheva et al. We define and give insights into good and bad patterns in imbalanced scenarios. Then, the pattern analysis is extended to single-class performance measures, including recall, precision, and F-measure, which are widely used in class imbalance learning. Six different situations of diversity's impact on these measures are obtained through theoretical analysis. Finally, to further understand how diversity affects single-class performance and overall performance in class imbalance problems, we carry out extensive experimental studies on both artificial data sets and real-world benchmarks with highly skewed class distributions. We find strong correlations between diversity and the discussed performance measures. Diversity shows a positive impact on the minority class in general. It is also beneficial to the overall performance in terms of AUC and G-mean. © 1989-2012 IEEE.
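The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard pairwise Q-statistic (Yule's Q computed over the correct/incorrect outcomes of two classifiers) and the usual definitions of recall, precision, F-measure, and G-mean. The function names `q_statistic` and `single_class_measures` and the toy labels are hypothetical and are not taken from the paper; for an ensemble, the pairwise value is commonly averaged over all classifier pairs.

```python
from math import sqrt


def q_statistic(y_true, pred_a, pred_b):
    """Pairwise Q-statistic of two classifiers (hypothetical helper).

    Values near -1 mean the classifiers err on different examples
    (high diversity); values near +1 mean they err on the same ones.
    """
    n11 = n00 = n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == t), (b == t)
        if a_ok and b_ok:
            n11 += 1          # both correct
        elif not a_ok and not b_ok:
            n00 += 1          # both wrong
        elif a_ok:
            n10 += 1          # only the first classifier is correct
        else:
            n01 += 1          # only the second classifier is correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return float("nan")   # Q is undefined in this degenerate case
    return (n11 * n00 - n01 * n10) / denom


def single_class_measures(y_true, y_pred, positive=1):
    """Recall, precision, F-measure for `positive`, plus G-mean."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = sqrt(recall * specificity)  # geometric mean of per-class recalls
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "g_mean": g_mean}


# Toy imbalanced data (assumed for illustration): class 1 is the minority.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 0, 0, 0, 0, 0, 1, 0]
pred_b = [0, 1, 0, 0, 0, 0, 0, 1]

print(q_statistic(y_true, pred_a, pred_b))
print(single_class_measures(y_true, pred_a))
```

In this toy case the two classifiers never fail on the same example, so the pairwise Q-statistic sits at its most diverse extreme, while each classifier alone recovers only half of the minority class.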
Original language: English
Article number: 6035708
Pages (from-to): 206-219
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 25
Issue number: 1
Early online date: 11 Oct 2011
Publication status: Published - Jan 2013
Externally published: Yes

Keywords

  • Class imbalance learning
  • data mining
  • diversity
  • ensemble learning
  • single-class performance measures
