Diversity analysis on imbalanced data sets by using ensemble models

Shuo WANG, Xin YAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

463 Citations (Scopus)

Abstract

Many real-world applications, such as medical diagnosis, fraud detection, and text classification, face problems when learning from imbalanced data sets. The very few minority class instances cannot provide sufficient information, which causes performance to degrade greatly. Because ensembles are a good way to improve the classification performance of weak learners, several ensemble-based algorithms have been proposed to address the class imbalance problem. However, although diversity is one influential factor in ensembles, it remains unclear how diversity affects classification performance, especially on minority classes. This paper explores the impact of diversity on each class and on overall performance. Accuracy, the other influential factor, is also discussed because of the trade-off between diversity and accuracy. First, three popular re-sampling methods, namely under-sampling, over-sampling, and SMOTE [1] (a data generation algorithm), are combined into our ensemble model and evaluated for diversity analysis. Second, we experiment not only on two-class tasks but also on tasks with multiple classes. Third, we propose SMOTEBagging, a novel way of improving SMOTE to handle multi-class data sets within an ensemble model. © 2009 IEEE.
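The three re-sampling strategies named in the abstract can be sketched as follows. This is a minimal, dependency-free Python sketch under stated assumptions, not the paper's implementation: instances are numeric tuples, random under-sampling draws without replacement, random over-sampling draws with replacement, and SMOTE [1] is shown in a simplified form that picks base instances at random and interpolates towards one of their k nearest same-class neighbours. The helper make_balanced_bag, and its rule for sizing each class in a bag (smallest class size for under-sampling, largest class size for the other two methods), are hypothetical illustrations of how a re-sampling method might be plugged into each bag of an ensemble; the paper's own bag construction, including SMOTEBagging, may differ.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def random_undersample(points: Sequence[Point], n: int, rng: random.Random) -> List[Point]:
    """Keep n instances chosen uniformly at random without replacement."""
    return rng.sample(list(points), n)


def random_oversample(points: Sequence[Point], n: int, rng: random.Random) -> List[Point]:
    """Draw n instances with replacement, so existing instances get duplicated."""
    return [rng.choice(points) for _ in range(n)]


def smote(points: Sequence[Point], n_synthetic: int, k: int, rng: random.Random) -> List[Point]:
    """Simplified SMOTE: each synthetic point lies on the segment between a randomly
    chosen instance and one of its k nearest neighbours from the same class.
    Requires at least two instances whenever n_synthetic > 0."""
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(points))
        x = points[i]
        # k nearest same-class neighbours of x, excluding x's own position
        others = [p for j, p in enumerate(points) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def make_balanced_bag(by_class: Dict[str, Sequence[Point]], method: str,
                      rng: random.Random, k: int = 5) -> Dict[str, List[Point]]:
    """Hypothetical helper: build one ensemble training bag in which every class
    has the same number of instances, using the chosen re-sampling method."""
    sizes = {c: len(pts) for c, pts in by_class.items()}
    if method == "under":
        target = min(sizes.values())  # shrink every class to the smallest class
        return {c: random_undersample(pts, target, rng) for c, pts in by_class.items()}
    target = max(sizes.values())  # grow every class to the largest class
    bag = {}
    for c, pts in by_class.items():
        if method == "over":
            bag[c] = random_oversample(pts, target, rng)
        elif method == "smote":
            # keep the original instances, then top up with synthetic ones
            bag[c] = list(pts) + smote(pts, target - len(pts), k, rng)
        else:
            raise ValueError(f"unknown method: {method}")
    return bag
```

With SMOTE, every synthetic point lies between two real instances of the same class, so the added minority examples are new points rather than exact copies; any class that must be topped up with synthetic points needs at least two instances.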
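The abstract does not state which diversity measure the paper uses. As an assumed illustration only, the sketch below computes Yule's Q-statistic, a common pairwise diversity measure, from each ensemble member's per-instance correctness, averages it over all pairs of members, and restricts it to the instances of one class so that diversity can be inspected per class as well as overall. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def q_statistic(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Yule's Q between two classifiers from their per-instance correctness.
    Positive Q: they tend to be right on the same instances; negative Q: they
    tend to err on different instances, i.e. they are more diverse."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only the first is correct
        else:
            n01 += 1  # only the second is correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return float("nan")  # Q is undefined for this contingency table
    return (n11 * n00 - n01 * n10) / denom


def average_q(correctness: List[Sequence[bool]]) -> float:
    """Mean Q over all pairs of ensemble members (lower means more diverse)."""
    qs = [q_statistic(a, b) for a, b in combinations(correctness, 2)]
    return sum(qs) / len(qs)


def per_class_q(correctness: List[Sequence[bool]], labels: Sequence[object],
                cls: object) -> float:
    """Average pairwise Q computed only on instances whose true label is cls."""
    idx = [i for i, y in enumerate(labels) if y == cls]
    return average_q([[c[i] for i in idx] for c in correctness])
```

Averaging needs at least two ensemble members, and a pair whose Q is undefined (nan) makes the average nan as well.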

Original language: English
Title of host publication: 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings
Pages: 324-331
Number of pages: 8
Publication status: Published (Mar 2009)
Externally published: Yes
