Fusion of Multi-RSMOTE with Fuzzy Integral to Classify Bug Reports with an Imbalanced Distribution

Rong CHEN, Shi-Kai GUO, Xi-Zhao WANG*, Tian-Lun ZHANG

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

73 Citations (Scopus)

Abstract

With the help of automated classification, severe bugs can be rapidly identified so that the latent damage to software projects can be minimized. However, bug report datasets commonly suffer from a disproportionate number of samples across categories. When faced with class imbalance, most standard classification learning approaches fail to properly learn the distributive characteristics of the samples and tend to perform poorly when predicting class labels. In this case, imbalanced learning becomes critical to advancing classification algorithms. In this paper, we propose an improved synthetic minority oversampling technique to avoid the degraded performance caused by class imbalance in bug report datasets. Moreover, to reduce the influence of chance in the random sampling process, we propose a repeated sampling technique to train different but related classifiers. Finally, an ensemble algorithm based on the Choquet fuzzy integral is employed to combine the wisdom of crowds and make better decisions. We conduct comprehensive experiments on several bug report datasets from real-world bug repositories. The results demonstrate that the proposed method boosts the classification performance across the classes of the data. Specifically, compared with various ensemble learning techniques, the Choquet fuzzy integral achieves outstanding results in integrating multiple random oversampling techniques.
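The fusion step mentioned in the abstract can be illustrated with a minimal sketch of a discrete Choquet integral over the scores of several classifiers. This is not the authors' implementation: the function names, the example scores, and the fuzzy measures below are all assumptions made for illustration, and the abstract does not say how the fuzzy measure is chosen.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral.

    scores:  dict mapping a classifier name to its score in [0, 1]
    measure: callable taking a frozenset of classifier names and
             returning that coalition's fuzzy-measure value in [0, 1]
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    # Visit classifiers from the highest score to the lowest.
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    for i, name in enumerate(ordered):
        current = scores[name]
        # The score after the last one is taken to be 0.
        following = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        # Coalition of the i + 1 highest-scoring classifiers.
        coalition = frozenset(ordered[: i + 1])
        total += (current - following) * measure(coalition)
    return total


def additive_measure(weights):
    """Additive measure: a coalition is worth the sum of its members' weights."""
    return lambda coalition: sum(weights[name] for name in coalition)


def squared_cardinality_measure(n):
    """Non-additive example measure: (|A| / n) ** 2, an assumed choice."""
    return lambda coalition: (len(coalition) / n) ** 2


# Assumed scores of three classifiers for the "severe bug" class.
example_scores = {"clf_a": 0.9, "clf_b": 0.6, "clf_c": 0.3}
example_weights = {"clf_a": 0.5, "clf_b": 0.3, "clf_c": 0.2}

weighted_mean = choquet_integral(example_scores, additive_measure(example_weights))
fused = choquet_integral(example_scores, squared_cardinality_measure(3))
```

With an additive measure the Choquet integral reduces to an ordinary weighted mean. A non-additive measure lets the fused score depend on which classifiers agree, and that interaction is what separates fuzzy-integral fusion from simple weighted voting.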

Original language: English
Article number: 8642848
Pages (from-to): 2406-2420
Number of pages: 15
Journal: IEEE Transactions on Fuzzy Systems
Volume: 27
Issue number: 12
Early online date: 15 Feb 2019
Publication status: Published - Dec 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1993-2012 IEEE.

Keywords

  • Bug report identification
  • class imbalance
  • fuzzy integral
  • software quality
