Abstract
Automated classification allows severe bugs to be identified rapidly, so that the latent damage to software projects can be minimized. However, bug report datasets commonly suffer from class imbalance: some categories contain far fewer samples than others. When faced with class imbalance, most standard classification approaches fail to learn the distribution of the samples properly and tend to predict class labels poorly. Imbalanced learning therefore becomes critical to improving classification algorithms. In this paper, we propose an improved synthetic minority oversampling technique to avoid the performance degradation caused by class imbalance in bug report datasets. Moreover, to reduce the influence of chance in the random sampling process, we propose a repeated sampling technique that trains different but related classifiers. Finally, an ensemble algorithm based on the Choquet fuzzy integral is employed to combine the wisdom of crowds and make better decisions. We conduct comprehensive experiments on several bug report datasets from real-world bug repositories. The results demonstrate that the proposed method boosts classification performance across the classes of the data. Specifically, compared with various ensemble learning techniques, the Choquet fuzzy integral achieves outstanding results when integrating multiple random oversampling techniques.
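As a rough illustration of the fusion step, the discrete Choquet integral of classifier scores with respect to a fuzzy measure can be sketched as below. The abstract does not specify which fuzzy measure the paper uses, so the `cardinality_measure` helper, its exponent `q`, and the example scores are assumptions chosen only to keep the sketch self-contained; they are not the authors' method.

```python
from typing import Callable, FrozenSet, Sequence

Measure = Callable[[FrozenSet[int]], float]


def choquet_integral(scores: Sequence[float], measure: Measure) -> float:
    """Discrete Choquet integral of per-classifier scores.

    `measure` maps a set of classifier indices to a value in [0, 1]; it must be
    monotone, with measure(empty set) = 0 and measure(all classifiers) = 1.
    """
    # Visit classifiers from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total, previous = 0.0, 0.0
    coalition = set()
    for i in order:
        coalition.add(i)
        current = measure(frozenset(coalition))
        # Weight each score by the measure gained when its classifier joins.
        total += scores[i] * (current - previous)
        previous = current
    return total


def cardinality_measure(n: int, q: float = 1.0) -> Measure:
    """Hypothetical symmetric fuzzy measure g(A) = (|A| / n) ** q.

    An assumption for illustration only: q = 1 reduces the integral to the
    arithmetic mean, and q > 1 shifts weight toward the lower scores.
    """
    def g(subset: FrozenSet[int]) -> float:
        return (len(subset) / n) ** q
    return g


# Assumed example: scores for the "severe" class from three classifiers,
# each trained on a different oversampled copy of the training data.
severe_scores = [0.9, 0.6, 0.3]
fused_mean = choquet_integral(severe_scores, cardinality_measure(3, q=1.0))
fused_cautious = choquet_integral(severe_scores, cardinality_measure(3, q=2.0))
```

Applying the same fusion to the scores of every class and taking the class with the largest fused score gives the ensemble decision. A measure that is not symmetric would additionally let the ensemble weight specific coalitions of classifiers differently.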
Original language | English |
---|---|
Article number | 8642848 |
Pages (from-to) | 2406-2420 |
Number of pages | 15 |
Journal | IEEE Transactions on Fuzzy Systems |
Volume | 27 |
Issue number | 12 |
Early online date | 15 Feb 2019 |
DOIs | |
Publication status | Published - Dec 2019 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 1993-2012 IEEE.
Keywords
- Bug report identification
- class imbalance
- fuzzy integral
- software quality