Abstract
Original language | English |
---|---|
Article number | 112918 |
Journal | Expert Systems with Applications |
Volume | 141 |
Early online date | 2 Sep 2019 |
DOIs | 10.1016/j.eswa.2019.112918 |
Publication status | E-pub ahead of print - 2 Sep 2019 |
Bibliographical note
This research is supported by LEO Dr David P. Chan Institute of Data Science.
Keywords
- Cost-sensitive
- Stacked denoising autoencoders
- Ensemble
- Class imbalance
Cite this
Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain. / WONG, Man Leung; SENG, Kruy; WONG, Pak Kan.
In: Expert Systems with Applications, Vol. 141, 112918, 01.03.2020.
Research output: Journal Publications › Journal Article (refereed)
TY - JOUR
T1 - Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain
AU - WONG, Man Leung
AU - SENG, Kruy
AU - WONG, Pak Kan
N1 - This research is supported by LEO Dr David P. Chan Institute of Data Science.
PY - 2019/9/2
Y1 - 2019/9/2
N2 - Standard classification algorithms assume that the class distribution of the data is roughly balanced. The class imbalance problem often occurs in real-life applications such as direct marketing, fraud detection and churn prediction. It refers to the situation in which the number of examples belonging to one class is significantly higher than the number belonging to the others. When a standard classifier is trained on class-imbalanced data, it is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address the class imbalance problem, namely Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve generalization performance over CSDNN. Various methods for handling the class imbalance problem have been proposed in the literature. However, the experiments in those studies were usually conducted on relatively small data sets and on artificial data. The performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiments, we examine the performance of our proposed methods and the other methods on six large real-life data sets from different business domains, covering direct marketing, churn prediction, default payment and firm fraud detection. The results show that the proposed methods are promising for handling the class imbalance problem and outperform all the other compared methods.
KW - Cost-sensitive
KW - Stacked denoising autoencoders
KW - Ensemble
KW - Class imbalance
UR - http://www.scopus.com/inward/record.url?scp=85072294152&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.112918
DO - 10.1016/j.eswa.2019.112918
M3 - Journal Article (refereed)
VL - 141
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 112918
ER -
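The abstract mentions cost-sensitive learning and random undersampling without giving details. As a rough illustration of those two general ideas only (this is not the authors' CSDNN or CSDE, and the function names `inverse_frequency_costs` and `undersampled_subsets` are hypothetical), a minimal Python sketch might look like this. It assumes class costs inversely proportional to class frequency, which is one common heuristic and not necessarily the one used in the paper.

```python
import random
from collections import Counter


def inverse_frequency_costs(labels):
    """Give each class a cost inversely proportional to its frequency.

    Assumption for illustration: cost = total / (n_classes * class_count).
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def undersampled_subsets(labels, n_subsets, seed=0):
    """Build balanced index subsets by random undersampling.

    Each subset keeps every example of the smallest class and an
    equal-sized random draw (without replacement) from each other class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    minority_size = min(len(idx) for idx in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for idx in by_class.values():
            subset.extend(rng.sample(idx, minority_size))
        subsets.append(sorted(subset))
    return subsets


# Toy labels: 90 majority (0) and 10 minority (1) examples.
labels = [0] * 90 + [1] * 10
costs = inverse_frequency_costs(labels)
subsets = undersampled_subsets(labels, n_subsets=3)
```

In an ensemble setting, one base classifier would typically be trained on each balanced subset and the predictions combined, for example by averaging; how CSDE combines its members is not stated in this record.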