Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain

Man Leung WONG, Kruy SENG, Pak Kan WONG

Research output: Journal Publications › Journal Article (refereed) › Research › peer-review

Abstract

Standard classification algorithms assume that the class distribution of the data is roughly balanced. The class imbalance problem commonly occurs in real-life applications such as direct marketing, fraud detection, and churn prediction. It refers to the situation in which the number of examples belonging to one class is significantly higher than the number belonging to the others. When a standard classifier is trained on class-imbalanced data, it is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address the class imbalance problem: Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders, and CSDE is an ensemble learning version of CSDNN. CSDE applies random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network to improve generalization performance over CSDNN. Various methods for handling the class imbalance problem have been proposed in the literature. However, the experiments in those studies were usually conducted on relatively small data sets and on artificial data, so the performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiments, we examine the performance of the proposed methods and the other methods on six large real-life data sets from different business domains: direct marketing, churn prediction, default payment, and firm fraud detection. The results show that the proposed methods obtain promising results in handling the class imbalance problem and outperform all the other compared methods.
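
The two ideas named in the abstract, cost-sensitive training and random undersampling for ensemble members, can be illustrated with a minimal NumPy sketch. This is not the authors' CSDNN or CSDE implementation: the function names, the cost values (`cost_fn`, `cost_fp`), and the convention that label 1 marks the minority class are all assumptions made for illustration.

```python
import numpy as np

def weighted_log_loss(y_true, p_pos, cost_fn=5.0, cost_fp=1.0):
    """Binary cross-entropy with class-dependent costs (hypothetical values):
    errors on minority (label 1) examples are scaled by cost_fn,
    errors on majority (label 0) examples by cost_fp."""
    p = np.clip(p_pos, 1e-12, 1 - 1e-12)  # avoid log(0)
    per_example = -(cost_fn * y_true * np.log(p)
                    + cost_fp * (1 - y_true) * np.log(1 - p))
    return per_example.mean()

def undersample_indices(y, rng):
    """Keep every minority example plus an equally sized random draw
    (without replacement) from the majority class.
    Assumes label 1 is the minority and is the smaller class."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=minority.size, replace=False)
    return np.concatenate([minority, kept])

rng = np.random.default_rng(0)
y = np.array([1] * 10 + [0] * 90)  # toy labels: 10% minority class
# One balanced index subset per (hypothetical) ensemble member.
subsets = [undersample_indices(y, rng) for _ in range(3)]
```

Each subset is balanced between the two classes, and the weighted loss penalizes a confident miss on a minority example more heavily than an equally confident false alarm. How the paper sets its costs and combines ensemble members is not stated in the abstract.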
Original language: English
Article number: 112918
Journal: Expert Systems with Applications
Volume: 141
Early online date: 2 Sep 2019
DOI: 10.1016/j.eswa.2019.112918
Publication status: E-pub ahead of print - 2 Sep 2019

Fingerprint

Costs
Industry
Marketing
Classifiers
Deep neural networks
Feature extraction
Experiments

Bibliographical note

This research is supported by LEO Dr David P. Chan Institute of Data Science.

Keywords

  • Cost-sensitive
  • Stacked denoising autoencoders
  • Ensemble
  • Class imbalance

Cite this

@article{13052fa9e6bf408e8029bcc6d9609085,
title = "Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain",
abstract = "Standard classification algorithms assume the class distribution of data to be roughly balanced. Class imbalance problem usually occurs in real-life applications, such as direct marketing, fraud detection and churn prediction. Class imbalance problem is referred to the issue that the number of examples belonging to a class is significantly higher than those of the others. When training a standard classifier with class imbalance data, the classifier is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address class imbalance problem, namely Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve the generalization performance over CSDNN. In some literatures, various methods handling class imbalance problem were proposed. However, the experiments discussed in those studies were usually conducted on relatively small data sets and also on artificial data. The performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiment, we examine the performance of our proposed methods and the other methods using six large real-life data sets in different business domains ranging from direct marketing, churn prediction, default payment to firm fraud detection. The results show that the proposed methods obtain promising results in handling class imbalance problem and also outperform all the other compared methods.",
keywords = "Cost-sensitive, Stacked denoising autoencoders, Ensemble, Class imbalance",
author = "WONG, {Man Leung} and Kruy SENG and WONG, {Pak Kan}",
note = "This research is supported by LEO Dr David P. Chan Institute of Data Science.",
year = "2019",
month = "9",
day = "2",
doi = "10.1016/j.eswa.2019.112918",
language = "English",
volume = "141",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd",

}

Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain. / WONG, Man Leung; SENG, Kruy; WONG, Pak Kan.

In: Expert Systems with Applications, Vol. 141, 112918, 01.03.2020.

TY - JOUR

T1 - Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain

AU - WONG, Man Leung

AU - SENG, Kruy

AU - WONG, Pak Kan

N1 - This research is supported by LEO Dr David P. Chan Institute of Data Science.

PY - 2019/9/2

Y1 - 2019/9/2

N2 - Standard classification algorithms assume the class distribution of data to be roughly balanced. Class imbalance problem usually occurs in real-life applications, such as direct marketing, fraud detection and churn prediction. Class imbalance problem is referred to the issue that the number of examples belonging to a class is significantly higher than those of the others. When training a standard classifier with class imbalance data, the classifier is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address class imbalance problem, namely Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve the generalization performance over CSDNN. In some literatures, various methods handling class imbalance problem were proposed. However, the experiments discussed in those studies were usually conducted on relatively small data sets and also on artificial data. The performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiment, we examine the performance of our proposed methods and the other methods using six large real-life data sets in different business domains ranging from direct marketing, churn prediction, default payment to firm fraud detection. The results show that the proposed methods obtain promising results in handling class imbalance problem and also outperform all the other compared methods.

AB - Standard classification algorithms assume the class distribution of data to be roughly balanced. Class imbalance problem usually occurs in real-life applications, such as direct marketing, fraud detection and churn prediction. Class imbalance problem is referred to the issue that the number of examples belonging to a class is significantly higher than those of the others. When training a standard classifier with class imbalance data, the classifier is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address class imbalance problem, namely Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve the generalization performance over CSDNN. In some literatures, various methods handling class imbalance problem were proposed. However, the experiments discussed in those studies were usually conducted on relatively small data sets and also on artificial data. The performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiment, we examine the performance of our proposed methods and the other methods using six large real-life data sets in different business domains ranging from direct marketing, churn prediction, default payment to firm fraud detection. The results show that the proposed methods obtain promising results in handling class imbalance problem and also outperform all the other compared methods.

KW - Cost-sensitive

KW - Stacked denoising autoencoders

KW - Ensemble

KW - Class imbalance

UR - http://www.scopus.com/inward/record.url?scp=85072294152&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2019.112918

DO - 10.1016/j.eswa.2019.112918

M3 - Journal Article (refereed)

VL - 141

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

M1 - 112918

ER -