Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming

Hai-Bing LI, Man Leung WONG

Research output: Journal Publications > Journal Article (refereed)

Abstract

Securities fraud is a common problem worldwide and has serious negative consequences for securities markets each year. Securities regulatory commissions in many countries attach great importance to detecting and preventing it. In China, securities fraud is increasing with the rapid expansion of the securities market. Data mining techniques could help the China Securities Regulatory Commission (CSRC) detect securities fraud. In this paper, we investigate the usefulness of logistic regression, neural networks (NNs), sequential minimal optimization (SMO), radial basis function (RBF) networks, Bayesian networks, and Grammar Based Genetic Programming (GBGP) for classifying a large, recent, real-world China Corporate Securities Fraud (CCSF) database. We compare the performance of the six techniques and find that GBGP outperforms the others. The paper describes in detail how GBGP is applied to the CCSF problem. In addition, the Synthetic Minority Oversampling Technique (SMOTE) is applied to generate synthetic minority-class examples for the imbalanced CCSF dataset.
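
The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy `fraud_cases` data are all hypothetical, and the base sample is drawn at random rather than looping over every minority sample as the original algorithm does. It only shows the core idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Simplified illustration: the base sample is chosen at random each time.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two hypothetical features per record.
fraud_cases = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_cases = smote(fraud_cases, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new examples stay inside the region already occupied by the minority class, which is why the technique is used to rebalance a dataset without simply duplicating records.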
Original language: English
Pages (from-to): 148-156
Number of pages: 9
Journal: Journal of Computer and Communications
Volume: 2
Issue number: 4
DOI: 10.4236/jcc.2014.24020
Publication status: Published - Mar 2014

Fingerprint

Genetic programming
Data mining
Radial basis function networks
Bayesian networks
Logistics
Neural networks

Bibliographical note

This work is partially supported by the Lingnan University Direct Grant DR13C7.

Keywords

  • Knowledge Discovering
  • Rule induction
  • token competition
  • SMOTE
  • Corporate Securities Fraud Detection
  • Grammar-based genetic programming

Cite this

@article{bac6af5697c6493198c6efda3e3a27c2,
title = "Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming",
keywords = "Knowledge Discovering, Rule induction, token competition, SMOTE, Corporate Securities Fraud Detection, Grammar-based genetic programming",
author = "Hai-Bing LI and WONG, {Man Leung}",
note = "This work is partially supported by the Lingnan University Direct Grant DR13C7.",
year = "2014",
month = "3",
doi = "10.4236/jcc.2014.24020",
language = "English",
volume = "2",
pages = "148--156",
journal = "Journal of Computer and Communications",
number = "4",

}

Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming. / LI, Hai-Bing; WONG, Man Leung.

In: Journal of Computer and Communications, Vol. 2, No. 4, 03.2014, p. 148-156.

TY - JOUR

T1 - Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming

AU - LI, Hai-Bing

AU - WONG, Man Leung

N1 - This work is partially supported by the Lingnan University Direct Grant DR13C7.

PY - 2014/3

Y1 - 2014/3

KW - Knowledge Discovering

KW - Rule induction

KW - token competition

KW - SMOTE

KW - Corporate Securities Fraud Detection

KW - Grammar-based genetic programming

U2 - 10.4236/jcc.2014.24020

DO - 10.4236/jcc.2014.24020

M3 - Journal Article (refereed)

VL - 2

SP - 148

EP - 156

JO - Journal of Computer and Communications

JF - Journal of Computer and Communications

IS - 4

ER -