Financial fraud detection by using grammar-based multiobjective genetic programming with ensemble learning

  • Haibing LI

Student thesis: MPhil Thesis (Lingnan)


Financial fraud is a criminal act that violates laws, rules or policies to gain unauthorized financial benefit. It is an increasingly serious problem and has attracted considerable concern. Its major consequences are the loss of billions of dollars each year and damage to investor confidence and corporate reputation. Financial Fraud Detection (FFD) has therefore become an essential area of study for preventing the destructive consequences of financial fraud. Traditional modeling approaches to FFD problems are generally based on testing pre-defined hypotheses about causes and effects, and their evaluation criteria are often limited to variable significance levels or goodness-of-fit.

FFD shares many features with other data mining problems. Vast amounts of data records in different forms (e.g. financial statements or annual reports) have accumulated over time, and it is very difficult to find interesting information in them by relying on traditional statistical methods alone. Data mining techniques, however, can extract implicit, previously unknown and potentially useful patterns, rules or relations from massive data repositories. Such discovered patterns can help executive leadership, stakeholders and the relevant regulatory agencies reduce or avoid losses.

As with other real-life problems, it is not sufficient for FFD to consider only a single criterion (e.g. goodness-of-fit or accuracy); FFD may instead pursue multiple objectives (e.g. accuracy versus interestingness). Data mining techniques that use a single evaluation criterion can handle multiple objectives only through a combination method (e.g. a linear combination) that assigns each criterion a weight reflecting its importance. For example, weights of 0.9:0.1 express that accuracy is more important than interestingness. It remains difficult, however, to decide appropriate or exact values for the weights. Therefore, multi-objective data mining techniques are required to tackle FFD problems.
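The contrast between a weighted linear combination and a multi-objective comparison can be illustrated with a minimal Python sketch. It is not taken from the thesis: the candidate rules, their accuracy and interestingness scores, and the function names are hypothetical, and Pareto dominance is used here as one common way to compare solutions on several objectives at once.

```python
from typing import List, Tuple

# (name, accuracy, interestingness); both objectives are maximized.
# The rules and their scores are made up for illustration only.
Rule = Tuple[str, float, float]

def weighted_score(rule: Rule, w_acc: float, w_int: float) -> float:
    """Collapse the two objectives into one number with fixed weights."""
    _, acc, intr = rule
    return w_acc * acc + w_int * intr

def dominates(a: Rule, b: Rule) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])

def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep every rule that no other rule dominates."""
    return [r for r in rules if not any(dominates(o, r) for o in rules)]

rules: List[Rule] = [
    ("r1", 0.90, 0.20),
    ("r2", 0.80, 0.90),
    ("r3", 0.70, 0.50),
    ("r4", 0.85, 0.10),
]

# The single-criterion winner depends on the chosen weights.
best_90_10 = max(rules, key=lambda r: weighted_score(r, 0.9, 0.1))
best_50_50 = max(rules, key=lambda r: weighted_score(r, 0.5, 0.5))

# The Pareto front needs no weights and keeps both trade-off solutions.
front = pareto_front(rules)

print(best_90_10[0], best_50_50[0], [r[0] for r in front])
```

With weights of 0.9:0.1 the weighted score selects r1, while equal weights select r2. The Pareto front retains both rules, so no weights have to be fixed in advance.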

This study targets FFD and evaluates it comprehensively with a number of methods. The proposed method is based on Grammar-Based Genetic Programming (GBGP), which has proven to be a powerful data mining technique for generating compact and straightforward results. The major contributions are three improvements to GBGP for FFD problems. First, multiple criteria are considered by integrating the concept of multiple objectives into GBGP. Second, minority prediction is applied to assign a class to rows that the rules do not match. Lastly, a new meta-heuristic approach to ensemble learning is introduced to help users select patterns from a pool of models and so facilitate final decision-making. The experimental results show the effectiveness of the new approach on four FFD problems, two of which are real-life problems. The major implications and significance of the study can be summarized in three points. First, it suggests a new ensemble learning technique with GBGP. Second, it demonstrates the usability of the classification rules generated by the proposed method. Third, it provides an efficient multi-objective method for solving FFD problems.
Date of Award: 2015
Original language: English
Awarding Institution
  • Department of Computing and Decision Sciences
Supervisor: Man Leung WONG
