A Novel Multiobjective Genetic Programming Approach to High-Dimensional Data Classification

Yu ZHOU, Nanjian YANG, Xingyue HUANG, Jaesung LEE, Sam KWONG

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

The development of data sensing technology has generated a vast amount of high-dimensional data, posing great challenges for machine learning models. Although genetic programming (GP) has proven effective for data classification over the past decades, it still faces three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) large feature space. In this article, we develop a problem-specific multiobjective GP framework (PS-MOGP) for handling classification tasks with high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate a recursive feature elimination strategy based on mining the archive of evolved GP solutions. A progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order, is proposed to evaluate the GP individuals and maintain a better diversity of solutions. In addition, to address the severe class imbalance caused by traditional one-versus-rest (OVR) binary decomposition (BD) in multiclass classification problems, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that the proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods on high-dimensional data classification.
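The class imbalance that the abstract attributes to OVR decomposition can be seen with simple counting. The sketch below is only an illustration of that problem, not the authors' BD-SPNCS method, which the abstract does not describe in detail. The function name `ovr_imbalance` and the toy label list are hypothetical, chosen for this example.

```python
from collections import Counter


def ovr_imbalance(labels):
    """Return {class: (positives, negatives, negatives / positives)}
    for the binary subproblems created by one-versus-rest decomposition.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    total = len(labels)
    return {c: (n, total - n, (total - n) / n) for c, n in counts.items()}


# Toy dataset (assumed): five perfectly balanced classes, 30 samples each.
labels = ["a"] * 30 + ["b"] * 30 + ["c"] * 30 + ["d"] * 30 + ["e"] * 30

for cls, (pos, neg, ratio) in sorted(ovr_imbalance(labels).items()):
    print(f"class {cls}: {pos} positive vs {neg} negative (1:{ratio:.0f})")
```

Even with perfectly balanced classes, each OVR subproblem here has four times as many negatives as positives, and the ratio grows with the number of classes.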
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Early online date: 18 Mar 2024
Publication status: E-pub ahead of print - 18 Mar 2024

Bibliographical note

Publisher Copyright: IEEE

Keywords

  • Class imbalance
  • feature selection (FS)
  • Genetic programming
  • genetic programming (GP)
  • high-dimensional data classification
  • multiobjective optimization (MOO)
  • Optimization
  • Sensors
  • Sociology
  • Statistics
  • Task analysis
  • Vectors
