Surrogate-Assisted Hybrid-Model Estimation of Distribution Algorithm for Mixed-Variable Hyperparameters Optimization in Convolutional Neural Networks

Jian-Yu LI, Zhi-Hui ZHAN, Jin XU, Sam KWONG, Jun ZHANG

Research output: Journal Publications › Journal Article (refereed) › peer-review

32 Citations (Scopus)

Abstract

The performance of a convolutional neural network (CNN) depends heavily on its hyperparameters. However, finding a suitable hyperparameter configuration is difficult and computationally expensive because of three issues: 1) the mixed-variable problem arising from different types of hyperparameters; 2) the large-scale search space in which optimal hyperparameters must be found; and 3) the high computational cost of evaluating each candidate hyperparameter configuration. This article therefore addresses these three issues and proposes a novel estimation of distribution algorithm (EDA) for efficient hyperparameter optimization, with three major contributions in the algorithm design. First, a hybrid-model EDA is proposed to handle the mixed-variable difficulty efficiently: it uses a mixed-variable encoding scheme to encode the hyperparameters and adopts an adaptive hybrid-model learning (AHL) strategy to optimize the mixed variables. Second, an orthogonal initialization (OI) strategy is proposed to cope with the large-scale search space. Third, a surrogate-assisted multi-level evaluation (SME) method is proposed to reduce the expensive computational cost. The resulting algorithm is named surrogate-assisted hybrid-model EDA (SHEDA). In the experimental studies, SHEDA is verified on widely used classification benchmark problems and compared with various state-of-the-art methods. In addition, a case study on aortic dissection (AD) diagnosis is carried out to evaluate its performance. Experimental results show that SHEDA is very effective and efficient for hyperparameter optimization, finding a satisfactory hyperparameter configuration for CIFAR10, CIFAR100, and AD diagnosis with only 0.58, 0.97, and 1.18 GPU days, respectively.
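The abstract describes SHEDA only at a high level. As a rough illustration of the general idea behind a hybrid-model EDA for mixed variables, the sketch below fits a Gaussian to each continuous hyperparameter and a smoothed frequency table to each categorical one, learned from the best-ranked configurations, and then samples new configurations from those models. It is a simplification under stated assumptions: the search space, the `toy_error` objective, the elite fraction, and all function names are hypothetical, and it does not reproduce the paper's AHL strategy, its orthogonal initialization (plain uniform initialization is used instead), or its surrogate-assisted multi-level evaluation.

```python
import random
import statistics

# Hypothetical mixed-variable search space (not taken from the paper):
# continuous hyperparameters are (low, high) ranges, categorical ones are option lists.
CONTINUOUS = {"log10_lr": (-4.0, -1.0), "dropout": (0.0, 0.5)}
CATEGORICAL = {"kernel": [3, 5, 7], "filters": [32, 64, 128]}


def sample_uniform(rng):
    """Draw one configuration uniformly at random (plain random initialization)."""
    cfg = {k: rng.uniform(lo, hi) for k, (lo, hi) in CONTINUOUS.items()}
    cfg.update({k: rng.choice(opts) for k, opts in CATEGORICAL.items()})
    return cfg


def fit_hybrid_model(elite):
    """Fit a Gaussian per continuous variable and a frequency table per categorical one."""
    model = {}
    for k in CONTINUOUS:
        values = [c[k] for c in elite]
        # Small floor on the std keeps sampling from collapsing to a single point.
        model[k] = ("gauss", statistics.mean(values), statistics.pstdev(values) + 1e-3)
    for k, opts in CATEGORICAL.items():
        # +1 Laplace smoothing so no option ever gets zero probability.
        counts = [1 + sum(c[k] == o for c in elite) for o in opts]
        model[k] = ("cat", opts, counts)
    return model


def sample_hybrid_model(model, rng):
    """Sample one configuration from the hybrid (Gaussian + categorical) model."""
    cfg = {}
    for k, spec in model.items():
        if spec[0] == "gauss":
            _, mu, sigma = spec
            lo, hi = CONTINUOUS[k]
            cfg[k] = min(hi, max(lo, rng.gauss(mu, sigma)))  # clip to bounds
        else:
            _, opts, counts = spec
            cfg[k] = rng.choices(opts, weights=counts, k=1)[0]
    return cfg


def toy_error(cfg):
    """Hypothetical cheap stand-in for validation error; a real run would train a CNN."""
    return ((cfg["log10_lr"] + 2.5) ** 2 + (cfg["dropout"] - 0.2) ** 2
            + (0.0 if cfg["kernel"] == 3 else 0.5)
            + (0.0 if cfg["filters"] == 64 else 0.3))


def hybrid_eda(generations=30, pop_size=20, elite_frac=0.3, seed=0):
    """Minimal elitist EDA loop: rank, fit hybrid model on elites, resample the rest."""
    rng = random.Random(seed)
    pop = [sample_uniform(rng) for _ in range(pop_size)]
    best = min(pop, key=toy_error)
    n_elite = max(2, int(elite_frac * pop_size))
    for _ in range(generations):
        pop.sort(key=toy_error)
        model = fit_hybrid_model(pop[:n_elite])
        offspring = [sample_hybrid_model(model, rng) for _ in range(pop_size - n_elite)]
        pop = pop[:n_elite] + offspring  # keep elites, replace the rest
        best = min(pop + [best], key=toy_error)
    return best
```

For example, `hybrid_eda(generations=30)` returns the lowest-error configuration it found. Because the elites are carried over and `best` is only ever replaced by a better configuration, the result can never be worse than the best initial sample. In a realistic setting, `toy_error` would be replaced by the (possibly surrogate-estimated) validation error of a trained CNN, which is where nearly all of the cost lies and which SHEDA's SME method is designed to reduce.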
Original language: English
Pages (from-to): 2338-2352
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 34
Issue number: 5
Early online date: 20 Sept 2021
Publication status: Published, May 2023
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB2102102; in part by the Outstanding Youth Science Foundation under Grant 61822602; in part by the National Natural Science Foundation of China (NSFC) under Grant 62176094, Grant 61772207, and Grant 61873097; in part by the Key-Area Research and Development of Guangdong Province under Grant 2020B010166002; in part by Guangdong Natural Science Foundation Research Team under Grant 2018B030312003; in part by Hong Kong GRF-RGC General Research Fund under Grant 9042816 (CityU 11209819); and in part by the Project from Tencent.

Publisher Copyright:
© 2012 IEEE.

Keywords

  • Aortic dissection (AD) diagnosis
  • convolutional neural network (CNN)
  • deep learning
  • estimation of distribution algorithm (EDA)
  • evolutionary computation (EC)
  • hybrid model
  • hyperparameters optimization
  • mixed variable
