Indicator-Aware Talking Face Generation Based on Evolutionary Multi-objective Optimization

Ge GUO, Wenjing HONG, Chaoyong ZHOU, Xin YAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review


Audio-driven talking face generation (ATFG) is typically multifaceted, requiring high-quality faces, lip movements synchronized with the audio, and plausible facial expressions. Previous works have mainly focused on minimizing a single loss function constructed to account for the multiple performance requirements of an ATFG model. However, as a proxy, the loss function does not always yield optimal performance when minimized. Moreover, it is unlikely that a single model can be optimal in terms of all relevant quality indicators. In this paper, we formulate the training of an ATFG model as an indicator-aware multi-objective optimization problem and propose a novel approach, namely Evolutionary Multi-objective Indicator-aware audio-driven talking face Generation (EMIG), to solve this problem. EMIG explicitly uses the quality indicators in the design process of ATFG models and can be easily adapted to the preferences of users. Experimental studies demonstrate the potential advantages of evolutionary multi-objective optimization for solving the task of ATFG and the effectiveness of EMIG over five state-of-the-art ATFG methods. © 2022 IEEE.
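As a rough illustration of why no single model need be best on every quality indicator, the sketch below filters a set of candidate models down to those that are Pareto-nondominated and then picks one according to user preference weights. It is not EMIG itself: the model names, the three indicators, the scores, and the weighted-sum preference rule are all hypothetical, and every indicator is assumed to be higher-is-better.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every indicator and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in candidates.items()
        if not any(dominates(t, s) for other, t in candidates.items() if other != name)
    ]


def pick_by_preference(candidates: Dict[str, Scores], front: List[str],
                       weights: Scores) -> str:
    """Choose from the front by a weighted sum (an assumed, simple preference model)."""
    return max(front, key=lambda n: sum(w * s for w, s in zip(weights, candidates[n])))


# Hypothetical scores: (image quality, lip sync, expression plausibility)
candidates = {
    "model_a": (0.90, 0.60, 0.70),
    "model_b": (0.70, 0.85, 0.65),
    "model_c": (0.65, 0.55, 0.60),  # worse than model_a on every indicator
    "model_d": (0.80, 0.80, 0.75),
}

front = pareto_front(candidates)
print(front)
print(pick_by_preference(candidates, front, (1.0, 0.0, 0.0)))
```

Here three of the four candidates survive the filter, each trading one indicator against another, which is the situation the abstract describes; changing the weights changes which surviving model is selected.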
Original language: English
Title of host publication: Proceedings of the 2022 IEEE Symposium Series on Computational Intelligence, SSCI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 7
ISBN (Print): 9781665487689
Publication status: Published - 4 Dec 2022
Externally published: Yes

Bibliographical note

This work was supported by a grant from the National Natural Science Foundation of China (Grant No. 62106098). Corresponding author: Wenjing Hong.


  • Audio-driven talking face generation
  • Evolutionary multi-objective optimization
  • Indicator-aware

