CoFaCo: Controllable Generative Talking Face Video Coding

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Efficient talking face video coding and control are crucial in modern video communication, reshaping how individuals connect, collaborate, and interact. Coding seeks to reduce transmission costs, while control lets users customize facial expressions and head poses in the transmitted videos. However, the common paradigm of applying control algorithms before video coding yields unsatisfactory compression efficiency. In this paper, we propose an efficient Controllable Generative Talking Face Video Coding (CoFaCo) framework, which integrates control into the coding process. Specifically, CoFaCo projects talking face videos into ultra-compact semantic feature representations that users can customize before compression. To enable independent control of pose and expression, we design a set of losses that decouple the pose and expression direction codes. Once the decoupled direction codes and the semantic face representations are obtained, the pose and expression control modules are learned to generate decoupled, controlled pose and expression direction codes. The controlled direction codes are subsequently smoothed to enhance temporal consistency in the controlled video output by the generators. Experimental results demonstrate that CoFaCo achieves competitive compression efficiency in ultra-low bit rate video reconstruction and control tasks, providing valuable insights for advancing face video communication with diverse control capabilities.
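
The abstract does not say how the controlled direction codes are smoothed. As a purely illustrative sketch, the Python below applies an exponential moving average across frames, which is one common way to reduce frame-to-frame jitter. The function name `smooth_direction_codes`, the per-frame list layout, and the `alpha` parameter are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def smooth_direction_codes(
    codes: Sequence[Sequence[float]], alpha: float = 0.5
) -> List[List[float]]:
    """Exponential moving average over per-frame direction codes.

    codes: one code vector per frame (hypothetical layout, not from the paper).
    alpha: weight given to the current frame; 1.0 disables smoothing.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    smoothed: List[List[float]] = []
    for frame_code in codes:
        if not smoothed:
            # The first frame has no history, so it is passed through unchanged.
            smoothed.append(list(frame_code))
            continue
        prev = smoothed[-1]
        # Blend the current code with the previous smoothed code.
        smoothed.append(
            [alpha * c + (1.0 - alpha) * p for c, p in zip(frame_code, prev)]
        )
    return smoothed


if __name__ == "__main__":
    # Toy two-dimensional codes for three frames.
    frames = [[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]]
    print(smooth_direction_codes(frames, alpha=0.5))
```

A smaller `alpha` gives smoother motion but makes the output lag further behind the requested pose or expression. The actual method may use a different filter entirely.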
Original language: English
Journal: IEEE Transactions on Image Processing
Early online date: 12 Jan 2026
Publication status: E-pub ahead of print - 12 Jan 2026

Bibliographical note

Publisher Copyright:
© 1992-2012 IEEE.

Keywords

  • Generative neural network
  • customizable control
  • face representation
  • talking face video coding
