Abstract
This paper introduces text-driven talking avatar generation, a task that uses text to instruct both the generation and animation of an avatar. One significant obstacle in this task is the absence of paired text and talking avatar data for model training, limiting data-driven methodologies. To this end, we present a zero-shot approach that adapts an existing 3D-aware image generation model, trained on a large-scale image dataset for high-quality avatar creation, to align with textual instructions and be animated to produce talking avatars, eliminating the need for paired text and talking avatar data. Our approach’s core lies in the seamless integration of a 3D-aware image generation model (i.e., EG3D), the explicit 3DMM model, and a newly developed self-supervised inpainting technique, to create and animate the avatar and generate a temporal consistent talking video. Thorough evaluations demonstrate the effectiveness of our proposed approach in generating realistic avatars based on textual descriptions and empowering avatars to express user-specified text. Notably, our approach is highly controllable and can generate rich expressions and head poses.
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2024 : 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXVIII |
Editors | Aleš LEONARDIS, Elisa RICCI, Stefan ROTH, Olga RUSSAKOVSKY, Torsten SATTLER, Gül VAROL |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 416-433 |
Number of pages | 18 |
ISBN (Print) | 9783031732225 |
DOIs | |
Publication status | Published - 2025 |
Externally published | Yes |
Event | 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy Duration: 29 Sept 2024 → 4 Oct 2024 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 15146 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 18th European Conference on Computer Vision, ECCV 2024 |
---|---|
Country/Territory | Italy |
City | Milan |
Period | 29/09/24 → 4/10/24 |
Bibliographical note
Publisher Copyright:© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Funding
This work has been supported by Hong Kong Research Grant Council - Early Career Scheme (Grant No. 27209621), General Research Fund Scheme (Grant No. 17202422), Theme-based Research (Grant No. T45-701/22-R) and RGC Matching Fund Scheme (RMGS). Part of the described research work is conducted in the JC STEM Lab of Robotics for Soft Materials funded by The Hong Kong Jockey Club Charities Trust.
Keywords
- Talking Avatar
- Text
- Training Efficiency