PGD-N2L: A Parameter-Guided Disentanglement Approach for Normal-To-Lombard Speech Conversion

  • Hongyang CHEN
  • Yuhong YANG
  • Xinmeng XU
  • Xingyu LIU
  • Weiping TU
  • Zhongyuan WANG
  • Cedar LIN
  • Xin ZHAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

Abstract

Normal-To-Lombard (N2L) speech conversion can effectively improve speech intelligibility in noisy communication scenarios and serve as a data augmentation tool for various speech-related algorithms. However, existing N2L methods do not aim to disentangle the Lombard effect from other speech attributes, leading to incomplete conversions. In this paper, we propose a Parameter-Guided Disentanglement approach for N2L speech conversion (PGD-N2L), which decomposes speech into linguistic content, speaker identity, and Lombard effect. To extract disentangled linguistic content, we propose a DeLomb-Based content encoder. To extract disentangled speaker identity and Lombard effect, we propose a style encoder that combines a fine-tuned speaker encoder and a learnable Lombard encoder to form a personalized style embedding. Furthermore, an En-Lomb-Based injection module is designed to accurately integrate the target Lombard effect and speaker identity into the linguistic content based on the personalized style embedding, ensuring complete Lombard conversion. Experimental results demonstrate that our proposed method outperforms existing N2L models in speech intelligibility, acoustic similarity, and speech quality. Ablation studies confirm that the fine-tuned speaker encoder and the De-Lomb block effectively improve speech intelligibility and acoustic similarity, while the En-Lomb block enables the converted speech to more closely match the target Lombard speech.
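
The three-way decomposition described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: every function name, dimension, and weight below is hypothetical, the normalization in `content_encoder` is only a stand-in for the De-Lomb block, and the scale-and-shift in `inject_style` is an assumed FiLM-like substitute for the En-Lomb injection module, whose actual design is not given in this record.

```python
import numpy as np

rng = np.random.default_rng(0)

def content_encoder(frames, w_content):
    # Hypothetical stand-in for the content encoder: linear projection, then
    # per-channel normalization over time to strip utterance-level statistics.
    h = frames @ w_content
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)

def style_encoder(frames, w_speaker, lombard_embedding):
    # Hypothetical style encoder: a time-pooled speaker embedding concatenated
    # with a separate Lombard embedding to form one style vector.
    speaker = np.tanh(frames.mean(axis=0) @ w_speaker)
    return np.concatenate([speaker, lombard_embedding])

def inject_style(content, style, w_scale, w_shift):
    # Assumed FiLM-like injection: the style vector predicts a per-channel
    # scale and shift that are applied to every content frame.
    scale = 1.0 + style @ w_scale
    shift = style @ w_shift
    return content * scale + shift

# Toy dimensions (all assumptions): 50 frames of 80-bin features.
n_frames, n_mels, d_content, d_speaker, d_lombard = 50, 80, 16, 8, 4
frames = rng.normal(size=(n_frames, n_mels))
w_content = rng.normal(size=(n_mels, d_content)) * 0.1
w_speaker = rng.normal(size=(n_mels, d_speaker)) * 0.1
lombard_embedding = rng.normal(size=d_lombard)
w_scale = rng.normal(size=(d_speaker + d_lombard, d_content)) * 0.1
w_shift = rng.normal(size=(d_speaker + d_lombard, d_content)) * 0.1

content = content_encoder(frames, w_content)
style = style_encoder(frames, w_speaker, lombard_embedding)
converted = inject_style(content, style, w_scale, w_shift)
```

In this sketch, swapping `lombard_embedding` for a different vector changes `converted` while leaving `content` untouched, which is the property a disentangled representation is meant to provide.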
Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo: Journey to the Center of Machine Imagination, ICME 2025: Conference Proceedings
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 9798331594954
ISBN (Print): 9798331594961
Publication status: Published - 1 Jan 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Multimedia and Expo (ICME) - Nantes, France
Duration: 30 Jun 2025 → 4 Jul 2025

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo (ICME)
Country/Territory: France
City: Nantes
Period: 30/06/25 → 4/07/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Funding

This research is funded in part by the National Natural Science Foundation of China (62171326, 62471343) and Guangdong OPPO Mobile Telecommunications Corp.

Keywords

  • disentanglement
  • Lombard effect
  • speech intelligibility enhancement
