Abstract
Normal-To-Lombard (N2L) speech conversion can effectively improve speech intelligibility in noisy communication scenarios and serve as a data augmentation tool for various speech-related algorithms. However, existing N2L methods do not aim to disentangle the Lombard effect from other speech attributes, which leads to incomplete conversion. In this paper, we propose a Parameter-Guided Disentanglement approach for N2L speech conversion (PGD-N2L), which decomposes speech into linguistic content, speaker identity, and Lombard effect. To extract disentangled linguistic content, we propose a De-Lomb-based content encoder. To extract disentangled speaker identity and Lombard effect, we propose a style encoder that combines a fine-tuned speaker encoder with a learnable Lombard encoder to form a personalized style embedding. Furthermore, an En-Lomb-based injection module is designed to accurately integrate the target Lombard effect and speaker identity into the linguistic content, guided by the personalized style embedding, to ensure complete Lombard conversion. Experimental results demonstrate that the proposed method outperforms existing N2L models in speech intelligibility, acoustic similarity, and speech quality. Ablation studies confirm that the fine-tuned speaker encoder and the De-Lomb block effectively improve speech intelligibility and acoustic similarity, while the En-Lomb block enables the converted speech to match the target Lombard speech more closely.
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE International Conference on Multimedia and Expo: Journey to the Center of Machine Imagination, ICME 2025: Conference Proceedings |
| Publisher | IEEE |
| Number of pages | 6 |
| ISBN (Electronic) | 9798331594954 |
| ISBN (Print) | 9798331594961 |
| Publication status | Published - 1 Jan 2025 |
| Externally published | Yes |
| Event | 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France. Duration: 30 Jun 2025 → 4 Jul 2025 |
Conference
| Conference | 2025 IEEE International Conference on Multimedia and Expo (ICME) |
|---|---|
| Country/Territory | France |
| City | Nantes |
| Period | 30/06/25 → 4/07/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Funding
This research is funded in part by the National Natural Science Foundation of China (62171326, 62471343) and Guangdong OPPO Mobile Telecommunications Corp.
Keywords
- disentanglement
- Lombard effect
- speech intelligibility enhancement
Fingerprint
Dive into the research topics of 'PGD-N2L: A Parameter-Guided Disentanglement Approach for Normal-To-Lombard Speech Conversion'. Together they form a unique fingerprint.