Abstract
Existing end-to-end neural codecs have made great progress in preserving audio quality. Despite their success, they still face challenges in achieving accurate and efficient quantization. Specifically, these codecs often overlook which features have a greater impact on perceptual audio quality during quantization, leading to a quantization error distribution that fails to reflect the actual importance of latent features. They are also sensitive to outliers because they use Mean Squared Error (MSE) to measure quantization errors, which can increase quantization noise or spectral artifacts. To address these limitations, we propose AW-CEQCodec, which integrates an Attention Weighting (AW) module and a Conditional Entropy-driven Quantization (CEQ) loss. The AW module enhances key regions of latent features before quantization, enabling more accurate quantization of critical features and reducing their quantization errors. After quantization, it restores global details from the dequantized features, improving overall reconstruction. Moreover, the CEQ loss minimizes the uncertainty between latent and quantized features, effectively reflecting the distortion introduced by the quantization module. Experimental results on the CodecSuperb-STL dataset demonstrate that our method consistently outperforms baseline approaches, achieving superior audio quality at bitrates as low as 0.5 kbps, confirming its effectiveness in minimizing distortion and preserving perceptual quality. Reconstructed audio samples can be found at https://huazhi1024.github.io/first-page.
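The abstract does not give the exact form of the CEQ loss, so the following is only a minimal illustration of the underlying quantity, not the authors' method. It estimates the textbook conditional entropy H(Z | Q) from paired discrete samples, where `latent_bins` (a discretized latent feature) and `codes` (quantizer indices) are hypothetical names introduced here for illustration.

```python
import math
from collections import Counter


def conditional_entropy(latent_bins, codes):
    """Empirical H(latent | code) in bits from paired discrete samples.

    Illustrative only: both argument names are assumptions, and this is the
    standard definition of conditional entropy, not the paper's CEQ loss.
    """
    if len(latent_bins) != len(codes):
        raise ValueError("inputs must have the same length")
    n = len(codes)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    joint = Counter(zip(latent_bins, codes))  # counts of (latent, code) pairs
    marginal = Counter(codes)                 # counts of each code

    h = 0.0
    for (_, code), count in joint.items():
        p_joint = count / n                   # p(latent, code)
        p_cond = count / marginal[code]       # p(latent | code)
        h -= p_joint * math.log2(p_cond)
    return h
```

When every code determines its latent bin exactly, the estimate is zero; when the code carries no information about the latent bin, it equals the entropy of the latent bins. A lower value therefore indicates that less uncertainty about the latent remains after quantization.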
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025: Proceedings |
| Editors | Bhaskar D RAO, Isabel TRANCOSO, Gaurav SHARMA, Neelesh B. MEHTA |
| Publisher | IEEE |
| Number of pages | 5 |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India. Duration: 6 Apr 2025 → 11 Apr 2025 |
Conference
| Conference | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 |
|---|---|
| Country/Territory | India |
| City | Hyderabad |
| Period | 6/04/25 → 11/04/25 |
Bibliographical note
Publisher Copyright: © 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
Funding
This work is supported by the National Natural Science Foundation of China (No. 62471343, No. 62071342, No. 62171326) and the Special Fund of Hubei Luojia Laboratory (No. 220100019).
Keywords
- attention weighting
- audio codec
- conditional entropy
- quantization distortion
Fingerprint
Dive into the research topics of 'Attention Weighting and Conditional Entropy-driven Quantization Loss for Neural Audio Codecs'. Together they form a unique fingerprint.