Attention Weighting and Conditional Entropy-driven Quantization Loss for Neural Audio Codecs

  • Huiyu ZHANG
  • Weiping TU*
  • Yuhong YANG
  • Xinmeng XU
  • Yanzhen REN
  • Rong ZHU

  *Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings, Conference paper (refereed), Research, peer-review

Abstract

Existing end-to-end neural codecs have made great progress in preserving audio quality. Despite their success, they still face challenges in achieving accurate and efficient quantization. Specifically, these codecs often overlook which features have a greater impact on perceptual audio quality during quantization, leading to a quantization error distribution that fails to reflect the actual importance of latent features. They are also sensitive to unusual data points (outliers) because they use Mean Squared Error (MSE) to measure quantization errors, which can increase quantization noise or spectral artifacts. To address these limitations, we propose AW-CEQCodec, which integrates an Attention Weighting (AW) module and a Conditional Entropy-driven Quantization (CEQ) loss. The AW module enhances key regions of latent features before quantization, enabling more accurate quantization of critical features and reducing their quantization errors. After quantization, it restores global details from the dequantized features, improving overall reconstruction. Moreover, the CEQ loss minimizes the uncertainty between latent and quantized features, effectively reflecting the distortion introduced by the quantization module. Experimental results on the CodecSuperb-STL dataset demonstrate that our method consistently outperforms baseline approaches, achieving superior audio quality at bitrates as low as 0.5 kbps, confirming its effectiveness in minimizing distortion and preserving perceptual quality. Reconstructed audio samples can be found at https://huazhi1024.github.io/first-page.
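
The abstract's point about MSE and outliers can be illustrated with a small arithmetic sketch. This is not the paper's CEQ loss or any part of AW-CEQCodec; the error values and the helper names (`mse`, `mae`, `outlier_share_of_mse`) are made up for illustration. It only shows the general property that squaring lets one large quantization error dominate the average, while a mean absolute error grows far less.

```python
# Illustrative only: hypothetical per-element quantization errors,
# not data or code from the paper.

def mse(errors):
    """Mean squared error of a list of error values."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error of a list of error values."""
    return sum(abs(e) for e in errors) / len(errors)

def outlier_share_of_mse(errors):
    """Fraction of the total squared error owed to the single largest error."""
    squares = [e * e for e in errors]
    return max(squares) / sum(squares)

# Eight small errors of equal magnitude (assumed values).
typical = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]

# Same errors, but the last element is replaced by one large outlier.
with_outlier = typical[:-1] + [2.0]

if __name__ == "__main__":
    print("MSE typical      :", mse(typical))
    print("MSE with outlier :", mse(with_outlier))
    print("MAE typical      :", mae(typical))
    print("MAE with outlier :", mae(with_outlier))
    print("outlier share of squared error:", outlier_share_of_mse(with_outlier))
```

In this toy case a single element accounts for almost all of the squared error, so a loss built on MSE would be steered mainly by that element. The abstract gives this sensitivity as the motivation for replacing MSE with the CEQ loss.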

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025: Proceedings
Editors: Bhaskar D RAO, Isabel TRANCOSO, Gaurav SHARMA, Neelesh B. MEHTA
Publisher: IEEE
Number of pages: 5
DOIs
Publication status: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25

Bibliographical note

Publisher Copyright:
© 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.

Funding

This work is supported by the National Natural Science Foundation of China (No. 62471343, No. 62071342, No. 62171326) and the Special Fund of Hubei Luojia Laboratory (No. 220100019).

Keywords

  • attention weighting
  • audio codec
  • conditional entropy
  • quantization distortion
