FIRING-Net: A filtered feature recycling network for speech enhancement

  • Xinmeng XU
  • Jizhen LI
  • Yiqun ZHANG
  • Yong LUO
  • Yuhong YANG
  • Weiping TU*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

Abstract

Current deep neural networks for speech enhancement (SE) aim to minimize the distance between the output signal and the clean target by filtering out noise features from input features. However, when noise and speech components are highly similar, SE models struggle to learn effective discrimination patterns. To address this challenge, we propose a Filter-Recycle-Interguide framework termed FIlter-Recycle-INterGuide NETwork (FIRING-Net) for SE, which filters the input features to extract target features and recycles the filtered-out features as non-target features. These two feature sets then guide each other to refine the features, leading to the aggregation of speech information within the target features and noise information within the non-target features. The proposed FIRING-Net mainly consists of a Local Module (LM) and a Global Module (GM). The LM uses outputs of the speech extraction network as target features and the residual between input and output as non-target features. The GM leverages the energy distribution of the self-attention map to extract target and non-target features guided by the highest and lowest energy regions. Both LM and GM include interaction modules to leverage the two feature sets in an inter-guided manner for collecting speech from non-target features and filtering out noise from target features. Experiments confirm the effectiveness of the Filter-Recycle-Interguide framework. Additionally, FIRING-Net achieves a good balance between SE performance and computational efficiency, outperforming other comparable models across various signal-to-noise ratio levels and noise environments.
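The filter-and-recycle split described for the Local Module can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the speech extraction network, so a sigmoid gate stands in for it here, and the names `filter_and_recycle` and `mask_logits` are hypothetical. The sketch only shows that the target features are the filtered output and that the non-target features are the residual between input and output.

```python
import numpy as np


def filter_and_recycle(features, mask_logits):
    """Split features into target and non-target parts.

    Hypothetical helper: a sigmoid gate is an assumed stand-in for the
    speech extraction network, which the abstract does not specify.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # gate values in (0, 1)
    target = mask * features                   # filtered output kept as target features
    non_target = features - target             # residual recycled as non-target features
    return target, non_target


# Toy input: 4 frames with 8 feature channels each (arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
logits = rng.standard_normal((4, 8))

target, non_target = filter_and_recycle(x, logits)

# Nothing is discarded: the two feature sets always sum back to the input.
print(np.allclose(target + non_target, x))
```

Because the non-target set is defined as a residual, any speech that the filter removes by mistake is still present in it, which is what makes the later inter-guided refinement possible in principle.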
Original language: English
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 82043-82069
Number of pages: 27
ISBN (Electronic): 9798331320850
Publication status: Published - 2025
Externally published: Yes
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore EXPO, Singapore, Singapore
Duration: 24 Apr 2025 to 28 Apr 2025
https://iclr.cc/

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 to 28/04/25

Bibliographical note

Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.

Funding

This work is supported by the National Natural Science Foundation of China (No. 62471343, No. 62171326, No. 62071342).
