Abstract
Current deep neural networks for speech enhancement (SE) aim to minimize the distance between the output signal and the clean target by filtering out noise features from input features. However, when noise and speech components are highly similar, SE models struggle to learn effective discrimination patterns. To address this challenge, we propose a Filter-Recycle-Interguide framework termed FIlter-Recycle-INterGuide NETwork (FIRING-Net) for SE, which filters the input features to extract target features and recycles the filtered-out features as non-target features. These two feature sets then guide each other to refine the features, leading to the aggregation of speech information within the target features and noise information within the non-target features. The proposed FIRING-Net mainly consists of a Local Module (LM) and a Global Module (GM). The LM uses outputs of the speech extraction network as target features and the residual between input and output as non-target features. The GM leverages the energy distribution of the self-attention map to extract target and non-target features guided by the highest and lowest energy regions. Both LM and GM include interaction modules to leverage the two feature sets in an inter-guided manner for collecting speech from non-target features and filtering out noise from target features. Experiments confirm the effectiveness of the Filter-Recycle-Interguide framework. Additionally, FIRING-Net achieves a good balance between SE performance and computational efficiency, outperforming other comparable models across various signal-to-noise ratio levels and noise environments.
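The filter-and-recycle split described for the Local Module can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_extractor` is a hypothetical stand-in for the speech extraction network (here a simple sigmoid gate), and `filter_and_recycle` is an illustrative name. The sketch only shows that the target features are the extractor output and the non-target features are the residual between input and output, so nothing from the input is discarded.

```python
import math


def toy_extractor(features):
    """Hypothetical stand-in for a speech extraction network.

    Applies a sigmoid gate to each feature; a real model would be learned.
    """
    return [x / (1.0 + math.exp(-x)) for x in features]


def filter_and_recycle(features, extractor):
    """Split input features into target and non-target features.

    target:     the extractor output (the filtered features)
    non_target: the residual input - output (the recycled features)
    """
    target = extractor(features)
    non_target = [x - t for x, t in zip(features, target)]
    return target, non_target


# Toy input feature vector (assumed values, for illustration only).
features = [0.5, -1.2, 2.0, 0.0]
target, non_target = filter_and_recycle(features, toy_extractor)
```

Because the non-target features are defined as a residual, the two sets always sum back to the input; the interaction modules mentioned in the abstract would then operate on both sets, which this sketch does not attempt to model.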
| Original language | English |
|---|---|
| Title of host publication | 13th International Conference on Learning Representations, ICLR 2025 |
| Publisher | International Conference on Learning Representations, ICLR |
| Pages | 82043-82069 |
| Number of pages | 27 |
| ISBN (Electronic) | 9798331320850 |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 13th International Conference on Learning Representations, ICLR 2025, Singapore EXPO, Singapore, Singapore. Duration: 24 Apr 2025 → 28 Apr 2025. https://iclr.cc/ |
Conference
| Conference | 13th International Conference on Learning Representations, ICLR 2025 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 24/04/25 → 28/04/25 |
| Internet address | https://iclr.cc/ |
Bibliographical note
Publisher Copyright: © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
Funding
This work is supported by the National Natural Science Foundation of China (No. 62471343, No. 62171326, No. 62071342).
Fingerprint
Dive into the research topics of 'FIRING-Net: A filtered feature recycling network for speech enhancement'. Together they form a unique fingerprint.