Abstract
Local and non-local attention are two operations widely used in speech enhancement (SE), and both are effective at generating more discriminative patterns from a noisy mixture. However, a noisy speech signal contains many fast-changing, dynamic acoustic features that are hard to capture precisely when both attention operations are applied indiscriminately. Moreover, simply combining local and non-local attention does not avoid their demerits while keeping their merits in SE tasks. To tackle these issues, we propose a cooperative attention based SE network (CASE-Net) that makes a trade-off between local and non-local attention operations in order to generate more discriminative patterns from local and global speech regions. In addition, to address the high computational cost of non-local attention, we propose a time–frequency (TF)-wise non-local attention model, in which the 2D non-local attention is divided into two 1D sub-attentions. The TF-wise non-local attention thus provides two parallel non-local sub-attentions that separately calculate the attention maps along the time and frequency axes; as a consequence, the training process is facilitated. Experimental results lead to two observations: (1) cooperative attention makes an effective trade-off between local and non-local attention operations, and the proposed CASE-Net achieves higher performance than recent models in terms of PESQ and STOI; (2) the proposed TF-wise non-local attention significantly improves network performance while maintaining a lower computational complexity than conventional non-local attention.
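The idea of splitting a 2D non-local attention into two 1D sub-attentions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify projections, normalisation, or how the two branches are fused, so the function names (`axis_attention`, `tf_wise_attention`, `attention_map_sizes`), the absence of learned query/key/value projections, and the residual sum used for fusion are all assumptions made for illustration. The input is assumed to be a feature map of shape `(C, T, F)` (channels, time frames, frequency bins).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(feat, axis):
    """Dot-product self-attention along one axis of a (C, T, F) map.

    axis=1 attends over time within each frequency bin;
    axis=2 attends over frequency within each time frame.
    Assumption: no learned projections, the features act as Q, K and V.
    """
    if axis == 1:
        x = feat.transpose(2, 1, 0)   # (F, T, C): one sequence per bin
    elif axis == 2:
        x = feat.transpose(1, 2, 0)   # (T, F, C): one sequence per frame
    else:
        raise ValueError("axis must be 1 (time) or 2 (frequency)")

    # 1D attention map per sequence: shape (batch, L, L)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    out = softmax(scores, axis=-1) @ x  # (batch, L, C)

    # Restore the original (C, T, F) layout
    if axis == 1:
        return out.transpose(2, 1, 0)
    return out.transpose(2, 0, 1)


def tf_wise_attention(feat):
    """Two parallel 1D sub-attentions (time and frequency).

    Assumption: the branches are fused by a residual sum.
    """
    return feat + axis_attention(feat, 1) + axis_attention(feat, 2)


def attention_map_sizes(T, F):
    """Number of attention-map entries: full 2D vs. two 1D sub-attentions."""
    full_2d = (T * F) ** 2        # every TF position attends to every other
    tf_wise = T * F * (T + F)     # F maps of T x T plus T maps of F x F
    return full_2d, tf_wise
```

For a map with `T` frames and `F` bins, the full 2D attention map has `(T*F)**2` entries, whereas the two 1D sub-attentions together need `T*F*(T+F)`, which is where the complexity saving in this sketch comes from.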
| Original language | English |
|---|---|
| Pages (from-to) | 31-39 |
| Number of pages | 9 |
| Journal | Speech Communication |
| Volume | 148 |
| Early online date | 26 Feb 2023 |
| DOIs | |
| Publication status | Published - Mar 2023 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2023 Elsevier B.V.
Funding
This work was partly supported by the National Natural Science Foundation of China (No. 62071342, No. 62171326) and the Fundamental Research Funds for the Central Universities, China (No. 2042022kf0001).
Keywords
- Cooperative attention
- Global feature
- Local attention
- Local feature
- Speech enhancement
- TF-wise non-local attention
Fingerprint
Dive into the research topics of 'CASE-Net: Integrating local and non-local attention operations for speech enhancement'. Together they form a unique fingerprint.