CASE-Net: Integrating local and non-local attention operations for speech enhancement

  • Xinmeng XU
  • Weiping TU*
  • Yuhong YANG

*Corresponding author for this work

Research output: Journal Publications › Review article

Abstract

Local and non-local attention operations are two ubiquitous operations in speech enhancement (SE), and both are effective at generating discriminative patterns from a noisy mixture. However, a noisy speech signal contains many fast-changing, dynamic acoustic features that are hard to capture precisely when both attention operations are applied indiscriminately. Moreover, simply combining local and non-local attention operations cannot avoid their demerits while retaining their merits in SE tasks. To tackle these issues, we propose a cooperative attention based SE network (CASE-Net) as an inventive attempt to trade off between local and non-local attention operations, generating more discriminative patterns from both local and global speech regions. In addition, to address the high computational cost of non-local attention, we propose a time–frequency (TF)-wise non-local attention model in which the 2D non-local attention is divided into two 1D sub-attentions. The TF-wise non-local attention thus provides two parallel non-local sub-attentions that separately calculate attention maps along the time and frequency axes, which facilitates the training process. Experimental results show two observations: (1) cooperative attention makes an effective trade-off between local and non-local attention operations, and the proposed CASE-Net achieves higher performance than recent models in terms of PESQ and STOI; (2) the proposed TF-wise non-local attention significantly improves network performance while maintaining a lower computational complexity than conventional non-local attention.
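The abstract's TF-wise decomposition can be sketched in a few lines: instead of one 2D non-local attention over all T×F positions (cost on the order of (TF)²), two 1D self-attentions are computed in parallel, one along time per frequency bin and one along frequency per time frame (cost T²F + F²T). The sketch below is a minimal numpy illustration, not the paper's implementation: function names are ours, learned query/key/value projections are omitted for brevity, and the two sub-attention outputs are averaged as an assumed fusion (the abstract does not specify how they are combined).

```python
import numpy as np

def _softmax(a, axis=-1):
    # Numerically stable softmax.
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def time_attention(x):
    """1D non-local self-attention along the time axis, one attention
    map per frequency bin. x: (T, F, C) features; cost ~ O(T^2 * F * C)."""
    T, F, C = x.shape
    scores = np.einsum('tfc,sfc->fts', x, x) / np.sqrt(C)  # (F, T, T)
    w = _softmax(scores, axis=-1)                          # rows sum to 1
    return np.einsum('fts,sfc->tfc', w, x)                 # back to (T, F, C)

def freq_attention(x):
    """1D non-local self-attention along the frequency axis, one attention
    map per time frame; cost ~ O(F^2 * T * C)."""
    T, F, C = x.shape
    scores = np.einsum('tfc,tgc->tfg', x, x) / np.sqrt(C)  # (T, F, F)
    w = _softmax(scores, axis=-1)
    return np.einsum('tfg,tgc->tfc', w, x)

def tf_nonlocal_attention(x):
    """Two parallel 1D sub-attentions; averaging them is our assumption
    for illustration -- the paper may fuse them differently."""
    return 0.5 * (time_attention(x) + freq_attention(x))
```

For a 4-second utterance with T ≈ 250 frames and F ≈ 257 bins, the 2D attention map would have (TF)² ≈ 4×10⁹ entries, while the two 1D maps together have T²F + F²T ≈ 3×10⁷, which is the complexity saving the abstract refers to.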
Original language: English
Pages (from-to): 31-39
Number of pages: 9
Journal: Speech Communication
Volume: 148
Early online date: 26 Feb 2023
DOIs
Publication status: Published - Mar 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 Elsevier B.V.

Funding

This work was partly supported by the National Natural Science Foundation of China (No. 62071342, No. 62171326) and the Fundamental Research Funds for the Central Universities, China (No. 2042022kf0001).

Keywords

  • Cooperative attention
  • Global feature
  • Local attention
  • Local feature
  • Speech enhancement
  • TF-wise non-local attention
