Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement

  • Xinmeng XU
  • Weiping TU*
  • Yuhong YANG
  • *Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

Abstract

Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep-learning-based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by applying local and non-local attention indiscriminately is challenging. We observe that the noise type and speech features vary within a sequence of speech, and that local and non-local operations extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention-based convolutional neural network (CNN) with a feature-filter that dynamically selects regions from low-resolution speech features and feeds them to local or non-local attention operations. In particular, the proposed feature-filter is trained using reinforcement learning (RL) with a difficulty-regulated reward that is related to network performance, model complexity, and “the difficulty of the SE task”. The results show that our method achieves comparable or superior performance to existing approaches. Notably, Selector-Enhancer is potentially effective for real-world denoising, where the number and types of noise vary within a single noisy mixture.
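The abstract describes a feature-filter that routes each feature region to either a local or a non-local attention operation, with the routing policy trained by RL. The following is a minimal NumPy sketch of that selection mechanism under stated assumptions, not the authors' implementation. The windowed and full dot-product attention functions, the logistic selector over mean-pooled region features, the REINFORCE update, and especially the reward formula are hypothetical stand-ins: the abstract names the reward's ingredients (network performance, model complexity, task difficulty) but not its form. The attention operations have no learned projections, so the sketch shows the selection mechanics rather than actual denoising.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(x, window=3):
    """Each frame attends only to frames inside a sliding window (local context)."""
    T, C = x.shape
    half = window // 2
    out = np.empty_like(x)
    for t in range(T):
        ctx = x[max(0, t - half):min(T, t + half + 1)]
        w = softmax(ctx @ x[t] / np.sqrt(C))
        out[t] = w @ ctx
    return out

def non_local_attention(x):
    """Each frame attends to every frame in the region (global context)."""
    C = x.shape[1]
    w = softmax(x @ x.T / np.sqrt(C), axis=-1)
    return w @ x

def select_probs(theta, regions):
    """HYPOTHETICAL selector policy: P(non-local) per region from its mean feature."""
    feats = regions.mean(axis=1)                      # (R, C)
    return 1.0 / (1.0 + np.exp(-(feats @ theta))), feats

def route(regions, actions):
    """Send region r to non-local attention if actions[r] == 1, else to local."""
    return np.stack([non_local_attention(reg) if a else local_attention(reg)
                     for reg, a in zip(regions, actions)])

def reward(out, clean, noisy, actions, lam=0.1):
    # HYPOTHETICAL difficulty-regulated reward; the paper's exact form is not given.
    base_err = np.mean((noisy - clean) ** 2)          # proxy for task difficulty
    err = np.mean((out - clean) ** 2)                 # proxy for network performance
    gain = (base_err - err) / (base_err + 1e-8)       # relative improvement
    difficulty = base_err / (1.0 + base_err)          # squashed into [0, 1)
    complexity = actions.mean()                       # share of costly non-local ops
    # Harder inputs are penalized less for choosing non-local attention.
    return gain - lam * (1.0 - difficulty) * complexity

def reinforce_step(theta, noisy, clean, rng, lr=0.05, baseline=0.0):
    """One REINFORCE update of the selector parameters theta."""
    p, feats = select_probs(theta, noisy)
    actions = (rng.random(p.shape) < p).astype(float)  # Bernoulli sample per region
    out = route(noisy, actions)
    r = reward(out, clean, noisy, actions)
    # d/dtheta log Bernoulli(a | sigmoid(f . theta)) = (a - p) * f
    grad = ((actions - p)[:, None] * feats).sum(axis=0)
    return theta + lr * (r - baseline) * grad, r, actions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((4, 16, 8))           # 4 regions, 16 frames, 8 channels
    noisy = clean + 0.5 * rng.standard_normal(clean.shape)
    theta = np.zeros(8)
    for step in range(5):
        theta, r, actions = reinforce_step(theta, noisy, clean, rng)
        print(step, round(float(r), 4), actions)
```

Sampling a discrete Bernoulli action per region keeps the local/non-local decision hard, which is why this sketch uses a score-function (REINFORCE) gradient rather than backpropagating through the choice.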
Original language: English
Title of host publication: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Editors: Brian WILLIAMS, Yiling CHEN, Jennifer NEVILLE
Publisher: AAAI Press
Pages: 13853-13860
Number of pages: 8
ISBN (Electronic): 9781577358800
DOIs
Publication status: Published, 26 Jun 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62071342, No. 62171326), the Special Fund of Hubei Luojia Laboratory (No. 220100019), the Hubei Province Technological Innovation Major Project (No. 2021BAA034), and the Fundamental Research Funds for the Central Universities (No. 2042022kf0001).
