Abstract
Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep-learning-based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between hand-crafted spatial features and spectral features is hard in end-to-end DMSE. In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. The proposed MHCA-CRN model includes a channel-wise encoding structure for preserving intra-channel features and a multi-head cross-attention mechanism for fully exploiting cross-channel features. In addition, the decoder is formulated with an extra SNR estimator that estimates frame-level SNR under a multi-task learning framework, which is expected to avoid the speech distortion introduced by the end-to-end DMSE module. Finally, a spectral gain function is adopted to further suppress unnatural residual noise. Experimental results demonstrate the superior performance of the proposed model over several state-of-the-art models.
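The hand-crafted spatial features named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used textbook definitions (IID as the log-magnitude ratio in dB, IPD as the phase of the cross-spectrum), which may differ from the exact formulation in the paper; the function name `inter_channel_features` and the `eps` parameter are hypothetical and introduced only for illustration.

```python
import cmath
import math


def inter_channel_features(ch1, ch2, eps=1e-8):
    """Return (IID in dB, IPD in radians) for each time-frequency bin.

    ch1, ch2: equal-length sequences of complex STFT coefficients,
    one sequence per microphone. Assumed definitions, not the paper's code.
    """
    if len(ch1) != len(ch2):
        raise ValueError("both channels must have the same number of bins")
    iid, ipd = [], []
    for y1, y2 in zip(ch1, ch2):
        # IID: magnitude ratio between the two channels, in decibels.
        # eps guards against division by zero and log of zero.
        iid.append(20.0 * math.log10((abs(y1) + eps) / (abs(y2) + eps)))
        # IPD: phase of the cross-spectrum y1 * conj(y2), in (-pi, pi].
        ipd.append(cmath.phase(y1 * y2.conjugate()))
    return iid, ipd


# Two toy bins: the first differs only in level, the second only in phase.
ch1 = [2 + 0j, 1j]
ch2 = [1 + 0j, 1 + 0j]
iid, ipd = inter_channel_features(ch1, ch2)
```

In this toy input, the first bin yields an IID of roughly 6 dB with zero phase difference, while the second yields an IID of 0 dB with a quarter-cycle phase lead. The proposed MHCA-CRN is described as learning cross-channel features with attention rather than relying on fixed features of this kind.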
| Original language | English |
|---|---|
| Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022: Proceedings |
| Publisher | IEEE |
| Pages | 6492-6496 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781665405409 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, Singapore. Duration: 23 May 2022 → 27 May 2022 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| Volume | 2022-May |
| ISSN (Print) | 1520-6149 |
Conference
| Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
|---|---|
| Country/Territory | Singapore |
| Period | 23/05/22 → 27/05/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE
Funding
This research is supported by the Shenzhen Science & Technology Fundamental Research Programs (Nos. GXWD20201231165807007-20200814115301001 and JSGG20191129105421211).
Keywords
- channel-independent encoding
- dual-microphone speech enhancement
- multi-head cross-attention
- SNR estimator
- spatial cues extraction
Fingerprint
Dive into the research topics of 'Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention'. Together they form a unique fingerprint.