Abstract
Current deep learning-based speech enhancement methods aim to learn a mapping from noisy speech to clean speech. However, because the training target is solely clean speech and carries no knowledge of the noise, these networks struggle in regions where speech and noise features are similar, leading to either insufficient or excessive noise removal. Although some methods incorporate noise as an additional training target, the unpredictability of noise signals makes effective modeling challenging. In this paper, we propose Spatial INformation AIded MOnaural Speech Enhancement (SINAI-MoSE), a monaural speech enhancement method that uses spatial information to help discriminate and model speech and noise features. Specifically, the encoder of SINAI-MoSE extracts speech and noise features progressively and learns a mapping from single-channel noisy speech to synthesized dual-channel noisy speech, which is simulated via an ideal room impulse response. The decoder then reconstructs the speech features extracted by the encoder using a multi-sparsity Conformer network that handles speech details from local to global with high precision. Empirical studies underscore the effectiveness of spatial information in speech and noise feature discrimination. Consequently, SINAI-MoSE demonstrates significant advancements over recent monaural speech enhancement methods, excelling in speech quality and intelligibility.
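The abstract does not say how the dual-channel signal is synthesized, so the following is only a rough sketch of one way to do it, not the authors' procedure. It assumes a two-microphone array, far-field sources, and an anechoic ("ideal") impulse response modeled as a pure integer-sample delay. The function names, the 0.1 m microphone spacing, the source angles, and the speed of sound are all hypothetical choices made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for illustration


def ideal_rir_pair(angle_deg, fs, mic_spacing=0.1, length=64):
    """Return two anechoic impulse responses, one per microphone.

    Assumes a far-field source at `angle_deg` measured from the microphone
    axis, with 0 degrees meaning the source lies on the side of microphone 0.
    Each response is a single unit impulse; only the relative delay differs.
    """
    # Time difference of arrival between the two microphones.
    tdoa = mic_spacing * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    shift = int(round(tdoa * fs))  # integer-sample approximation
    base = length // 2             # leaves room for negative shifts
    rirs = np.zeros((2, length))
    rirs[0, base] = 1.0
    rirs[1, base + shift] = 1.0
    return rirs


def simulate_dual_channel(speech, noise, fs,
                          speech_angle_deg=90.0, noise_angle_deg=0.0):
    """Mix equal-length speech and noise into a 2-channel noisy signal.

    Placing the two sources at different angles gives them different
    inter-channel delays, which is the spatial cue this sketch illustrates.
    """
    speech_rirs = ideal_rir_pair(speech_angle_deg, fs)
    noise_rirs = ideal_rir_pair(noise_angle_deg, fs)
    channels = []
    for m in range(2):
        s = np.convolve(speech, speech_rirs[m], mode="full")
        n = np.convolve(noise, noise_rirs[m], mode="full")
        channels.append(s + n)
    return np.stack(channels)  # shape: (2, len(speech) + length - 1)
```

In a setup like this, the single-channel input would be the plain sum of speech and noise, and the two simulated channels would serve as the target that carries the spatial cue. Whether SINAI-MoSE uses this geometry or delay model is not stated in the abstract.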
| Original language | English |
|---|---|
| Article number | 126349 |
| Number of pages | 13 |
| Journal | Expert Systems with Applications |
| Volume | 269 |
| Early online date | 3 Jan 2025 |
| Publication status | Published - 15 Apr 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2024
Funding
This work is supported by the National Natural Science Foundation of China (No. 62471343, No. 62071342, No. 62171326).
Keywords
- Back-projection
- Monaural speech enhancement
- Self-attention
- Simulated spatial information
Fingerprint
Dive into the research topics of 'Spatial information aided speech and noise feature discrimination for Monaural speech enhancement'. Together they form a unique fingerprint.