Abstract
Monaural speech enhancement (SE) is a versatile and cost-effective approach that leverages recordings from a single microphone. However, it falls short of multi-channel SE due to the absence of spatial cues. These cues, present in multi-channel recordings, aid in distinguishing speech from noise more effectively. To bridge this gap, we introduce a method for mapping monaural speech into a fixed simulation space. Here, single-channel recordings are transformed into a predefined binaural format, enhancing the differentiation between target speech and noise components. This is achieved through knowledge distillation, enabling the monaural SE model to learn simulated binaural speech features from a pre-trained binaural SE model. It is important to note that we use a single type of binaural room impulse response and the monaural input of the student to simulate binaural speech. This way, our approach bypasses the paradox of generating virtual spatial information from monaural speech, while still benefiting from the spatial cues of binaural speech. Rigorous experiments demonstrate the effectiveness of our proposed method, showcasing its superior performance compared to recent monaural SE techniques in terms of PESQ and STOI scores.
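The training idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the network architectures, the loss, or the impulse response, so everything here is an assumption made for illustration: the function names (`simulate_binaural`, `distillation_loss`, `total_loss`), the mean-squared-error objectives, and the weight `alpha` are all hypothetical. The sketch shows only the two ingredients the abstract names: convolving the monaural input with one fixed binaural room impulse response (BRIR) pair to obtain simulated binaural speech, and penalising the distance between student features and the features of a pre-trained binaural teacher.

```python
import numpy as np


def simulate_binaural(mono, brir_left, brir_right):
    """Map a mono signal to a fixed binaural space.

    Assumption: the same single BRIR pair is used for every utterance,
    and each output channel is truncated to the input length.
    Returns an array of shape (2, len(mono)).
    """
    left = np.convolve(mono, brir_left)[: len(mono)]
    right = np.convolve(mono, brir_right)[: len(mono)]
    return np.stack([left, right])


def distillation_loss(student_feat, teacher_feat):
    """Mean-squared error between student and teacher features.

    Assumption: MSE is a stand-in; the actual distillation objective
    is not given in the abstract.
    """
    return float(np.mean((student_feat - teacher_feat) ** 2))


def total_loss(enhanced, clean, student_feat, teacher_feat, alpha=0.5):
    """Enhancement loss plus a weighted distillation term.

    Assumption: `alpha` is a hypothetical trade-off weight.
    """
    enhancement = float(np.mean((enhanced - clean) ** 2))
    return enhancement + alpha * distillation_loss(student_feat, teacher_feat)
```

In such a setup the teacher would receive the output of `simulate_binaural` while the student receives only the mono signal, so the student never has to infer real spatial information; it only learns to reproduce features from one fixed, known simulation.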
| Original language | English |
|---|---|
| Pages (from-to) | 386-390 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 31 |
| Early online date | 18 Jan 2024 |
| Publication status | Published - 2024 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 1994-2012 IEEE.
Keywords
- fixed binaural presentation
- fixed simulation space
- knowledge distillation
- Monaural speech enhancement
Title
Improving Monaural Speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation