Abstract
Existing deep-learning-based speech enhancement (SE) methods use only clean speech as positive samples to guide the training of SE networks, leaving negative samples, i.e., noisy speech, unexploited. In this paper, we adopt contrastive regularization (CR), built on contrastive learning, to exploit both noisy and clean speech as negative and positive samples, respectively. Specifically, CR minimizes the distance between clean and enhanced speech and maximizes the distance between noisy and enhanced speech in the representation space of a self-supervised learning (SSL) model. However, these contrastive samples are non-consensual: the negatives are usually represented far from the clean speech, so the solution space remains under-constrained. To tackle this issue, we assemble negative samples from (1) the noisy speech and (2) the corresponding speech enhanced without CR, and we design a curriculum learning strategy that weights these negative samples to balance the learning difficulty caused by the differing similarities between the embeddings of the positive and negative samples. Experiments show that the proposed method effectively improves SE performance without introducing additional computation or parameters.
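The contrastive regularization described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: plain Python lists stand in for SSL embeddings, the loss assumes the ratio form D(enhanced, clean) / Σ w_i · D(enhanced, negative_i) known from contrastive regularization in image dehazing, and `curriculum_weights` is an assumed schedule that up-weights negatives lying closer to the clean embedding as training progresses.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def curriculum_weights(negatives: List[Vector], positive: Vector,
                       progress: float) -> List[float]:
    """Hypothetical curriculum: weight each negative by its difficulty.

    A negative closer to the positive (clean) embedding is "harder".
    At progress 0 every negative gets weight 1; as progress -> 1,
    harder negatives receive up to twice the weight of the easiest one.
    """
    progress = min(max(progress, 0.0), 1.0)
    dists = [l1_distance(n, positive) for n in negatives]
    max_d = max(dists)
    weights = []
    for d in dists:
        # hardness is 0 for the farthest negative and approaches 1 near the positive
        hardness = 1.0 - d / max_d if max_d > 0 else 0.0
        weights.append(1.0 + progress * hardness)
    return weights


def contrastive_regularization(anchor: Vector, positive: Vector,
                               negatives: List[Vector], weights: List[float],
                               eps: float = 1e-7) -> float:
    """Assumed ratio-form CR: pull the anchor (enhanced) toward the positive
    (clean) and push it away from the weighted negatives."""
    pos = l1_distance(anchor, positive)
    neg = sum(w * l1_distance(anchor, n) for w, n in zip(weights, negatives))
    return pos / (neg + eps)


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings" (illustrative values only).
    clean = [0.0] * 4
    enhanced = [0.1] * 4
    noisy = [1.0] * 4          # negative (1): noisy speech
    enhanced_no_cr = [0.3] * 4  # negative (2): speech enhanced without CR
    negatives = [noisy, enhanced_no_cr]
    for progress in (0.0, 0.5, 1.0):
        w = curriculum_weights(negatives, clean, progress)
        loss = contrastive_regularization(enhanced, clean, negatives, w)
        print(f"progress={progress:.1f} weights={w} loss={loss:.4f}")
```

Because the loss is a ratio, a larger weight on a negative scales its share of the denominator and therefore the gradient that pushes the enhanced embedding away from it; the schedule starts with all negatives treated equally and gradually emphasizes the harder one.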
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024: Proceedings |
| Publisher | IEEE |
| Pages | 10486-10490 |
| Number of pages | 5 |
| ISBN (Electronic) | 9798350344851 |
| ISBN (Print) | 9798350344868 |
| DOIs | |
| Publication status | Published - 2024 |
| Externally published | Yes |
| Event | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Seoul, Korea, Republic of. Duration: 14 Apr 2024 → 19 Apr 2024 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| ISSN (Print) | 1520-6149 |
Conference
| Conference | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Seoul |
| Period | 14/04/24 → 19/04/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Funding
This work is supported by the National Natural Science Foundation of China (No. 62071342, No. 62171326), the Special Fund of Hubei Luojia Laboratory (No. 220100019), the Hubei Province Technological Innovation Major Project (No. 2021BAA034), and the Fundamental Research Funds for the Central Universities (No. 2042023kf1033).
Keywords
- contrastive regularization
- curriculum learning strategy
- positive and negative samples
- self-supervised learning model
- speech enhancement
Title
Curricular Contrastive Regularization for Speech Enhancement with Self-Supervised Representations