Curricular Contrastive Regularization for Speech Enhancement with Self-Supervised Representations

  • Xinmeng XU
  • Chang HAN
  • Yiqun ZHANG
  • Weiping TU*
  • Yuhong YANG

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings > Conference paper (refereed) > Research > peer-review

Abstract

Existing deep learning-based speech enhancement (SE) methods adopt only clean speech as positive samples to guide the training of speech enhancement networks, while negative samples, i.e., noisy speech, remain unexploited. In this paper, we adopt contrastive regularization (CR), built upon contrastive learning, to exploit the information in both noisy and clean speech as negative and positive samples, respectively. In particular, CR minimizes the distance between clean and enhanced speech and maximizes the distance between noisy and enhanced speech in the representation space of a self-supervised learning model. However, the contrastive samples are non-consensual, as the negatives are usually represented distantly from the clean speech, leaving the solution space still under-constricted. To tackle this issue, we assemble the negative samples from (1) the noisy speech and (2) the corresponding enhanced speech obtained without using CR, and we customize a curriculum learning strategy that defines the importance of these negative samples, balancing the learning difficulty caused by the different similarities between the embeddings of the positive and negative samples. Experiments show that our proposal improves SE performance effectively without introducing additional computation or parameters.
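The abstract does not give the loss formula, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes a ratio-form regularizer over mean absolute (L1) distances: the distance from the enhanced embedding to the clean (positive) embedding, divided by a weighted sum of distances to the negatives. Random arrays stand in for self-supervised model embeddings, and every name here (`l1_distance`, `contrastive_regularization`, `curriculum_weights`, the linear ramp schedule) is hypothetical.

```python
import numpy as np


def l1_distance(a, b):
    """Mean absolute difference between two embedding arrays of equal shape."""
    return float(np.mean(np.abs(a - b)))


def contrastive_regularization(enhanced, clean, negatives, weights, eps=1e-8):
    """Assumed ratio form: small when the enhanced embedding is close to the
    positive (clean) and far from the weighted negatives."""
    pos = l1_distance(enhanced, clean)
    neg = sum(w * l1_distance(enhanced, n) for w, n in zip(weights, negatives))
    return pos / (neg + eps)


def curriculum_weights(progress):
    """Hypothetical schedule, not taken from the paper: the noisy negative
    keeps weight 1, while the harder negative (enhanced speech from a model
    trained without CR) ramps linearly from 0 to 1 with training progress."""
    p = min(max(progress, 0.0), 1.0)
    return [1.0, p]


# Stand-in embeddings with shape (frames, dims); a real setup would take
# these from a self-supervised speech model.
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 16))
noisy = clean + rng.normal(scale=1.0, size=clean.shape)
baseline = clean + rng.normal(scale=0.3, size=clean.shape)  # enhanced, no CR

good = clean + rng.normal(scale=0.05, size=clean.shape)  # near the positive
bad = noisy + rng.normal(scale=0.05, size=clean.shape)   # near a negative

weights = curriculum_weights(0.5)
loss_good = contrastive_regularization(good, clean, [noisy, baseline], weights)
loss_bad = contrastive_regularization(bad, clean, [noisy, baseline], weights)
print(loss_good, loss_bad)
```

Under these assumptions, an output near the clean embedding yields a lower value than one near the noisy embedding. Because such a term only changes the training objective, it adds no parameters to the enhancement network at inference time.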
Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024: Proceedings
Publisher: IEEE
Pages: 10486-10490
Number of pages: 5
ISBN (Electronic): 9798350344851
ISBN (Print): 9798350344868
Publication status: Published - 2024
Externally published: Yes
Event: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Funding

This work is supported by the National Natural Science Foundation of China (No. 62071342, No. 62171326), the Special Fund of Hubei Luojia Laboratory (No. 220100019), the Hubei Province Technological Innovation Major Project (No. 2021BAA034), and the Fundamental Research Funds for the Central Universities (No. 2042023kf1033).

Keywords

  • contrastive regularization
  • curriculum learning strategy
  • positive and negative samples
  • self-supervised learning model
  • speech enhancement
