Recent Data Augmentation Techniques in Natural Language Processing: A Brief Survey

Lingling XU, Haoran XIE, Fu Lee WANG, Weiming WANG*

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Data augmentation has recently gained increasing interest in natural language processing (NLP) because of its excellent performance in low-resource settings, contrastive learning, and few-shot learning. Data augmentation was originally a strategy for increasing the amount of data by applying semantically invariant transformations, such as back translation and synonym replacement, to the raw data. As the field has developed, a variety of augmentation strategies have been designed to produce samples whose labels are opposite to those of the original data, or even samples from unseen categories. In this paper, we provide a comprehensive study of text data augmentation techniques. We first discuss various data augmentation methods and then classify them into three types: semantic-invariant augmentation, random augmentation, and generative augmentation. Subsequently, we highlight the main application scenarios and downstream tasks involving data augmentation. We also describe the challenges in developing text data augmentations and directions that can be investigated further in future work. To conclude, this paper aims to summarize data augmentation techniques in NLP and show how they work to further improve the performance of NLP tasks.
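
As a rough illustration of two of the categories named in the abstract, the sketch below shows synonym replacement (a semantic-invariant transformation) and random token deletion (one possible random augmentation). It is a minimal example and is not taken from the paper: the `SYNONYMS` table, the function names, and the probabilities are all hypothetical, and a practical system would likely draw synonyms from a lexical resource or a language model rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
}

def synonym_replace(tokens, rng, p=0.5):
    """Replace each token that has a known synonym with probability p."""
    out = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out

def random_delete(tokens, rng, p=0.2):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [tok for tok in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

rng = random.Random(0)  # seeded so that runs are repeatable
sentence = "the quick dog looks happy".split()
print(synonym_replace(sentence, rng))
print(random_delete(sentence, rng))
```

Synonym replacement is intended to keep the label of the original sample, whereas random deletion trades some semantic fidelity for diversity, which is one reason the two are usually treated as separate categories.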
Original language: English
Pages (from-to): 29-37
Number of pages: 9
Journal: The IEEE Intelligent Informatics Bulletin
Volume: 22
Issue number: 1
Publication status: Published - Dec 2022

Funding

The research has been supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), and by the Lam Woo Research Fund (LWP20019) and Direct Grant (DR23B2) of Lingnan University, Hong Kong.

Keywords

  • Data Augmentation
  • Contrastive Learning
  • Low-resource Setting
  • Few-shot Learning
