Abstract
Data augmentation is widely used in low-resource aspect-based sentiment classification (ABSC) to alleviate data sparsity and improve model performance. Previous data augmentation approaches rely on back translation, synonym replacement, or generative language models such as T5, whereas the generative power of large language models has rarely been explored. Large language models such as GPT-3.5-turbo are trained on extensive corpora and capture semantic and contextual relationships between words and sentences. To this end, we propose Masked Aspect Term Prediction (MATP), a novel data augmentation method that uses the world knowledge and generative capacity of large language models to generate new aspect terms via word masking. By incorporating AI feedback from large language models, MATP increases the diversity and richness of aspect terms. Experimental results on ABSC datasets with BERT as the backbone model show that adding the augmented data yields significant improvements over baseline models, validating the effectiveness of the proposed data augmentation strategy that combines AI feedback.
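The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mask_aspect`, `build_prompt`, `augment`), the `[MASK]` token, the prompt wording, and the `propose` callable that stands in for a large language model are all assumptions made for illustration. The AI feedback step is not modeled, since the abstract does not describe it in detail.

```python
from typing import Callable, List

MASK = "[MASK]"  # assumed placeholder token, chosen for illustration


def mask_aspect(sentence: str, aspect: str, mask: str = MASK) -> str:
    """Replace the first occurrence of the aspect term with a mask token."""
    if aspect not in sentence:
        raise ValueError(f"aspect {aspect!r} not found in sentence")
    return sentence.replace(aspect, mask, 1)


def build_prompt(masked: str, polarity: str, n: int) -> str:
    """Build a hypothetical instruction asking a model to fill the mask."""
    return (
        f"Suggest {n} different aspect terms that could replace {MASK} "
        f"so that the sentiment toward the term stays {polarity}.\n"
        f"Sentence: {masked}"
    )


def augment(
    sentence: str,
    aspect: str,
    polarity: str,
    propose: Callable[[str], List[str]],
    n: int = 3,
) -> List[str]:
    """Return up to n new sentences with the aspect term swapped out.

    `propose` is a stand-in for a call to a large language model: it takes
    the prompt and returns candidate aspect terms.
    """
    masked = mask_aspect(sentence, aspect)
    prompt = build_prompt(masked, polarity, n)
    seen = {aspect.lower()}  # skip the original term and duplicates
    augmented: List[str] = []
    for candidate in propose(prompt):
        candidate = candidate.strip()
        if not candidate or candidate.lower() in seen:
            continue
        seen.add(candidate.lower())
        augmented.append(masked.replace(MASK, candidate, 1))
    return augmented[:n]


# Usage with a stubbed proposer instead of a real model call.
def stub_propose(prompt: str) -> List[str]:
    return ["pasta", "food", "Pasta", "dessert"]


examples = augment(
    "The food was great but the service was slow.",
    aspect="food",
    polarity="positive",
    propose=stub_propose,
)
```

Passing the model call in as a function keeps the sketch runnable without network access; in practice `propose` would wrap whichever model client is in use and parse its response into a list of terms.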
| Original language | English |
|---|---|
| Article number | 100136 |
| Number of pages | 8 |
| Journal | Natural Language Processing Journal |
| Volume | 10 |
| Early online date | 21 Feb 2025 |
| DOIs | |
| Publication status | Published - Mar 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2025 The Author(s)
Funding
The work described in this paper was supported by the Katie Shu Sui Pui Charitable Trust — Academic Publication Fellowship (Reference No.: KSPF/2023/06), Hong Kong Metropolitan University.
Keywords
- Aspect-based sentiment classification
- Data augmentation
- Masked aspect term prediction
- AI feedback