Stable matching-based two-way selection in multi-label active learning with imbalanced data

Shuyue CHEN, Ran WANG*, Jian LU*, Xizhao WANG

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

8 Citations (Scopus)

Abstract

Multi-label active learning (MLAL) reduces the cost of manual annotation for multi-label problems by selecting high-quality unlabeled data. Existing MLAL methods usually perform a one-way selection based on the examples' informativeness, representativeness, or diversity. These methods consider only the importance of the examples to the labels, not the importance of the labels to the examples. Because multi-label data are inherently imbalanced, the selected dataset may also be highly imbalanced, which degrades learning performance. In this paper, we treat the selection of example-label pairs in MLAL as a two-way matching problem instead of a one-way selection problem. First, we consider both the label's preference for an example, defined as the informativeness of the example with respect to the label, and the example's preference for a label, defined as the probability that the example belongs to the label's positive class. Then, a simple and effective stable matching (STM) model is adopted to realize the two-way selection. In addition, to provide reasonable candidates for the STM model, a roulette algorithm is used to allocate the number of annotations to each sub-classifier. Comprehensive experiments demonstrate the competitiveness of the proposed approach and its effectiveness in selecting a relatively balanced dataset.
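The abstract does not give the exact formulas, so the following is only a minimal sketch of the two-step idea under stated assumptions, not the authors' implementation. It assumes: (1) binary-relevance sub-classifiers supply a matrix `probs[i][j]` = P(label j is positive | example i); (2) a label's preference for an example is its uncertainty, taken here as `1 - |2p - 1|`; (3) an example's preference for a label is `probs[i][j]`, as stated in the abstract; (4) the roulette weights per label are supplied by the caller (for instance, larger for labels with few labeled positives); and (5) each example is paired with at most one label per round. The matching is a standard Gale-Shapley deferred-acceptance algorithm with label quotas (many-to-one). The function names `roulette_allocate`, `stable_match`, and `select_pairs` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def roulette_allocate(weights: Sequence[float], budget: int, rng: random.Random) -> List[int]:
    """Spin a roulette wheel `budget` times; each spin gives one annotation slot
    to label j with probability weights[j] / sum(weights).
    Assumption: weights are non-negative with a positive sum."""
    spins = rng.choices(range(len(weights)), weights=weights, k=budget)
    quotas = [0] * len(weights)
    for j in spins:
        quotas[j] += 1
    return quotas


def informativeness(p: float) -> float:
    """Hypothetical label-side preference: uncertainty, highest when p = 0.5."""
    return 1.0 - abs(2.0 * p - 1.0)


def stable_match(probs: List[List[float]], quotas: List[int]) -> Dict[int, List[int]]:
    """Example-proposing deferred acceptance; label j holds at most quotas[j] examples."""
    n_examples, n_labels = len(probs), len(quotas)
    # Each example ranks labels by its probability of being positive (descending).
    ex_prefs = [sorted(range(n_labels), key=lambda j: -probs[i][j]) for i in range(n_examples)]
    next_choice = [0] * n_examples          # index of the next label each example proposes to
    held: Dict[int, List[int]] = {j: [] for j in range(n_labels)}
    free = list(range(n_examples))
    while free:
        i = free.pop()
        if next_choice[i] >= n_labels:      # example has proposed to every label: stays unmatched
            continue
        j = ex_prefs[i][next_choice[i]]
        next_choice[i] += 1
        if quotas[j] == 0:                  # label received no annotation slots this round
            free.append(i)
            continue
        held[j].append(i)
        if len(held[j]) > quotas[j]:        # over capacity: reject the least informative example
            worst = min(held[j], key=lambda k: informativeness(probs[k][j]))
            held[j].remove(worst)
            free.append(worst)
    return held


def select_pairs(probs: List[List[float]], label_weights: Sequence[float],
                 budget: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (example, label) pairs to send for annotation in this round."""
    quotas = roulette_allocate(label_weights, budget, random.Random(seed))
    held = stable_match(probs, quotas)
    return sorted((i, j) for j, examples in held.items() for i in examples)
```

A usage example, with made-up probabilities for four unlabeled examples and three labels: `select_pairs([[0.9, 0.5, 0.1], [0.6, 0.4, 0.55], [0.2, 0.52, 0.7], [0.45, 0.8, 0.3]], [1.0, 2.0, 3.0], budget=3)` returns at most three example-label pairs, with each example used at most once. Letting examples propose favors pairs in which the example is likely positive for the label, which is one plausible way to push the queried set toward minority positive labels.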

Original language: English
Pages (from-to): 281-299
Number of pages: 19
Journal: Information Sciences
Volume: 610
Early online date: 3 Aug 2022
Publication status: Published, Sept 2022
Externally published: Yes

Keywords

  • Active learning
  • Example-label pairs
  • Imbalanced data
  • Multi-label
  • Stable matching

