Prototype-Based Discriminative Feature Representation for Class-incremental Cross-modal Retrieval

Shaoquan ZHU, Yong FENG*, Mingliang ZHOU*, Baohua QIANG, Bin FANG, Ran WEI

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

1 Citation (Scopus)

Abstract

Cross-modal retrieval aims to retrieve related items from various modalities with respect to a query of any type. The key challenge of cross-modal retrieval is to learn representations that are more discriminative across different categories and that extend to unseen-class retrieval in open-world retrieval tasks. To tackle this problem, in this paper we propose prototype learning-based discriminative feature learning (PLDFL) to learn more discriminative representations in a common space. First, we utilize a prototype learning algorithm to cluster samples labeled with the same semantic class, jointly taking into consideration intra-class compactness and inter-class sparsity. Second, we use a weight-sharing strategy to model the correlations of cross-modal samples and narrow down the modality gap. Finally, we apply the prototypes to achieve class-incremental learning, demonstrating the robustness of our proposed approach. According to our experimental results, our approach achieves significant retrieval performance in terms of mAP on average compared to several state-of-the-art approaches.
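The abstract outlines three ingredients: class prototypes that enforce intra-class compactness and inter-class sparsity, a weight-sharing projection into the common space, and prototype-based class-incremental learning. The paper's exact losses and architecture are not reproduced here, so the following is only a minimal PyTorch sketch of how such a prototype-based objective with a weight-shared projection head might look; the layer sizes, margin value, and loss forms are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a prototype-based discriminative loss for cross-modal
# retrieval. All concrete choices (dimensions, margin, loss forms) are
# illustrative assumptions, not the PLDFL implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjection(nn.Module):
    """Project image and text features into a common space with a weight-shared layer."""

    def __init__(self, img_dim, txt_dim, common_dim=256):
        super().__init__()
        self.img_fc = nn.Linear(img_dim, common_dim)     # modality-specific input layer
        self.txt_fc = nn.Linear(txt_dim, common_dim)
        self.shared = nn.Linear(common_dim, common_dim)  # weight-sharing layer for both modalities

    def forward(self, img_feat, txt_feat):
        z_img = self.shared(F.relu(self.img_fc(img_feat)))
        z_txt = self.shared(F.relu(self.txt_fc(txt_feat)))
        return F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1)


class PrototypeLoss(nn.Module):
    """Intra-class compactness plus inter-class sparsity around learnable class prototypes."""

    def __init__(self, num_classes, common_dim=256, margin=0.5):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, common_dim))
        self.margin = margin

    def forward(self, z, labels):
        protos = F.normalize(self.prototypes, dim=-1)
        # Intra-class compactness: pull each sample toward its class prototype.
        compact = (1.0 - (z * protos[labels]).sum(dim=-1)).mean()
        # Inter-class sparsity: push distinct prototypes at least `margin` apart in cosine similarity.
        sim = protos @ protos.t()
        off_diag = sim - torch.eye(sim.size(0), device=sim.device)
        sparse = F.relu(off_diag - self.margin).mean()
        return compact + sparse


if __name__ == "__main__":
    # Usage with random stand-in features for a batch of image-text pairs.
    model = SharedProjection(img_dim=4096, txt_dim=300)
    criterion = PrototypeLoss(num_classes=10)
    img = torch.randn(8, 4096)
    txt = torch.randn(8, 300)
    labels = torch.randint(0, 10, (8,))
    z_img, z_txt = model(img, txt)
    loss = criterion(z_img, labels) + criterion(z_txt, labels)
    loss.backward()
    print(float(loss))
```

In such a setup, the learned prototypes double as class anchors at test time, which is what makes a class-incremental extension natural: adding a class amounts to adding a prototype rather than retraining the whole common space.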

Original language: English
Article number: 2150018
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 35
Issue number: 5
Early online date: 26 Dec 2020
DOIs
Publication status: Published - Apr 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021 World Scientific Publishing Company.

Keywords

  • Cross-modal retrieval
  • discriminative representation
  • inter-class sparsity
  • intra-class compactness
  • prototype learning
