AN ADVERSARIAL AND DEEP HASHING-BASED HIERARCHICAL SUPERVISED CROSS-MODAL IMAGE AND TEXT RETRIEVAL ALGORITHM

Ruidong CHEN, Baohua QIANG*, Mingliang ZHOU, Shihao ZHANG, Hong ZHENG, Chenghua TANG

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

With the rapid development of robotics and sensor technology, vast amounts of valuable multimodal data are collected. It is extremely critical for a variety of robots performing automated tasks to find relevant multimodal information quickly and efficiently in large amounts of data. In this paper, we propose an adversarial and deep hashing-based hierarchical supervised cross-modal image and text retrieval algorithm to perform semantic analysis and association modelling on image and text by making full use of the rich semantic information of the label hierarchy. First, the modal adversarial block and the modal differentiation network both perform adversarial learning to keep different modalities with the same semantics closest to each other in a common subspace. Second, the intra-label layer similarity loss and inter-label layer correlation loss are used to fully exploit the intrinsic similarity existing in each label layer and the correlation existing between label layers. Finally, an objective function for different semantic data is redesigned to keep data with different semantics away from each other in a common subspace, thus avoiding interference of retrieval by data of different semantics. The experimental results on two cross-modal retrieval datasets with hierarchically supervised information show that the proposed method substantially enhances retrieval performance and consistently outperforms other state-of-the-art methods.
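
The retrieval step that hashing-based methods generally rely on can be sketched as follows. This is not the authors' algorithm: the abstract does not give the network, the loss functions, or the code length, so the binary codes below are made-up toy values and the function names `hamming` and `rank_by_hamming` are hypothetical. The sketch only shows how items from one modality might be ranked against a query from another once both have been mapped to binary codes in a common space.

```python
def hamming(a, b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered by Hamming distance to the query.

    sorted() is stable, so items at the same distance keep index order.
    """
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Toy data (assumed, not from the paper): one image query code and
# three text codes, all 4 bits long.
image_query = [1, 0, 1, 1]
text_codes = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]

ranking = rank_by_hamming(image_query, text_codes)
print(ranking)
```

In this toy setting, a smaller Hamming distance stands in for closer semantics, which is the property the losses described in the abstract are meant to encourage in the learned codes.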

Original language: English
Pages (from-to): 77-86
Number of pages: 10
Journal: International Journal of Robotics and Automation
Volume: 39
Issue number: 1
Publication status: Published - 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Acta Press. All rights reserved.

Keywords

  • adversarial network
  • cross-modal image and text retrieval
  • deep hash algorithm
  • hierarchical supervision
