Text-to-Image Person Re-identification Based on Multimodal Graph Convolutional Network

Guang HAN, Min LIN, Ziyang LI, Haitao ZHAO, Sam KWONG

Research output: Journal Publications / Journal Article (refereed) / peer-review

1 Citation (Scopus)

Abstract

Text-to-image person re-identification (ReID) is a common subproblem in the fields of person re-identification and image-text retrieval. Recent approaches generally adopt a dual-stream network that extracts image features and text features separately. Because this design provides no deep interaction between images and text, it is difficult for the network to learn highly semantic feature representations. In addition, feature extraction for both image data and text data is modeled in a regular way, for example by using a Transformer to extract sequence embeddings. However, this type of modeling disregards the inherent relationships among multimodal input embeddings. This paper proposes a more flexible approach to mining multimodal data that uniformly treats the data as graphs, so that multimodal information is extracted and exchanged through message passing between graph nodes. First, a unified multimodal feature extraction and fusion network based on the graph convolutional network is proposed, which enables multimodal information to progress from ‘local’ to ‘global’. Second, an asymmetric multilevel alignment module is proposed that focuses on more accurate ‘local’ information from a ‘global’ perspective and progressively divides the multimodal information at each level. Last, a cross-modal representation matching strategy based on similarity distribution and mutual information is proposed to achieve cross-modal alignment. The proposed algorithm is simple and efficient, and test results on three public datasets (CUHK-PEDES, ICFG-PEDES and RSTPReID) show that it achieves state-of-the-art (SOTA) performance.
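The abstract describes two mechanisms only in general terms: message passing between graph nodes that carries multimodal information from ‘local’ to ‘global’, and a matching objective based on similarity distributions. The following is a minimal sketch of both, under stated assumptions, and not the authors' implementation. It uses the standard GCN propagation rule (self-loops plus symmetric degree normalisation); the joint graph built by the hypothetical helper `build_joint_graph` (a chain of image-patch nodes, a chain of text-token nodes, and one cross-modal edge) is invented purely for illustration; and `similarity_distribution_loss` is a generic KL-divergence form of similarity-distribution matching with an assumed temperature `tau=0.02`. The paper's actual graph construction, its asymmetric multilevel alignment module and its mutual-information term are not reproduced.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One graph-convolution step with the standard GCN propagation rule:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # inverse sqrt of node degrees
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)  # aggregate, project, ReLU


def build_joint_graph(n_img, n_txt):
    """Hypothetical joint graph (an assumption, not the paper's construction):
    image-patch nodes 0..n_img-1 and text-token nodes n_img..n-1 are each
    linked in a chain (local neighbours), and one cross-modal edge joins the
    first node of each modality."""
    n = n_img + n_txt
    adj = np.zeros((n, n))
    for start, length in ((0, n_img), (n_img, n_txt)):
        for i in range(start, start + length - 1):
            adj[i, i + 1] = adj[i + 1, i] = 1.0
    adj[0, n_img] = adj[n_img, 0] = 1.0  # the single image-text edge
    return adj


def _kl_rows(p, q, eps=1e-8):
    """Row-wise KL(p || q), averaged over rows; eps guards log(0)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def similarity_distribution_loss(img_emb, txt_emb, ids, tau=0.02):
    """Assumed formulation: push the softmax over cosine similarities towards
    the ground-truth distribution in which all pairs sharing a person ID are
    positives, in both the image-to-text and text-to-image directions."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T / tau
    labels = (ids[:, None] == ids[None, :]).astype(float)
    q = labels / labels.sum(axis=1, keepdims=True)  # labels is symmetric

    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))  # subtract max for stability
        return e / e.sum(axis=1, keepdims=True)

    return _kl_rows(softmax(sim), q) + _kl_rows(softmax(sim.T), q)


# Usage: 4 image-patch nodes and 3 text-token nodes with 8-dim features.
rng = np.random.default_rng(0)
n_img, n_txt, dim = 4, 3, 8
adj = build_joint_graph(n_img, n_txt)
h = rng.normal(size=(n_img + n_txt, dim))
for _ in range(3):  # each extra layer widens every node's receptive field by one hop
    h = gcn_layer(adj, h, rng.normal(size=(dim, dim)) / np.sqrt(dim))
print(h.shape)  # (7, 8)

ids = np.array([0, 0, 1, 2])  # the first two samples share a person ID
img_emb = rng.normal(size=(4, 16))
txt_emb = img_emb + 0.1 * rng.normal(size=(4, 16))
print(similarity_distribution_loss(img_emb, txt_emb, ids))
```

Because each layer mixes a node only with its direct neighbours, the last text token in this toy graph receives no image information after one or two layers and first does so after three. This is one way to read the ‘local’ to ‘global’ progression described in the abstract.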
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Early online date: 19 Dec 2023
Publication status: E-pub ahead of print (19 Dec 2023)

Bibliographical note

Publisher Copyright: IEEE

Keywords

  • Convolutional neural networks
  • Cross-modal retrieval
  • Data mining
  • Feature extraction
  • graph convolutional network
  • Graph neural networks
  • image-text retrieval
  • person re-identification
  • person search
  • Semantics
  • Task analysis
  • Visualization
