A Multi-hop Graph Reasoning Network for Knowledge-based VQA

Zihan HU, Jiuxiang YOU, Zhenguo YANG*, Xiaoping LI, Haoran XIE, Qing LI, Wenyin LIU

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Knowledge-based visual question answering (KB-VQA) requires reasoning about the visual grounding relations between images and questions by incorporating external knowledge. Existing works typically retrieve knowledge from knowledge graphs by leveraging global multimodal representations of image-text pairs for graph convolution, which neglects contextual clues at hop granularity and results in suboptimal propagation and use of contextual information. To this end, we propose a multi-hop graph reasoning network (MGRN) for KB-VQA, which consists of a knowledge graph constructor (KGC) module, a semantic-instructed graph reasoning (SGR) module, and an answering module. MGRN exploits multimodal semantics from the given images and questions as instructions for graph reasoning to obtain knowledge representations from both the scene graph and the knowledge base. Specifically, KGC fuses the scene graph with triplets from ConceptNet and Comet to construct a contextual knowledge graph for retrieving knowledge representations. Furthermore, SGR conducts multi-hop graph reasoning to select the top-k knowledge items for answering by passing and filtering interplay messages on the contextual knowledge graph under the guidance of the multimodal semantic representation. Extensive experiments on two public datasets demonstrate the effectiveness and superior performance of our method.
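
The general idea of instruction-guided multi-hop reasoning with top-k selection can be pictured with a toy sketch. This is not the authors' MGRN implementation: the function name `multi_hop_top_k`, the softmax-weighted message filter, the averaging update, and the two-dimensional toy features are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def multi_hop_top_k(node_feats, edges, instruction, hops=2, k=2):
    """Hypothetical sketch: rank graph nodes after instruction-guided message passing.

    node_feats:  dict mapping node name -> feature vector (list of floats)
    edges:       list of (head, tail) pairs, treated as undirected
    instruction: vector standing in for the multimodal semantic representation
    """
    neighbors = {n: [] for n in node_feats}
    for head, tail in edges:
        neighbors[head].append(tail)
        neighbors[tail].append(head)

    state = {n: list(v) for n, v in node_feats.items()}
    for _ in range(hops):
        new_state = {}
        for node, feat in state.items():
            nbrs = neighbors[node]
            if not nbrs:
                new_state[node] = feat
                continue
            # Filtering (assumed form): weight each neighbor's message by its
            # relevance to the instruction vector.
            weights = softmax([dot(instruction, state[n]) for n in nbrs])
            msg = [sum(w * state[n][i] for w, n in zip(weights, nbrs))
                   for i in range(len(feat))]
            # Passing (assumed form): average the node's state with the message.
            new_state[node] = [(a + b) / 2 for a, b in zip(feat, msg)]
        state = new_state

    # Score every node against the instruction and keep the k best.
    scores = {n: dot(instruction, f) for n, f in state.items()}
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k], scores

# Toy graph: scene-graph objects linked to a knowledge concept (made-up values).
toy_feats = {"dog": [1.0, 0.0], "frisbee": [0.8, 0.2],
             "park": [0.0, 1.0], "car": [-1.0, 0.0]}
toy_edges = [("dog", "frisbee"), ("frisbee", "park"), ("park", "car")]
toy_instruction = [1.0, 0.0]
top_items, item_scores = multi_hop_top_k(toy_feats, toy_edges, toy_instruction)
```

In this sketch each hop lets a node absorb context from neighbors one step further away, so `hops` controls how far contextual information spreads before the final top-k ranking.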
Original language: English
Journal: ACM Transactions on Intelligent Systems and Technology
Publication status: Accepted/In press - 21 Feb 2025

Funding

This work is supported by the National Key Research and Development Program of China (No. 2022YFB3305500), the National Natural Science Foundation of China (No. 62273089), the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515010237), the Hong Kong Research Grants Council under the General Research Fund (project no. PolyU 15200023), and the Faculty Research Grants (SDS24A8) and the Direct Grant (DR25E8) of Lingnan University, Hong Kong.
