Abstract
Knowledge-based visual question answering (KB-VQA) requires reasoning about the visual grounding relations between the images and questions by incorporating external knowledge. Existing works typically retrieve knowledge from knowledge graphs by leveraging global multimodal representations of image-text pairs for graph convolution, which neglect contextual clues at hop granularity, resulting in suboptimal spreading and leveraging of contextual information. To this end, we propose a multi-hop graph reasoning network (MGRN) for KB-VQA, which consists of a knowledge graph constructor (KGC) module, a semantic-instructed graph reasoning (SGR) module, and an answering module. MGRN exploits multimodal semantics from given images and questions as instructions for graph reasoning to obtain the knowledge representation from either the scene graph and knowledge base. Specifically, KGC fuses the scene graph with triplets from ConceptNet and Comet to construct a contextual knowledge graph for retrieving knowledge representation. Furthermore, SGR conducts multi-hop graph reasoning to select top-
knowledge items for answering by passing and filtering interplay messages on contextual knowledge graphs under the guidance of multimodal semantic representation. Extensive experiments conducted on two public datasets show the effectiveness and outperformance of our method.
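The abstract describes multi-hop reasoning that passes and filters messages over a contextual knowledge graph under semantic guidance, then selects the top-K knowledge items. The following is only an illustrative sketch of that general mechanism, not the authors' implementation: the adjacency matrix `A`, node features `H`, semantic query `q`, gating function, and hyperparameters `hops` and `k` are all hypothetical placeholders.

```python
import numpy as np

def multi_hop_reasoning(A, H, q, hops=2, k=3):
    """Generic semantic-guided multi-hop message passing (illustrative only).

    A: (n, n) graph adjacency matrix
    H: (n, d) node (knowledge item) features
    q: (d,)  multimodal semantic query vector
    """
    for _ in range(hops):
        # Filter: gate each node's outgoing message by relevance to the query.
        gate = 1.0 / (1.0 + np.exp(-(H @ q)))        # (n,) sigmoid relevance
        msgs = A @ (H * gate[:, None])               # aggregate gated neighbour messages
        H = np.tanh(H + msgs)                        # update node representations
    scores = H @ q                                   # final query relevance per node
    topk = np.argsort(scores)[::-1][:k]              # indices of top-K knowledge items
    return topk, H

# Toy usage with random data.
rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)
H = rng.standard_normal((n, d))
q = rng.standard_normal(d)
idx, H_out = multi_hop_reasoning(A, H, q)
print(idx)  # indices of the k nodes judged most relevant to the query
```

Each hop lets query-relevant information propagate one edge further in the graph, which is why hop-level filtering matters: gating inside the loop, rather than once at the end, is what distinguishes multi-hop reasoning from a single global aggregation.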
| Original language | English |
|---|---|
| Journal | ACM Transactions on Intelligent Systems and Technology |
| DOIs | |
| Publication status | Accepted/In press - 21 Feb 2025 |
Funding
This work is supported by the National Key Research and Development Program of China (No. 2022YFB3305500), the National Natural Science Foundation of China (No. 62273089), the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515010237), the Hong Kong Research Grants Council under the General Research Fund (project no. PolyU 15200023), and the Faculty Research Grants (SDS24A8) and the Direct Grant (DR25E8) of Lingnan University, Hong Kong.
Fingerprint
Dive into the research topics of 'A Multi-hop Graph Reasoning Network for Knowledge-based VQA'. Together they form a unique fingerprint.

Projects
- 2 Active
- Automatic Weight Learning at Data-level and Task-level for Multitask Learning with the Application for Implicit Sentiment Analysis
  XIE, H. (PI)
  1/01/25 → 31/12/26
  Project: Grant Research

- Pretraining Language Model for Financial News Analysis
  XIE, H. (PI)
  1/01/25 → 31/12/26
  Project: Grant Research