Abstract
Scientific papers are often organized into structured facets (e.g., Introduction, Methods), and modern research dissemination increasingly includes multimodal content such as presentation videos and audio. This shift creates a need for summarization systems that can effectively integrate both structured and multimodal information. In this paper, we introduce the Multimodal Faceted Graph Scientific Summarization (MFG-SciSum) model, a graph-based model for multimodal faceted summarization. At its core is a Multimodal Faceted Graph that encodes fine-grained elements (text spans, visual segments, and audio snippets from both papers and presentations) as distinct node types. The model constructs a cross-modal, multi-level graph through unsupervised alignment and latent content grouping, enabling coherent structural and semantic alignment. To enhance multimodal integration, we further incorporate a Heterogeneous Graph Refiner and a Heterogeneous Graph Condenser, which refine and distill salient information from the multimodal graph into summary-focused representations. The model is trained with a joint text-graph objective that promotes both summary quality and structural consistency. Experiments show that our model outperforms both unimodal and multimodal baselines in automatic and human evaluations, demonstrating the effectiveness of graph-based cross-modal modeling for scientific summarization. The code for our model is available at: https://github.com/Kwanheiyu2001/MFG-SciSum.
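The abstract describes the graph construction only at a high level. As a rough illustration, and not the authors' implementation (their code is at the GitHub link above), the Python sketch below stores typed nodes (modality plus source) and adds alignment edges between nodes from different modalities or sources whose precomputed embeddings exceed a cosine-similarity threshold. The class names, the `threshold` parameter, and the thresholded-cosine alignment rule are hypothetical assumptions; the paper's actual unsupervised alignment and latent content grouping may work differently.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    modality: str            # hypothetical labels: "text", "visual", "audio"
    source: str              # hypothetical labels: "paper", "presentation"
    embedding: list[float]   # assumed to be precomputed by some encoder


@dataclass
class MultimodalGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Undirected weighted edges: (node_a, node_b, similarity)
    edges: list[tuple[str, str, float]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_cross_modal(graph: MultimodalGraph, threshold: float = 0.8) -> MultimodalGraph:
    """Hypothetical alignment rule: link every pair of nodes that differ in
    modality or source when their embedding similarity reaches `threshold`."""
    ids = sorted(graph.nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            node_a, node_b = graph.nodes[a], graph.nodes[b]
            # Skip pairs from the same modality and the same source.
            if node_a.modality == node_b.modality and node_a.source == node_b.source:
                continue
            sim = cosine(node_a.embedding, node_b.embedding)
            if sim >= threshold:
                graph.edges.append((a, b, sim))
    return graph


# Toy usage with made-up 3-dimensional embeddings.
g = MultimodalGraph()
g.add_node(Node("paper_t1", "text", "paper", [1.0, 0.0, 0.0]))
g.add_node(Node("pres_v1", "visual", "presentation", [0.9, 0.1, 0.0]))
g.add_node(Node("pres_a1", "audio", "presentation", [0.0, 0.0, 1.0]))
align_cross_modal(g, threshold=0.8)
print(g.edges)
```

In this toy example only the paper text span and the slide segment are similar enough to be linked. A real system would replace the fixed threshold with a learned or clustering-based grouping step.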
| Original language | English |
|---|---|
| Article number | 122851 |
| Journal | Information Sciences |
| Volume | 729 |
| Early online date | 1 Nov 2025 |
| Publication status | E-pub ahead of print - 1 Nov 2025 |
Bibliographical note
Publisher Copyright: © 2025 Elsevier Inc.
Funding
The work is supported by the Hong Kong RGC ECS (LU23200223/130393), the Shenzhen University-Lingnan University Joint Research Programme (SZU-LU004/2526), and Internal Grants of Lingnan University, Hong Kong (SDS24A11/106103).
Keywords
- Heterogeneous graph condenser
- Heterogeneous graph refiner
- Scientific multimodal faceted summarization
Fingerprint
Dive into the research topics of 'MFG-SciSum: A multimodal faceted graph framework for scientific summarization'. Together they form a unique fingerprint.
Projects
3 active projects
- A Multimodal Architecture for Emotion Analysis and Mental Health Support using Fine-Tuned Large Language Models
  CHIU, H. W. B. (PI)
  1/07/25 → 30/06/27
  Project: Grant Research
- Semantic Multimodal Search: Bridging Papers, Videos, and Text for Efficient Information Discovery
  CHIU, H. W. B. (PI), JI, J. (CoPI) & NIE, J. (CoI)
  1/07/25 → 30/06/26
  Project: Grant Research
- Incorporating Visual-Linguistic Features into Scientific Document Summarization
  CHIU, H. W. B. (PI)
  Research Grants Council (Hong Kong, China)
  1/01/24 → 30/06/26
  Project: Grant Research