MFG-SciSum: A multimodal faceted graph framework for scientific summarization

  • Wenhui YU
  • Zusheng TAN
  • Fan YANG
  • Jing LI
  • Shen GAO
  • Wai LAM
  • Sam KWONG
  • Billy CHIU*

  * Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Scientific papers are often organized into structured facets (e.g., Introduction, Methods), and modern research dissemination increasingly includes multimodal content such as presentation videos and audio. This shift creates a need for summarization systems that can effectively integrate both structured and multimodal information. In this paper, we introduce the Multimodal Faceted Graph Scientific Summarization model (MFG-SciSum), a graph-based model for multimodal faceted summarization. At its core is a Multimodal Faceted Graph that encodes fine-grained elements (text spans, visual segments, and audio snippets from both papers and presentations) as distinct node types. The graph is cross-modal and multi-level, and is constructed through unsupervised alignment and latent content grouping, enabling coherent structural and semantic alignment. To enhance multimodal integration, we further incorporate a Heterogeneous Graph Refiner and a Heterogeneous Graph Condenser, which refine and distill salient information from the multimodal graph into summary-focused representations. The model is trained with a joint text-graph objective that promotes both summary quality and structural consistency. Experiments show that our model outperforms both unimodal and multimodal baselines across automatic and human evaluations, demonstrating the effectiveness of graph-based cross-modal modeling for scientific summarization. The code for our model is available at: https://github.com/Kwanheiyu2001/MFG-SciSum.

Original language: English
Article number: 122851
Journal: Information Sciences
Volume: 729
Early online date: 1 Nov 2025
DOIs
Publication status: E-pub ahead of print - 1 Nov 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Inc.

Funding

The work is supported by the Hong Kong RGC ECS (LU23200223/130393), the Shenzhen University-Lingnan University Joint Research Programme (SZU-LU004/2526), and Internal Grants of Lingnan University, Hong Kong (SDS24A11/106103).

Keywords

  • Heterogeneous graph condenser
  • Heterogeneous graph refiner
  • Scientific multimodal faceted summarization
