SGSimEval: A Comprehensive Multifaceted and Similarity-Enhanced Benchmark for Automatic Survey Generation Systems

  • Beichen GUO
  • Zhiyuan WEN*
  • Yu YANG*
  • Peng GAO
  • Ruosong YANG
  • Jiaxing SHEN

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Referred Conference Paper › peer-review

Abstract

The growing interest in automatic survey generation (ASG), a task that traditionally required considerable time and effort, has been spurred by recent advances in large language models (LLMs). With advancements in retrieval-augmented generation (RAG) and the rising popularity of multi-agent systems (MASs), synthesizing academic surveys using LLMs has become a viable approach, thereby elevating the need for robust evaluation methods in this domain. However, existing evaluation methods suffer from several limitations, including biased metrics, a lack of human preference, and an over-reliance on LLMs-as-judges. To address these challenges, we propose SGSimEval, a comprehensive benchmark for Survey Generation with Similarity-Enhanced Evaluation that evaluates automatic survey generation systems by integrating assessments of the outline, content, and references, and also combines LLM-based scoring with quantitative metrics to provide a multifaceted evaluation framework. In SGSimEval, we also introduce human preference metrics that emphasize both inherent quality and similarity to humans. Extensive experiments reveal that current ASG systems demonstrate human-comparable superiority in outline generation, while showing significant room for improvement in content and reference generation, and our evaluation metrics maintain strong consistency with human assessments.

Original language: English
Title of host publication: Advanced Data Mining and Applications - 21st International Conference, ADMA 2025, Proceedings
Editors: Masatoshi YOSHIKAWA, Xiaofeng MENG, Yang CAO, Chuan XIAO, Weitong CHEN, Yanda WANG
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 393-407
Number of pages: 15
ISBN (Print): 9789819534555
DOIs
Publication status: E-pub ahead of print - 17 Oct 2025
Event: 21st International Conference on Advanced Data Mining and Applications, ADMA 2025 - Kyoto, Japan
Duration: 22 Oct 2025 → 24 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16198
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Advanced Data Mining and Applications, ADMA 2025
Country/Territory: Japan
City: Kyoto
Period: 22/10/25 → 24/10/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.

Funding

This work was conducted at the Research Institute for Artificial Intelligence of Things (RIAIoT) and supported by the PolyU Internal Research Fund (No. BDZ3) and the PolyU External Research Fund (No. ZDH5). It also benefited from the financial support of the EdUHK project under Grant No. RG 67/2024-2025R and of Lingnan University (SDS24A5).

Keywords

  • Automatic Survey Generation
  • Evaluation Benchmark
  • Large Language Models
  • Semantic Similarity
