Abstract
Recent advances in large language models (LLMs) have spurred growing interest in automatic survey generation (ASG), a task that has traditionally required considerable time and effort. With progress in retrieval-augmented generation (RAG) and the rising popularity of multi-agent systems (MASs), synthesizing academic surveys with LLMs has become a viable approach, which raises the need for robust evaluation methods in this domain. However, existing evaluation methods suffer from several limitations, including biased metrics, a lack of alignment with human preferences, and an over-reliance on LLMs-as-judges. To address these challenges, we propose SGSimEval, a comprehensive benchmark for Survey Generation with Similarity-Enhanced Evaluation. SGSimEval evaluates ASG systems by jointly assessing the outline, content, and references of a generated survey, and it combines LLM-based scoring with quantitative metrics to provide a multifaceted evaluation framework. We also introduce human preference metrics that account for both the inherent quality of a generated survey and its similarity to human-written surveys. Extensive experiments show that current ASG systems achieve strong, human-comparable performance in outline generation but leave significant room for improvement in content and reference generation, and that our evaluation metrics remain highly consistent with human assessments.
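The abstract does not state how SGSimEval combines inherent quality with similarity to human-written surveys. The sketch below only illustrates the general idea of a similarity-enhanced score under stated assumptions: an LLM judge returns a quality score on a hypothetical 1-5 scale, the generated and human-written sections are already embedded as vectors, and the two signals are blended linearly with a hypothetical weight `alpha`. The names `similarity_enhanced_score` and `alpha` are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_enhanced_score(
    quality: float,
    generated_emb: Sequence[float],
    human_emb: Sequence[float],
    alpha: float = 0.5,          # hypothetical weight on inherent quality
    max_quality: float = 5.0,    # assumed upper bound of the LLM judge's scale
) -> float:
    """Blend a normalized LLM quality score with similarity to a human-written reference.

    Assumption for illustration only: a linear blend in [0, 1]; this is not
    presented as SGSimEval's actual formula.
    """
    normalized_quality = quality / max_quality
    # Clip negative similarities so that "unrelated" counts as 0 rather than a penalty.
    similarity = max(0.0, cosine_similarity(generated_emb, human_emb))
    return alpha * normalized_quality + (1.0 - alpha) * similarity
```

For example, `similarity_enhanced_score(5.0, [1.0, 0.0], [1.0, 0.0])` gives 1.0 (top quality, identical direction), while a mid-scale score with an orthogonal embedding yields a lower value. A linear blend is used here only because it keeps both signals interpretable; other combinations are equally plausible.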
| Original language | English |
|---|---|
| Title of host publication | Advanced Data Mining and Applications - 21st International Conference, ADMA 2025, Proceedings |
| Editors | Masatoshi YOSHIKAWA, Xiaofeng MENG, Yang CAO, Chuan XIAO, Weitong CHEN, Yanda WANG |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 393-407 |
| Number of pages | 15 |
| ISBN (Print) | 9789819534555 |
| DOIs | |
| Publication status | E-pub ahead of print - 17 Oct 2025 |
| Event | 21st International Conference on Advanced Data Mining and Applications, ADMA 2025, Kyoto, Japan (22 Oct 2025 → 24 Oct 2025) |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 16198 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 21st International Conference on Advanced Data Mining and Applications, ADMA 2025 |
|---|---|
| Country/Territory | Japan |
| City | Kyoto |
| Period | 22/10/25 → 24/10/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
Funding
This work was conducted at the Research Institute for Artificial Intelligence of Things (RIAIoT) and supported by the PolyU Internal Research Fund (No. BDZ3) and the PolyU External Research Fund (No. ZDH5). This work also benefited from the financial support of the EdUHK project under Grant No. RG 67/2024-2025R and of Lingnan University (SDS24A5).
Keywords
- Automatic Survey Generation
- Evaluation Benchmark
- Large Language Models
- Semantic Similarity