大模型时代自动问答系统及评价体系综述

Translated title of the contribution: A Survey on Question Answering Systems and Evaluation in the Era of Large Models
  • 崔龙飞
  • 王宗水
  • 鲍盈旭
  • 赵红

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

In the era of large models, automatic question answering (QA) systems exhibit many new characteristics. Based on a review of the literature, this paper summarizes the characteristics of QA systems and the frameworks used to evaluate them. Across the stages of QA model training and inference (training data, pre-training frameworks, model post-processing, and efficient model fine-tuning), it contrasts the early training approach of “pursuing data and parameter scale” with the current “emphasis on data and model efficiency,” and systematically analyzes the new characteristics of large-model-based QA systems. It then summarizes the current types of evaluation frameworks for large QA models and reviews in detail the datasets, evaluation metrics, and quantitative computation methods that the automated evaluation framework HELM (Holistic Evaluation of Language Models) applies to QA tasks. Future research on large-model-based QA systems is expected to expand and deepen along several directions: multimodal fusion, high security, high interpretability, low resource consumption, and comprehensive evaluation frameworks that combine large models with automation.
Original language: Chinese (Simplified)
Journal: 计算机工程与应用
Publication status: E-pub ahead of print - 28 Sept 2025

Funding

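National Natural Science Foundation of China (71972175); Key Research and Development Tasks of the Xinjiang Uygur Autonomous Region (2024B03026); Beijing Municipal Education Commission Outstanding Young Talent Cultivation Program (BPHR202203237); China Postdoctoral Science Foundation General Program (2025M770691); Higher Education Research Special Topic (2023GXJK570).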

Keywords

  • Large Models
  • Question Answering Systems
  • Features
  • HELM Evaluation Framework
