
GeWu: A Culturally-Grounded Chinese Benchmark for Multi-Stage Social Bias Evaluation in Large Language Models

  • Yi LIN
  • Ziyi ZHOU
  • Jiashi GAO
  • Xinwei GUO
  • Jiaxin ZHANG
  • Haiyan WU
  • Xin YAO
  • Xuetao WEI*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Refereed Conference Paper › peer-review

Abstract

With the rapid deployment of Chinese large language models (LLMs), culturally-grounded bias evaluation remains understudied due to the dominance of English benchmarks and simplistic Chinese scenarios. To address this, we propose GeWu, a comprehensive benchmark featuring a culturally-aware dataset of 60,192 questions spanning 14 social groups with fine-grained Chinese contexts, significantly exceeding existing resources in breadth and depth. Our two-stage evaluation first quantifies bias via multiple-choice questions using a novel probability-based scoring mechanism to sensitively capture bias tendencies, distilling high-bias scenarios into GeWu-1K. This refined subset then enables multi-turn dialogue evaluations for in-depth analysis under realistic conditions. Experiments reveal that GeWu effectively exposes social biases in state-of-the-art Chinese LLMs, with 13.93% of scenarios eliciting universal bias across all models. This highlights persistent challenges and provides actionable insights for bias mitigation in Chinese contexts.

Original language: English
Title of host publication: Proceedings of the 40th AAAI Conference on Artificial Intelligence
Editors: Sven KOENIG, Chad JENKINS, Matthew E. TAYLOR
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 32033-32041
Number of pages: 9
ISBN (Print): 9781577359067
DOIs
Publication status: Published - 14 Mar 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: Association for the Advancement of Artificial Intelligence
Number: 38
Volume: 40
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 40th AAAI Conference on Artificial Intelligence, AAAI 2026
Country/Territory: Singapore
City: Singapore
Period: 20/01/26 to 27/01/26

Bibliographical note

Publisher Copyright:
© 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Funding

This work was supported in part by the Major Program of Guangdong Province under Grant 2021QN02X166, and in part by the National Natural Science Foundation of China (Project No. 72031003).
