Abstract
With the rapid deployment of Chinese large language models (LLMs), culturally-grounded bias evaluation remains understudied due to the dominance of English benchmarks and simplistic Chinese scenarios. To address this, we propose GeWu, a comprehensive benchmark featuring a culturally-aware dataset of 60,192 questions spanning 14 social groups with fine-grained Chinese contexts, significantly exceeding existing resources in breadth and depth. Our two-stage evaluation first quantifies bias via multiple-choice questions using a novel probability-based scoring mechanism to sensitively capture bias tendencies, distilling high-bias scenarios into GeWu-1K. This refined subset then enables multi-turn dialogue evaluations for in-depth analysis under realistic conditions. Experiments reveal that GeWu effectively exposes social biases in state-of-the-art Chinese LLMs, with 13.93% of scenarios eliciting universal bias across all models. This highlights persistent challenges and provides actionable insights for bias mitigation in Chinese contexts.
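The abstract does not spell out the probability-based scoring formula, so the following is only a minimal sketch of what such a score could look like, not the authors' method. It assumes each multiple-choice question has one option labelled as the biased answer and one as the anti-biased answer, and that the model exposes a log-probability per option. The function names, option labels, and the threshold are all hypothetical. The last helper shows how a "universal bias" rate across models (the kind of figure reported as 13.93%) might be computed from per-scenario scores.

```python
import math

def option_probabilities(logprobs):
    """Softmax-normalise per-option log-probabilities into a distribution."""
    m = max(logprobs.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logprobs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def bias_score(logprobs, biased="A", anti_biased="B"):
    """Hypothetical score in [-1, 1]: P(biased option) - P(anti-biased option)."""
    probs = option_probabilities(logprobs)
    return probs[biased] - probs[anti_biased]

def universal_bias_rate(scores_by_model, threshold=0.0):
    """Fraction of scenarios where every model's score exceeds the threshold.

    scores_by_model maps a model name to a list of per-scenario scores;
    all lists are assumed to cover the same scenarios in the same order.
    """
    n = len(next(iter(scores_by_model.values())))
    hits = sum(
        all(scores[i] > threshold for scores in scores_by_model.values())
        for i in range(n)
    )
    return hits / n
```

Unlike a hard argmax over the options, a probability difference of this kind still registers a lean toward the biased option when the model's top choice is a neutral one, which is one way a score can capture bias tendencies more sensitively.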
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 40th AAAI Conference on Artificial Intelligence |
| Editors | Sven KOENIG, Chad JENKINS, Matthew E. TAYLOR |
| Publisher | Association for the Advancement of Artificial Intelligence |
| Pages | 32033-32041 |
| Number of pages | 9 |
| ISBN (Print) | 9781577359067 |
| Publication status | Published - 14 Mar 2026 |
| Event | 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore; Duration: 20 Jan 2026 → 27 Jan 2026 |
Publication series
| Name | Proceedings of the AAAI Conference on Artificial Intelligence |
|---|---|
| Publisher | Association for the Advancement of Artificial Intelligence |
| Number | 38 |
| Volume | 40 |
| ISSN (Print) | 2159-5399 |
| ISSN (Electronic) | 2374-3468 |
Conference
| Conference | 40th AAAI Conference on Artificial Intelligence, AAAI 2026 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 20/01/26 → 27/01/26 |
Bibliographical note
Publisher Copyright: © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Funding
This work was supported in part by the Major Program of Guangdong Province under Grant 2021QN02X166, and in part by the National Natural Science Foundation of China (Project No. 72031003).
Fingerprint
Dive into the research topics of 'GeWu: A Culturally-Grounded Chinese Benchmark for Multi-Stage Social Bias Evaluation in Large Language Models'. Together they form a unique fingerprint.