Unveiling the Bias Impact on Symmetric Moral Consistency of Large Language Models

  • Ziyi ZHOU
  • , Xinwei GUO
  • , Jiashi GAO
  • , Xiangyu ZHAO
  • , Shiyao ZHANG
  • , Xin YAO
  • , Xuetao WEI*
  • *Corresponding author for this work

Research output: Book Chapters | Papers in Conference ProceedingsConference paper (refereed)Researchpeer-review

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities, surpassing human experts in various benchmark tests and playing a vital role in various industry sectors. Despite their effectiveness, a notable drawback of LLMs is their inconsistent moral behavior, which raises ethical concerns. This work delves into symmetric moral consistency in large language models and demonstrates that modern LLMs lack sufficient consistency ability in moral scenarios. Our extensive investigation of twelve popular LLMs reveals that their assessed consistency scores are influenced by position bias and selection bias rather than their intrinsic abilities. We propose a new framework tSMC, which gauges the effects of these biases and effectively mitigates the bias impact based on the Kullback-Leibler divergence to pinpoint LLMs' mitigated Symmetric Moral Consistency. We find that the ability of LLMs to maintain consistency varies across different moral scenarios. Specifically, LLMs show more consistency in scenarios with clear moral answers compared to those where no choice is morally perfect. The average consistency score of 12 LLMs ranges from 60.7% in high-ambiguity moral scenarios to 84.8% in low-ambiguity moral scenarios.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 37 (NeurIPS 2024)
EditorsA. GLOBERSON, L. MACKEY, D. BELGRAVE, A. FAN, U. PAQUET, J. TOMCZAK, C. ZHANG
PublisherNeural Information Processing Systems Foundation
Number of pages24
Volume37
ISBN (Electronic)9798331314385
Publication statusPublished - Dec 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 202415 Dec 2024

Publication series

NameAdvances in Neural Information Processing Systems
PublisherNeural information processing systems foundation
ISSN (Print)1049-5258

Conference

Conference38th Conference on Neural Information Processing Systems, NeurIPS 2024
Country/TerritoryCanada
CityVancouver
Period9/12/2415/12/24

Bibliographical note

Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.

Funding

This work was supported by Key Programs of Guangdong Province under Grant 2021QN02X166. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.

Fingerprint

Dive into the research topics of 'Unveiling the Bias Impact on Symmetric Moral Consistency of Large Language Models'. Together they form a unique fingerprint.

Cite this