Will power-seeking AGIs harm human society?

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Many have argued, based on the Instrumental Convergence Thesis, that Artificial General Intelligences (AGIs) will exhibit power-seeking behavior. Such behavior, they warn, could harm human society and pose existential threats—namely, the risk of human extinction or the permanent collapse of civilization. These arguments often rely on an implicit and underexamined assumption: that AGIs will develop world models—internal representations of world dynamics—that resemble those of humans. We challenge this assumption. We argue that once the anthropomorphic assumption—that AGIs’ world models will mirror our own—is rejected, it becomes unclear whether AGIs would pursue the types of power commonly emphasized in the literature, or any familiar types of power at all. This analysis casts doubt on the strength of existing arguments linking the Instrumental Convergence Thesis to existential threats. Moreover, it reveals a deeper layer of uncertainty. AGIs with non-human world models may identify novel or unanticipated types of power that fall outside existing taxonomies, thereby posing underappreciated risks. We further argue that world model alignment—an issue largely overlooked in comparison with value alignment—should be recognized as a core dimension of AI alignment. We conclude by outlining several open questions to inform and guide future research.
Original language: English
Journal: AI and Society
Early online date: 21 Aug 2025
DOIs
Publication status: E-pub ahead of print - 21 Aug 2025

Bibliographical note

I would like to thank Myungjun Kim, Nikolaj Jang Lee Linding Pedersen, and Adrian Yee for their helpful comments and discussions on this paper. Special thanks to Simon Goldstein, Jiji Zhang, Adam Bradley, Daniel Pallies, James Fanciullo, and Jesse Hill for their discussions on various related topics. I would also like to thank the audiences at Fudan University and Yonsei University. Finally, I am grateful to the editor and the three anonymous reviewers for their insightful comments and suggestions, which significantly improved the manuscript.

Publisher Copyright:
© The Author(s) 2025.

Funding

Open Access Publishing Support Fund provided by Lingnan University.

Keywords

  • AGI risk
  • Instrumental convergence thesis
  • Existential catastrophe
  • AI alignment
  • Anthropomorphism
  • World model alignment
