Reversing the logic of generative AI alignment

Research output: Journal Publications › Journal Article (refereed) › peer-review

1 Citation (Scopus)

Abstract

The alignment of artificial intelligence (AI) systems with societal values and the public interest is a critical challenge in the field of AI ethics and governance. Traditional approaches, such as Reinforcement Learning with Human Feedback (RLHF) and Constitutional AI, often rely on pre-defined high-level ethical principles. This article critiques these conventional alignment frameworks through the philosophical perspectives of pragmatism and public interest theory, arguing against their rigidity and their disconnect from practical impacts. It proposes an alternative alignment strategy that reverses the traditional logic, focusing on empirical evidence and the real-world effects of AI systems. By emphasizing practical outcomes and continuous adaptation, this pragmatic approach aims to ensure that AI technologies are developed according to principles derived from the observable impacts of technology applications.

Original language: English
Article number: e30
Number of pages: 15
Journal: Data and Policy
Volume: 7
Early online date: 10 Mar 2025
DOIs
Publication status: Published - 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s), 2025. Published by Cambridge University Press.

Funding

No funding was received to conduct this study.

Keywords

  • AI alignment
  • constitutional AI
  • pragmatism
  • public interest
  • reinforcement learning with human feedback
