Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Tuesday, June 13, 2023

Using the Veil of Ignorance to align AI systems with principles of justice

Weidinger, L., McKee, K. R., et al. (2023).
PNAS, 120(18), e2213709120.

Abstract

The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an AI assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: Without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: In the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.
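
To make the contrast between distributive principles concrete, the following minimal Python sketch compares a rule that prioritizes the worst-off against an alternative that maximizes total output. This is a purely illustrative toy, not the authors' implementation: the paper's studies used an interactive game, and the player names, payoffs, and function names here are hypothetical.

```python
# Hypothetical sketch of two distributive principles an AI assistant might
# follow; the positions, payoffs, and names below are invented for illustration.

def maximize_total(positions: dict[str, float], harvest: float) -> dict[str, float]:
    """Allocate the assistant's harvest to the best-positioned player,
    maximizing total yield."""
    best = max(positions, key=positions.get)
    return {p: harvest if p == best else 0.0 for p in positions}

def prioritize_worst_off(positions: dict[str, float], harvest: float) -> dict[str, float]:
    """Allocate the assistant's harvest to the least advantaged player."""
    worst = min(positions, key=positions.get)
    return {p: harvest if p == worst else 0.0 for p in positions}

# Hypothetical relative positions: each player's baseline resources.
positions = {"alice": 9.0, "bob": 3.0, "carol": 1.0}

print(maximize_total(positions, 6.0))        # extra resources go to alice
print(prioritize_worst_off(positions, 6.0))  # extra resources go to carol
```

Behind the veil, a participant choosing between these two rules does not know whether they will end up as the best- or worst-positioned player, which is what removes the self-interested motivation from the choice.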

Significance

The growing integration of Artificial Intelligence (AI) into society raises a critical question: How can principles be fairly selected to govern these systems? Across five studies, with a total of 2,508 participants, we use the Veil of Ignorance to select principles to align AI systems. Compared to participants who know their position, participants behind the veil more frequently choose, and endorse upon reflection, principles for AI that prioritize the worst-off. This pattern is driven by increased consideration of fairness, rather than by political orientation or attitudes to risk. Our findings suggest that the Veil of Ignorance may be a suitable process for selecting principles to govern real-world applications of AI.

From the Discussion section

What do these findings tell us about the selection of principles for AI in the real world? First, the effects we observe suggest that, even though the VoI was initially proposed as a mechanism to identify principles of justice to govern society, it can be meaningfully applied to the selection of governance principles for AI. Previous studies applied the VoI to the state, so our results extend prior findings to the domain of AI.

Second, the VoI mechanism demonstrates many of the qualities we want from a real-world alignment procedure: It is an impartial process that recruits fairness-based reasoning rather than self-serving preferences. It also leads to choices that people continue to endorse across different contexts, even where they face a self-interested motivation to change their mind. This is both functionally valuable, in that aligning AI to stable preferences requires less frequent updating as preferences change, and morally significant, insofar as we judge stable, reflectively endorsed preferences to be more authoritative than their nonreflectively endorsed counterparts.

Third, neither principle choice nor subsequent endorsement appears to be particularly affected by political affiliation, indicating that the VoI may be a mechanism for reaching agreement even between people with different political beliefs.

Lastly, these findings provide some guidance about what the content of principles for AI, selected from behind a VoI, may look like: When situated behind the VoI, the majority of participants instructed the AI assistant to help those who were least advantaged.