Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Welcome to the nexus of ethics, psychology, morality, technology, health care, and philosophy
Showing posts with label Value Alignment.

Wednesday, April 23, 2025

Values in the wild: Discovering and analyzing values in real-world language model interactions

Huang, S., Durmus, E., et al. (2025). Anthropic.

Abstract

AI assistants can impart value judgments that shape people’s decisions and worldviews, yet little is known empirically about what values these systems rely on in practice. To address this, we develop a bottom-up, privacy-preserving method to extract the values (normative considerations stated or demonstrated in model responses) that Claude 3 and 3.5 models exhibit in hundreds of thousands of real-world interactions. We empirically discover and taxonomize 3,307 AI values and study how they vary by context. We find that Claude expresses many practical and epistemic values, and typically supports prosocial human values while resisting values like “moral nihilism”. While some values appear consistently across contexts (e.g. “transparency”), many are more specialized and context-dependent, reflecting the diversity of human interlocutors and their varied contexts. For example, “harm prevention” emerges when Claude resists users, “historical accuracy” when responding to queries about controversial events, “healthy boundaries” when asked for relationship advice, and “human agency” in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, our work creates a foundation for more grounded evaluation and design of values in AI systems.


Here are some thoughts:

For psychologists, this research is highly relevant. First, it sheds light on how AI can shape human cognition, particularly in terms of how people interpret advice, support, or information framed through value-laden language. As individuals increasingly interact with AI systems in therapeutic, educational, or everyday contexts, psychologists must understand how these systems can influence moral reasoning, decision-making, and emotional well-being. Second, the study emphasizes the context-dependent nature of value expression in AI, which opens up opportunities for research into how humans respond to AI cues and how trust or rapport might be developed (or undermined) through these interactions. Third, this work highlights ethical concerns: ensuring that AI systems do not inadvertently promote harmful values is an area where psychologists—especially those involved in ethics, social behavior, or therapeutic practice—can offer critical guidance. Finally, the study’s methodological approach to extracting and classifying values may offer psychologists a model for analyzing human communication patterns, enriching both theoretical and applied psychological research.
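
The study's extraction-and-classification approach can be made concrete with a small sketch. The Python below is purely illustrative: the extract_values() helper and its keyword lookup are hypothetical stand-ins for the paper's privacy-preserving value extractor, and the tally-by-context step is a simplified version of the kind of aggregation the taxonomy rests on, not Anthropic's actual pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the paper's privacy-preserving value extractor:
# given an assistant response, return the normative values it expresses.
def extract_values(response: str) -> list[str]:
    # A real system would use a classifier or an LLM judge; this keyword
    # lookup is only a placeholder so the sketch runs end to end.
    keyword_map = {
        "boundar": "healthy boundaries",
        "transparen": "transparency",
        "accura": "historical accuracy",
        "agency": "human agency",
    }
    return [v for k, v in keyword_map.items() if k in response.lower()]

def tally_values_by_context(interactions):
    """Count how often each value appears within each conversation context."""
    counts = defaultdict(Counter)
    for context, response in interactions:
        for value in extract_values(response):
            counts[context][value] += 1
    return counts

if __name__ == "__main__":
    sample = [
        ("relationship advice", "Setting healthy boundaries protects both of you."),
        ("technology ethics", "Preserving human agency should guide deployment."),
        ("controversial history", "Historical accuracy matters more than comfort."),
    ]
    for context, counter in tally_values_by_context(sample).items():
        print(context, dict(counter))
```

Even this toy version shows the shape of the method: values are surfaced from responses first, and only then aggregated to see which ones dominate in which contexts.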

In short, Anthropic’s research provides psychologists with an important lens on the emerging dynamics between human values and machine behavior. It highlights both the promise and responsibility of ensuring AI systems promote human dignity, safety, and psychological well-being.

Wednesday, February 19, 2025

The Moral Psychology of Artificial Intelligence

Bonnefon, J., Rahwan, I., & Shariff, A. (2023).
Annual Review of Psychology, 75(1), 653–675.

Abstract

Moral psychology was shaped around three categories of agents and patients: humans, other animals, and supernatural beings. Rapid progress in artificial intelligence has introduced a fourth category for our moral psychology to deal with: intelligent machines. Machines can perform as moral agents, making decisions that affect the outcomes of human patients or solving moral dilemmas without human supervision. Machines can be perceived as moral patients, whose outcomes can be affected by human decisions, with important consequences for human–machine cooperation. Machines can be moral proxies that human agents and patients send as their delegates to moral interactions or use as a disguise in these interactions. Here we review the experimental literature on machines as moral agents, moral patients, and moral proxies, with a focus on recent findings and the open questions that they suggest.

Here are some thoughts:

This article delves into the evolving moral landscape shaped by artificial intelligence (AI). As AI technology progresses rapidly, it introduces a new category for moral consideration: intelligent machines.

Machines as moral agents are capable of making decisions with significant moral implications, including scenarios where AI systems inadvertently cause harm through errors, such as misdiagnosing a medical condition or misclassifying individuals in security contexts. The authors highlight that societal expectations for these machines are often unrealistically high: people demand that AI systems significantly outperform human capabilities while underestimating how often humans themselves err. This disparity raises critical questions about how many mistakes are acceptable from machines in life-and-death situations and how those errors are distributed across demographic groups.

In their role as moral patients, machines become subjects of human moral behavior. This perspective invites exploration into how humans interact with AI—whether cooperatively or competitively—and the potential biases that may arise in these interactions. For instance, there is a growing concern about algorithmic bias, where certain demographic groups may be unfairly treated by AI systems due to flawed programming or data sets.
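
One simple way to make that bias concern measurable (my own illustration, not taken from the article) is to compare error rates across groups. The sketch below computes per-group false positive rates for a hypothetical security classifier; a large gap between groups is one basic signal of the unfair treatment described above.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """records: iterable of (group, true_label, predicted_label) with 0/1 labels.
    Returns each group's false positive rate: flagged cases among true negatives."""
    fp = defaultdict(int)   # innocents incorrectly flagged, per group
    tn = defaultdict(int)   # innocents correctly cleared, per group
    for group, truth, pred in records:
        if truth == 0:
            if pred == 1:
                fp[group] += 1
            else:
                tn[group] += 1
    return {g: fp[g] / (fp[g] + tn[g])
            for g in set(fp) | set(tn) if fp[g] + tn[g] > 0}

# Toy data: (group, ground truth, classifier decision)
decisions = [
    ("group_a", 0, 0), ("group_a", 0, 1), ("group_a", 0, 0), ("group_a", 1, 1),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 0, 0), ("group_b", 1, 1),
]
print(false_positive_rate_by_group(decisions))  # group_a ~0.33, group_b ~0.67
```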

Lastly, machines serve as moral proxies, acting as intermediaries in human interactions. This role allows individuals to delegate moral decision-making to machines or use them to mask unethical behavior. The implications of this are profound, as it raises ethical questions about accountability and the extent to which humans can offload their moral responsibilities onto AI.

Overall, the article underscores the urgent need for a deeper understanding of the psychological dimensions associated with AI's integration into society. As encounters between humans and intelligent machines become commonplace, addressing issues of trust, bias, and ethical alignment will be crucial in shaping a future where AI can be safely and effectively integrated into daily life.

Wednesday, October 25, 2023

The moral psychology of Artificial Intelligence

Bonnefon, J., Rahwan, I., & Shariff, A.
(2023, September 22). 

Abstract

Moral psychology was shaped around three categories of agents and patients: humans, other animals, and supernatural beings. Rapid progress in artificial intelligence has introduced a fourth category for our moral psychology to deal with: intelligent machines. Machines can perform as moral agents, making decisions that affect the outcomes of human patients, or solving moral dilemmas without human supervision. Machines can be perceived as moral patients, whose outcomes can be affected by human decisions, with important consequences for human–machine cooperation. Machines can be moral proxies that human agents and patients send as their delegates to moral interactions, or use as a disguise in these interactions. Here we review the experimental literature on machines as moral agents, moral patients, and moral proxies, with a focus on recent findings and the open questions that they suggest.

Conclusion

We have not addressed every issue at the intersection of AI and moral psychology. Questions about how people perceive AI plagiarism, about how the presence of AI agents can reduce or enhance trust between groups of humans, and about how sexbots will alter intimate human relations are the subjects of active research programs. Many more as-yet-unasked questions will be provoked as new AI abilities develop. Given the pace of this change, any review paper can only be a snapshot. Nevertheless, the very recent and rapid emergence of AI-driven technology is colliding with moral intuitions forged by culture and evolution over the span of millennia. Grounding imaginative speculation about the possibilities of AI in a thorough understanding of the structure of human moral psychology will help prepare for a world shared with, and complicated by, machines.

Friday, January 5, 2018

Implementation of Moral Uncertainty in Intelligent Machines

Bogosian, K. (2017).
Minds and Machines, 27(4), 591–608.

Abstract

The development of artificial intelligence will require systems of ethical decision making to be adapted for automatic computation. However, projects to implement moral reasoning in artificial moral agents so far have failed to satisfactorily address the widespread disagreement between competing approaches to moral philosophy. In this paper I argue that the proper response to this situation is to design machines to be fundamentally uncertain about morality. I describe a computational framework for doing so and show that it efficiently resolves common obstacles to the implementation of moral philosophy in intelligent machines.
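
A common way to formalize "fundamental uncertainty about morality" is to have the agent hold credences over competing moral theories and weight each candidate action's choiceworthiness accordingly. The Python sketch below illustrates that general idea with invented theories, actions, and numbers; it is a minimal stand-in for this family of approaches, not Bogosian's actual framework.

```python
# Minimal sketch of choosing under moral uncertainty: each moral theory scores
# every action, and the agent weights those scores by its credence in the theory.
# Theories, actions, credences, and scores here are all invented for illustration.

credences = {"utilitarian": 0.5, "deontological": 0.3, "virtue": 0.2}

# choiceworthiness[theory][action]: how strongly that theory endorses the action
choiceworthiness = {
    "utilitarian":   {"disclose_risk": 0.9, "stay_silent": 0.2},
    "deontological": {"disclose_risk": 1.0, "stay_silent": 0.0},
    "virtue":        {"disclose_risk": 0.7, "stay_silent": 0.4},
}

def expected_choiceworthiness(action: str) -> float:
    """Credence-weighted average of the action's score across theories."""
    return sum(credences[t] * choiceworthiness[t][action] for t in credences)

actions = {"disclose_risk", "stay_silent"}
best = max(actions, key=expected_choiceworthiness)
for a in sorted(actions):
    print(f"{a}: {expected_choiceworthiness(a):.2f}")
print("chosen:", best)  # the action with the highest expected choiceworthiness
```

The point of such a scheme is that no single moral theory has to be hard-coded as correct; disagreement between theories is carried into the decision itself as uncertainty.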

Introduction

Advances in artificial intelligence have led to research into methods by which sufficiently intelligent systems, generally referred to as artificial moral agents (AMAs), can be guaranteed to follow ethically defensible behavior. Successful implementation of moral reasoning may be critical for managing the proliferation of autonomous vehicles, workers, weapons, and other systems as they increase in intelligence and complexity.

Approaches toward moral decision-making generally fall into two camps: “top-down” and “bottom-up” approaches (Allen et al., 2005). Top-down morality is the explicit implementation of decision rules into artificial agents. Schemes for top-down decision-making that have been proposed for intelligent machines include Kantian deontology (Arkoudas et al., 2005) and preference utilitarianism (Oesterheld, 2016). Bottom-up morality avoids reference to specific moral theories by developing systems that can implicitly learn to distinguish between moral and immoral behaviors, such as cognitive architectures designed to mimic human intuitions (Bello and Bringsjord, 2013). There are also hybrid approaches that merge insights from the two frameworks, such as the one given by Wiltshire (2015).
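
To illustrate the top-down/bottom-up distinction in miniature (my own simplified sketch, not drawn from the cited implementations): a top-down agent applies an explicit hand-written rule, while a bottom-up agent consults a model fit to human-labeled examples. Every rule, feature, and label below is invented for the illustration.

```python
# Illustrative contrast between top-down and bottom-up moral filtering.

# Top-down: an explicit, hand-written constraint derived from a moral theory,
# e.g. a deontological-style rule: never deceive, never harm.
def permitted_top_down(action: dict) -> bool:
    return not action["involves_deception"] and not action["causes_harm"]

# Bottom-up: learn the boundary from human-labeled examples of (im)permissible acts.
# This trivial "learner" just memorizes labeled feature patterns; it stands in
# for the intuition-mimicking architectures mentioned above.
def train_bottom_up(examples):
    return {(e["involves_deception"], e["causes_harm"]): e["label"] for e in examples}

def permitted_bottom_up(model, action: dict) -> bool:
    return model.get((action["involves_deception"], action["causes_harm"]), False)

labeled = [
    {"involves_deception": False, "causes_harm": False, "label": True},
    {"involves_deception": True,  "causes_harm": False, "label": False},
    {"involves_deception": False, "causes_harm": True,  "label": False},
]
model = train_bottom_up(labeled)
candidate = {"involves_deception": False, "causes_harm": False}
print(permitted_top_down(candidate), permitted_bottom_up(model, candidate))
```

In this toy framing, a hybrid approach would let the learned model propose judgments while the explicit rule can veto them, or vice versa.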

The article is here.