
Wednesday, April 23, 2025

Values in the wild: Discovering and analyzing values in real-world language model interactions

Huang, S., Durmus, E., et al. (2025).

Abstract

AI assistants can impart value judgments that shape people’s decisions and worldviews, yet little is known empirically about what values these systems rely on in practice. To address this, we develop a bottom-up, privacy-preserving method to extract the values (normative considerations stated or demonstrated in model responses) that Claude 3 and 3.5 models exhibit in hundreds of thousands of real-world interactions. We empirically discover and taxonomize 3,307 AI values and study how they vary by context. We find that Claude expresses many practical and epistemic values, and typically supports prosocial human values while resisting values like “moral nihilism”. While some values appear consistently across contexts (e.g. “transparency”), many are more specialized and context-dependent, reflecting the diversity of human interlocutors and their varied contexts. For example, “harm prevention” emerges when Claude resists users, “historical accuracy” when responding to queries about controversial events, “healthy boundaries” when asked for relationship advice, and “human agency” in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, our work creates a foundation for more grounded evaluation and design of values in AI systems.


Here are some thoughts:

For psychologists, this research is highly relevant. First, it sheds light on how AI can shape human cognition, particularly in terms of how people interpret advice, support, or information framed through value-laden language. As individuals increasingly interact with AI systems in therapeutic, educational, or everyday contexts, psychologists must understand how these systems can influence moral reasoning, decision-making, and emotional well-being. Second, the study emphasizes the context-dependent nature of value expression in AI, which opens up opportunities for research into how humans respond to AI cues and how trust or rapport might be developed (or undermined) through these interactions. Third, this work highlights ethical concerns: ensuring that AI systems do not inadvertently promote harmful values is an area where psychologists—especially those involved in ethics, social behavior, or therapeutic practice—can offer critical guidance. Finally, the study’s methodological approach to extracting and classifying values may offer psychologists a model for analyzing human communication patterns, enriching both theoretical and applied psychological research.
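
To make that extraction-and-classification idea concrete, here is a minimal, hypothetical Python sketch of such a pipeline. The sample annotations, the character-ngram clustering, and the context tally below are illustrative assumptions only; they are not Anthropic's actual method, code, or data.

```python
# Hypothetical sketch of a bottom-up value-extraction pipeline, loosely modeled
# on the approach described in the abstract. The annotated_responses data and the
# clustering choices are illustrative placeholders, not Anthropic's actual method.
from collections import Counter, defaultdict

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for per-response value annotations; in the study these would come
# from a privacy-preserving model run over real conversations.
annotated_responses = [
    {"context": "relationship advice", "values": ["healthy boundaries", "empathy"]},
    {"context": "technology ethics",   "values": ["human agency", "transparency"]},
    {"context": "historical query",    "values": ["historical accuracy", "transparency"]},
]

# 1. Collect the raw value labels observed across responses.
all_values = sorted({v for r in annotated_responses for v in r["values"]})

# 2. Group similar value labels into coarser clusters (a crude proxy for
#    building a taxonomy; the paper's hierarchy is far more sophisticated).
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(all_values)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(vectors.toarray())
taxonomy = defaultdict(list)
for value, cluster in zip(all_values, labels):
    taxonomy[cluster].append(value)

# 3. Tabulate which values appear in which conversational contexts.
by_context = defaultdict(Counter)
for r in annotated_responses:
    by_context[r["context"]].update(r["values"])

print(dict(taxonomy))
print({ctx: dict(counts) for ctx, counts in by_context.items()})
```

Swapping the toy annotations for model-generated value labels, and the character-ngram clustering for embedding-based similarity, would bring this closer to what the paper describes, but the basic skeleton (extract values, organize them into a taxonomy, tabulate them by context) is the same pattern psychologists could adapt for analyzing human communication.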

In short, Anthropic’s research provides psychologists with an important lens on the emerging dynamics between human values and machine behavior. It highlights both the promise of these systems and the responsibility of ensuring that they promote human dignity, safety, and psychological well-being.