Lin, Z. (2025). Advances in Methods and Practices in Psychological Science, 8(3).
Abstract
Can artificial-intelligence (AI) systems, such as large language models (LLMs), replace human participants in behavioral and psychological research? Here, I critically evaluate the replacement perspective and identify six interpretive fallacies that undermine its validity. These fallacies are (a) equating token prediction with human intelligence, (b) treating LLMs as the average human, (c) interpreting alignment as explanation, (d) anthropomorphizing AI systems, (e) essentializing identities, and (f) substituting model data for human evidence. Each fallacy represents a potential misunderstanding about what LLMs are and what they can tell researchers about human cognition. In the analysis, I distinguish levels of similarity between LLMs and humans, particularly functional equivalence (outputs) versus mechanistic equivalence (processes), while highlighting both technical limitations (addressable through engineering) and conceptual limitations (arising from fundamental differences between statistical and biological intelligence). For each fallacy, specific safeguards are provided to guide responsible research practices. Ultimately, the analysis supports conceptualizing LLMs as pragmatic simulation tools—useful for role-play, rapid hypothesis testing, and computational modeling (provided their outputs are validated against human data)—rather than as replacements for human participants. This framework enables researchers to leverage language models productively while respecting the fundamental differences between machine intelligence and human thought.
Here are some thoughts:
This article critically examines the growing trend of using large language models (LLMs) as direct substitutes for human participants in psychological and behavioral research. While acknowledging that LLMs can generate human-like text and sometimes mirror average human responses, Lin argues that this "replacement perspective" is fundamentally flawed and identifies six key interpretive fallacies that undermine its validity. These fallacies are: equating statistical token prediction with genuine human intelligence; assuming LLM outputs represent an "average human"; interpreting alignment between model and human outputs as evidence of shared cognitive mechanisms; anthropomorphizing AI systems by attributing human mental states to them; essentializing social identities by treating demographic labels as fixed and homogeneous; and directly substituting model-generated data for human evidence without validation.

Lin contends that LLMs should be viewed not as replacements, but as pragmatic simulation tools useful for tasks like rapid hypothesis testing, role-playing, and computational modeling—provided their outputs are always validated against real human data. The article emphasizes the fundamental, often conceptual, differences between statistical machine intelligence and biologically grounded, embodied human cognition.
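To make the validation step concrete, here is a minimal, hypothetical sketch (not from Lin's article) of what checking LLM-simulated survey responses against human data could look like, using Python and SciPy; the items, ratings, and sample values are invented for illustration, and the specific tests shown are just one reasonable choice.

```python
# Hypothetical sketch: comparing LLM-simulated ratings with human ratings
# before using the simulations for hypothesis generation. All numbers are made up.
from scipy.stats import pearsonr, ks_2samp

# Mean ratings (1-7 scale) for the same six survey items,
# one value per item, from human participants and from an LLM "simulation".
human_means = [5.8, 3.2, 4.1, 6.0, 2.7, 4.9]
llm_means = [5.5, 3.9, 4.3, 5.7, 3.4, 4.6]

# Functional alignment: do the two sources scale and rank items similarly?
r, p = pearsonr(human_means, llm_means)
print(f"Item-level correlation: r = {r:.2f} (p = {p:.3f})")

# Distributional check for a single item: raw responses, not just means.
human_item1 = [6, 5, 7, 6, 5, 6, 4, 7, 6, 5]
llm_item1 = [6, 6, 6, 5, 6, 6, 6, 5, 6, 6]  # simulated samples often under-disperse
stat, p_ks = ks_2samp(human_item1, llm_item1)
print(f"KS test for item 1: D = {stat:.2f} (p = {p_ks:.3f})")

# Note: even strong agreement here shows only functional similarity of outputs,
# not shared mechanisms -- the alignment-as-explanation fallacy the article warns against.
```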