Ke, L., Tong, S., Cheng, P., & Peng, K. (2025).
Artificial Intelligence Review, 58(10).
Abstract
This review explores the frontiers of large language models (LLMs) in psychological applications. Psychology has undergone several theoretical changes, and the current use of artificial intelligence (AI) and machine learning, particularly LLMs, promises to open up new research directions. We aim to provide a detailed exploration of how LLMs are transforming psychological research. We discuss the impact of LLMs across various branches of psychology—including cognitive and behavioral, clinical and counseling, educational and developmental, and social and cultural psychology—highlighting their ability to model patterns, cognition, and behavior similar to those observed in humans. Furthermore, we explore the ability of such models to generate coherent, contextually relevant text, offering innovative tools for literature reviews, hypothesis generation, experimental designs, experimental subjects, and data analysis in psychology. We emphasize the importance of addressing technical and ethical challenges, including data privacy, the ethics of using LLMs in psychological research, and the need for a deeper understanding of these models’ limitations. Researchers should use LLMs responsibly in psychological studies, adhering to ethical standards and considering the potential consequences of deploying these technologies in sensitive areas. Overall, this review provides a comprehensive overview of the current state of LLMs in psychology, exploring the potential benefits and challenges. We hope it can serve as a call to action for researchers to responsibly leverage LLMs’ advantages while addressing the associated risks.
Here are some thoughts:
LLMs as assistants, not replacements. They help with emotion recognition, risk flagging, and prognosis, but they tend to underestimate risk in sensitive cases. Use them to inform clinical judgment, not to replace it.
The empathy gap is shrinking, but unevenly. GPT-4 outperformed most humans on emotional intelligence measures, and AI feedback boosted peer empathy by nearly 20%. This remains pattern recognition, however, not genuine attunement, a distinction that matters most when nonverbal cues carry the message.
Cognitive biases affect LLMs too. They exhibit anchoring, representativeness, and cultural biases favoring WEIRD (Western, educated, industrialized, rich, democratic) populations, which can subtly disadvantage clients from non-Western or underrepresented backgrounds.
Domain-specific training helps. The ChatCounselor model, trained on real therapy conversations, outperformed general-purpose models. Off-the-shelf tools are poor substitutes for clinically trained ones.
Research gains are real. LLMs aid literature synthesis, hypothesis generation, and documentation drafting, but their outputs must be verified: errors and misattributed citations are easy to miss.
Ethical infrastructure lags. Privacy, informed consent, and diagnostic bias remain unresolved. Treat AI as an adjunct to professional judgment and be transparent with clients.
Bottom line: LLMs are useful, but they are evolving faster than the ethical and clinical frameworks around them. Engage thoughtfully, neither dismissing nor uncritically adopting them, to stay ahead as the landscape shifts.
