Monday, July 14, 2025

Promises and pitfalls of large language models in psychiatric diagnosis and knowledge tasks

Bang, C.-B., Jung, Y.-C., et al. (2025).
The British Journal of Psychiatry,
226(4), 243–244.

Abstract:

This study evaluates the performance of five large language models (LLMs), including GPT-4, on psychiatric diagnosis and knowledge tasks using a zero-shot approach. Compared with 11 psychiatry residents, GPT-4 demonstrated superior performance on diagnostic tasks (F1 score: 63.41% vs. 47.43%) and knowledge tasks (85.05% vs. 62.01%). However, GPT-4 exhibited a much higher comorbidity error rate (30.48% vs. 0.87%), suggesting limitations in contextual understanding. When residents received GPT-4 guidance, their performance improved significantly without an increase in critical errors. The findings highlight the potential of LLMs as clinical aids but underscore the need for careful integration that preserves human expertise and mitigates risks such as over-reliance. Future research should compare LLMs with board-certified psychiatrists and explore multifaceted diagnostic frameworks.
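A brief aside on the headline metric: the diagnostic comparison is reported as an F1 score rather than raw accuracy, and the comorbidity errors noted above surface as false positives in that calculation. The snippet below is a minimal sketch of a micro-averaged, multi-label F1 computation; the averaging scheme, the label sets, and the example cases are all assumptions for illustration, not the authors' evaluation code.

```python
# Minimal sketch of a diagnostic F1 computation when each case may carry
# multiple diagnoses. Micro-averaging over per-case diagnosis sets is an
# assumption here, not a detail confirmed by the paper.

def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 over per-case sets of diagnostic labels."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)   # correctly predicted diagnoses
        fp += len(pred - gold)   # extra labels, e.g., a spurious comorbidity
        fn += len(gold - pred)   # missed diagnoses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical three-case illustration; the labels are invented.
gold = [{"MDD"}, {"GAD"}, {"PTSD"}]
pred = [{"MDD"}, {"GAD", "MDD"}, {"Bipolar I"}]  # case 2: comorbidity error
print(f"micro-F1 = {micro_f1(gold, pred):.2%}")  # 57.14%
```

Note how the spurious comorbid diagnosis in the second case counts only against precision, while the wholly wrong diagnosis in the third counts against both precision and recall. This is why a model's high comorbidity error rate can coexist with a strong overall F1, as the study reports for GPT-4.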

Here are some thoughts:

For psychologists, these findings underscore the importance of balancing AI-assisted efficiency with human judgment. While LLMs could serve as valuable training aids or supplemental tools, their limitations, particularly around comorbidity and contextual understanding, emphasize the irreplaceable role of psychologists in interpreting complex patient narratives, cultural factors, and individualized care. The study also raises ethical concerns about over-reliance on AI, urging psychologists to maintain rigorous critical thinking and therapeutic rapport. Ultimately, this research calls for a thoughtful, evidence-based approach to integrating AI into mental health practice: one that leverages technological advances while preserving the human elements essential to effective psychological care.