Abdulnour, R. E., Gin, B., & Boscardin, C. K. (2025).
New England Journal of Medicine, 393(8), 786–797.
Abstract
Many learners are more facile with the use of large language models in medicine than their supervisors are. The authors provide an approach to clinical supervision that can mitigate the perils and amplify the promise of AI.
The article is paywalled.
Here is how it opens:
Human–computer interactions have been occurring for decades, but recent technological developments in medical artificial intelligence (AI) have resulted in more effective and potentially more dangerous interactions. Although the hype around AI resonates with previous technological revolutions, such as the development of the Internet and the electronic health record, the appearance of large language models (LLMs) seems different. LLMs can simulate knowledge generation and clinical reasoning with humanlike fluency, which gives them the appearance of agency and independent information processing. Therefore, AI has the capacity to fundamentally alter medical learning and practice. As in other professions, the use of AI in medical training could result in professionals who are highly efficient yet less capable of independent problem solving and critical evaluation than their pre-AI counterparts.
Here is a rather detailed summary:
This article provides a practical framework for supervising trainees who use artificial intelligence (AI), with a particular focus on the risks AI poses to the development of clinical reasoning skills. While the examples are medical, the core concepts of cognitive offloading, deskilling, and critical thinking apply directly to clinical psychology and psychotherapy supervision.
The Core Challenge: Balancing Efficiency with Skill Development
The authors argue that AI tools, particularly Large Language Models (LLMs), present a paradox. They can enhance learning through simulation and cognitive offloading of rote tasks, but they also pose significant risks when used to replace, rather than augment, complex clinical reasoning. The central concern is that over-reliance on AI for tasks like diagnosis, case formulation, or treatment planning can lead to:
- Deskilling: Loss of newly acquired clinical reasoning skills.
- Never-skilling: Failure to develop essential competencies in the first place.
- Mis-skilling: Reinforcement of incorrect or biased clinical behavior due to flawed AI output.
This is especially dangerous because AI operates as a "black box," generating persuasive but potentially biased or inaccurate responses without transparent reasoning.
The "Leap of Faith" and the Supervisor's Role
A key concept is the AI interaction: a moment when a clinician receives an AI-generated judgment that cannot be fully retraced, requiring a "leap of faith" to trust it. The supervisor's job is to teach trainees to recognize these moments and pause for critical evaluation, rather than passively accepting the output.
The supervisor-learner dynamic may be inverted, as trainees are often more adept with the technology. The article reframes this as a shared learning opportunity, where supervisors and learners co-explore AI's capabilities and limitations in a "community of practice."
The DEFT-AI Framework for Supervision
The authors propose a structured, stepwise approach called DEFT-AI (Diagnosis, Evidence, Feedback, Teaching, and AI Engagement Recommendation) to turn an AI interaction into an educational moment that builds critical thinking. Here is how it can be applied in a psychology context:
- Diagnosis, Discussion, and Discourse: The supervisor asks the trainee to verbalize their own clinical reasoning before revealing the AI's input. Questions include: "What is your formulation and differential? What prompts did you use with the AI? Did the AI's output change your thinking, and how?"
- Evidence: The supervisor probes the trainee’s ability to support their clinical reasoning with psychological theory, evidence-based practice, and knowledge of the patient’s unique context. Simultaneously, the supervisor probes AI literacy: "How do you think the AI reached this conclusion? What are the known biases or weaknesses of this tool for this specific clinical question?"
- Feedback: The supervisor guides the trainee in self-reflection on gaps in their clinical knowledge, potential biases, and their interaction with the AI tool.
- Teaching: The supervisor provides targeted teaching to address identified gaps, reinforcing foundational clinical reasoning and AI literacy.
- AI Engagement Recommendation: The supervisor makes a clear recommendation on the appropriate future use of AI for the trainee, ranging from supervised practice to independent use with self-monitoring.
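For supervisors who like a written agenda, the five steps above could be kept as a small checklist. The following is only a sketch: the step names and the first two sets of questions come from the list above, while the questions for Feedback, Teaching, and AI Engagement Recommendation, along with the names `DeftStep`, `DEFT_AI_STEPS`, and `build_agenda`, are hypothetical wording chosen for illustration and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeftStep:
    """One step of the DEFT-AI conversation and example supervisor questions."""
    name: str
    supervisor_prompts: tuple


# Step names follow the summary above. Prompts for the last three steps are
# illustrative assumptions, not quotations from the article.
DEFT_AI_STEPS = (
    DeftStep("Diagnosis, Discussion, and Discourse", (
        "What is your formulation and differential?",
        "What prompts did you use with the AI?",
        "Did the AI's output change your thinking, and how?",
    )),
    DeftStep("Evidence", (
        "How do you think the AI reached this conclusion?",
        "What are the known biases or weaknesses of this tool for this question?",
    )),
    DeftStep("Feedback", (
        "Where do you see gaps in your knowledge or in how you used the tool?",
    )),
    DeftStep("Teaching", (
        "Which of those gaps should we work on first?",
    )),
    DeftStep("AI Engagement Recommendation", (
        "What level of AI use is appropriate for you on this kind of task?",
    )),
)


def build_agenda(steps=DEFT_AI_STEPS):
    """Return a numbered plain-text agenda, one block per step."""
    lines = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.name}")
        lines.extend(f"   - {question}" for question in step.supervisor_prompts)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_agenda())
```

Keeping the steps as data rather than prose makes it easy to swap in questions that fit a particular service or therapeutic modality.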
Cyborg vs. Centaur: Two Styles of AI Use
The article identifies two distinct collaboration styles that supervisors should help trainees recognize and shift between:
- Centaur Strategy: A strategic division of labor. The human delegates specific, well-defined tasks to AI (e.g., drafting psychoeducational materials, summarizing session notes) but relies on their own clinical judgment for core tasks like diagnosis and treatment planning. This is the preferred strategy for high-risk tasks.
- Cyborg Strategy: A tight, iterative interweaving with AI for every step of a task (e.g., co-constructing a case formulation by prompting, correcting, and refining with an LLM). This is efficient for low-risk, creative, or well-defined tasks but carries a high risk of deskilling.
Adaptive AI practice is the ability to fluidly switch between centaur, cyborg, and AI-independent modes based on the complexity and risk of the clinical task at hand.
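One way to picture this switching is as a simple decision rule. The sketch below is an illustration only: the article summary states that the centaur strategy is preferred for high-risk tasks and that the cyborg strategy suits low-risk ones, but the condition for working without AI (here, a task the trainee is practicing in order to build the skill) and the names `Mode` and `suggest_mode` are assumptions made for this example.

```python
from enum import Enum


class Mode(Enum):
    AI_INDEPENDENT = "AI-independent"
    CENTAUR = "centaur"
    CYBORG = "cyborg"


def suggest_mode(high_risk: bool, building_core_skill: bool) -> Mode:
    """Suggest a collaboration mode for a task.

    Assumption (not from the article): tasks practiced specifically to
    build a core skill are done without AI, regardless of risk.
    """
    if building_core_skill:
        return Mode.AI_INDEPENDENT
    if high_risk:
        # High-risk work: delegate only well-defined subtasks, keep judgment human.
        return Mode.CENTAUR
    # Low-risk work: tight, iterative collaboration is acceptable.
    return Mode.CYBORG


if __name__ == "__main__":
    print(suggest_mode(high_risk=True, building_core_skill=False).value)
    print(suggest_mode(high_risk=False, building_core_skill=False).value)
    print(suggest_mode(high_risk=False, building_core_skill=True).value)
```

In practice the judgment is not binary, and a supervisor would weigh risk and complexity case by case; the point of the sketch is only that the choice of mode is made deliberately before the task, not by default.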
Promoting AI Literacy: The "Verify and Trust" Paradigm
Ultimately, the goal is to foster a "verify and trust" mindset over blind trust. Supervisors must teach two key skills:
- Critical Appraisal of AI Output: Trainees must independently acquire and appraise evidence (e.g., clinical guidelines, therapeutic literature) for a clinical question and compare their own conclusions to the AI's output before accepting it.
- Effective Prompting: Trainees need to learn how to craft specific, context-rich, and unbiased prompts. Techniques like asking the AI to "think out loud" (chain-of-thought prompting) can expose the AI's reasoning and facilitate critical assessment.
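To make the prompting advice concrete, here is a minimal sketch of a prompt template that is specific, carries context, avoids suggesting an answer, and asks the model to show its reasoning. It only assembles text and does not call any AI service. The function name `build_prompt`, its parameters, and the exact wording of the instructions are hypothetical, and any context passed in is assumed to be de-identified.

```python
def build_prompt(question, context, ask_for_reasoning=True):
    """Assemble a context-rich prompt that does not lead toward one answer.

    question: the clinical question, phrased neutrally.
    context: list of short, de-identified facts about the case.
    ask_for_reasoning: if True, request step-by-step ("think out loud") reasoning.
    """
    parts = ["Clinical question: " + question.strip()]
    if context:
        parts.append("Relevant context (de-identified):")
        parts.extend(f"- {item}" for item in context)
    if ask_for_reasoning:
        # Chain-of-thought style instruction so the reasoning can be appraised.
        parts.append(
            "Explain your reasoning step by step before giving a conclusion."
        )
    # Asking for alternatives reduces the chance of a single confident answer.
    parts.append(
        "List alternative explanations and the evidence for and against each."
    )
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "What formulations could account for this presentation?",
        ["Adult client, low mood for six months", "Recent job loss"],
    ))
```

The trainee would still compare the response against their own independently gathered evidence, as described in the first skill above, before accepting any of it.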
For psychologists and clinical supervisors, this framework offers a clear, theory-grounded method to proactively integrate AI into supervision while safeguarding the development of independent, adaptive, and critical clinical judgment in trainees.
