Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Welcome to the nexus of ethics, psychology, morality, technology, health care, and philosophy

Friday, June 19, 2026

Educational Strategies for Clinical Supervision of Artificial Intelligence Use

Abdulnour, R. E., Gin, B., & Boscardin, C. K. (2025). 
New England Journal of Medicine, 393(8), 786–797.

Abstract

Many learners are more facile with the use of large language models in medicine than their supervisors are. The authors provide an approach to clinical supervision that can mitigate the perils and amplify the promise of AI.

The article is paywalled.

Here is how it opens:

Human–computer interactions have been occurring for decades, but recent technological developments in medical artificial intelligence (AI) have resulted in more effective and potentially more dangerous interactions. Although the hype around AI resonates with previous technological revolutions, such as the development of the Internet and the electronic health record, the appearance of large language models (LLMs) seems different. LLMs can simulate knowledge generation and clinical reasoning with humanlike fluency, which gives them the appearance of agency and independent information processing. Therefore, AI has the capacity to fundamentally alter medical learning and practice. As in other professions, the use of AI in medical training could result in professionals who are highly efficient yet less capable of independent problem solving and critical evaluation than their pre-AI counterparts.

Here is a rather detailed summary:

This article provides a practical framework for supervising trainees who are using Artificial Intelligence (AI), specifically focusing on the risks to developing clinical reasoning skills. While the examples are medical, the core concepts of cognitive offloading, deskilling, and critical thinking are directly applicable to clinical psychology and psychotherapy supervision.

The Core Challenge: Balancing Efficiency with Skill Development

The authors argue that AI tools, particularly Large Language Models (LLMs), present a paradox. They can enhance learning through simulation and cognitive offloading of rote tasks, but they also pose significant risks when used to replace, rather than augment, complex clinical reasoning. The central concern is that over-reliance on AI for tasks like diagnosis, case formulation, or treatment planning can lead to:

  • Deskilling: Loss of newly acquired clinical reasoning skills.
  • Never-skilling: Failure to develop essential competencies in the first place.
  • Mis-skilling: Reinforcement of incorrect or biased clinical behavior due to flawed AI output.

This is especially dangerous because AI operates as a "black box," generating persuasive but potentially biased or inaccurate responses without transparent reasoning.

The "Leap of Faith" and the Supervisor's Role

A key concept is the AI interaction: a moment when a clinician receives an AI-generated judgment that cannot be fully retraced, requiring a "leap of faith" to trust it. The supervisor's job is to teach trainees to recognize these moments and pause for critical evaluation, rather than passively accepting the output.

The supervisor-learner dynamic may be inverted, as trainees are often more adept with the technology. The article reframes this as a shared learning opportunity, where supervisors and learners co-explore AI's capabilities and limitations in a "community of practice."

The DEFT-AI Framework for Supervision

The authors propose a structured, stepwise approach called DEFT-AI (Diagnosis, Evidence, Feedback, Teaching, and AI Recommendation) to turn an AI interaction into an educational moment that builds critical thinking. Here is how it can be applied in a psychology context:

  • Diagnosis, Discussion, and Discourse: The supervisor asks the trainee to verbalize their own clinical reasoning before revealing the AI's input. Questions include: "What is your formulation and differential? What prompts did you use with the AI? Did the AI's output change your thinking, and how?"
  • Evidence: The supervisor probes the trainee’s ability to support their clinical reasoning with psychological theory, evidence-based practice, and knowledge of the patient’s unique context. Simultaneously, the supervisor probes AI literacy: "How do you think the AI reached this conclusion? What are the known biases or weaknesses of this tool for this specific clinical question?"
  • Feedback: The supervisor guides the trainee in self-reflection on gaps in their clinical knowledge, potential biases, and their interaction with the AI tool.
  • Teaching: The supervisor provides targeted teaching to address identified gaps, reinforcing foundational clinical reasoning and AI literacy.
  • AI Engagement Recommendation: The supervisor makes a clear recommendation on the appropriate future use of AI for the trainee, ranging from supervised practice to independent use with self-monitoring.

Cyborg vs. Centaur: Two Styles of AI Use

The article identifies two distinct collaboration styles that supervisors should help trainees recognize and shift between:

  • Centaur Strategy: A strategic division of labor. The human delegates specific, well-defined tasks to AI (e.g., drafting psychoeducational materials, summarizing session notes) but relies on their own clinical judgment for core tasks like diagnosis and treatment planning. This is the preferred strategy for high-risk tasks.
  • Cyborg Strategy: A tight, iterative interweaving with AI for every step of a task (e.g., co-constructing a case formulation by prompting, correcting, and refining with an LLM). This is efficient for low-risk, creative, or well-defined tasks but carries a high risk of deskilling.

Adaptive AI practice is the ability to fluidly switch between centaur, cyborg, and AI-independent modes based on the complexity and risk of the clinical task at hand.

Promoting AI Literacy: The "Verify and Trust" Paradigm

Ultimately, the goal is to foster a "verify and trust" mindset over blind trust. Supervisors must teach two key skills:

  1. Critical Appraisal of AI Output: Trainees must independently acquire and appraise evidence (e.g., clinical guidelines, therapeutic literature) for a clinical question and compare their own conclusions to the AI's output before accepting it.
  2. Effective Prompting: Trainees need to learn how to craft specific, context-rich, and unbiased prompts. Techniques like asking the AI to "think out loud" (chain-of-thought prompting) can expose the AI's reasoning and facilitate critical assessment.
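
As a rough illustration of the second skill, the sketch below assembles a specific, context-rich prompt that asks a model to show its reasoning step by step. The function name `build_cot_prompt`, its parameters, and the wording of the instructions are hypothetical and do not come from the article. The sketch only builds a text string and does not call any AI service.

```python
def build_cot_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that supplies context and requests visible reasoning.

    Hypothetical helper for illustration; not taken from the article.
    """
    # Present each piece of clinical context as its own bullet.
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Context:\n"
        f"{context_block}\n\n"
        f"Question: {question}\n\n"
        # Neutral wording avoids steering the model toward a preferred answer.
        "Think out loud: list your reasoning step by step, name the evidence "
        "behind each step, and state what information would change your answer."
    )


prompt = build_cot_prompt(
    "What differential diagnoses should be considered?",
    ["Adult with low mood for six weeks", "Sleep disturbance", "No prior treatment"],
)
print(prompt)
```

A trainee could then compare the model's stated steps against their own formulation, which is the critical appraisal described in the first skill.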

For psychologists and clinical supervisors, this framework offers a clear, theory-grounded method to proactively integrate AI into supervision while safeguarding the development of independent, adaptive, and critical clinical judgment in trainees.


Wednesday, June 17, 2026

Living intelligence toward human-level models (HLMs) via Organoid-AI integration

Bai, L., Wang, J., Lai, Y., & Su, J. (2025).
EngMedicine, 2(4), 100106.

Abstract

The convergence of brain organoids and artificial intelligence (AI) has driven the development of organoid intelligence (OI), a new paradigm for constructing human-level cognitive models. Brain organoids derived from human stem cells exhibit self-organizing neural networks with dynamic activity and plasticity, offering a biologically based alternative to conventional AI systems. The integration of living networks with computational frameworks enables the design of closed-loop systems that combine the adaptability of biological tissues with the scalability and interpretability of AI. This approach not only provides a novel model for studying human cognition but also opens new pathways for biologically inspired computing. The development of such hybrid systems requires interdisciplinary collaboration among stem cell biology, bioengineering, neuroscience, and machine learning. The long-term goal is to establish biohybrid platforms capable of learning, memory formation, and task-specific computation, thereby redefining our understanding of intelligence and enabling the next generation of neurotechnologies.

Highlights

• Organoid Intelligence (OI) combines brain organoids and AI.
• OI creates biologically embodied models for human-level cognition.
• Biohybrid platforms can learn, remember, and perform computations.
• OI requires interdisciplinary collaboration for development.

Here are some general thoughts:

We are witnessing the infancy of true synthetic biological intelligence. While current applications are constrained to pattern recognition and disease modeling, the long-term trajectory completely disrupts the binary view of technology as "artificial" and biology as "natural." It forces tech developers and ethicists alike to confront a reality where the next generation of advanced intelligence might not be coded, but grown.

Monday, June 15, 2026

New 3D device harnesses living brain cells for computing

Princeton University
Office of Engineering
Originally posted April 27, 2026

Princeton researchers have combined brain cells and advanced electronics into a 3D device that can be programmed to recognize patterns using computational techniques.

Past attempts at using brain cells to do computation have relied on 2D cultures grown in a petri dish or 3D clusters that are probed and monitored from outside. The Princeton device takes a different approach, working from the inside out.

Using advanced fabrication techniques, the team created a 3D mesh made of microscopic metal wires and electrodes supported by a thin epoxy coating. Because the coating is so thin, it has just the right amount of flexibility to interface with the soft neurons that grow around it. The team used the mesh as a scaffold to culture tens of thousands of neurons into a vast 3D network that can be used to do computation.



Here are some thoughts:

Princeton University researchers have developed an innovative 3D device that integrates roughly 70,000 living biological neurons with advanced electronics to perform computational tasks, such as recognizing spatial and temporal electrical pulse patterns. Published in Nature Electronics, the study details a novel "inside-out" approach where an ultra-thin, flexible epoxy-coated mesh of microscopic metal wires and electrodes serves as a scaffold for the soft brain cells to grow around, allowing scientists to record and stimulate electrical activity at an unprecedentedly fine scale. By tracking and manipulating these neural connections over a six-month period, the team successfully trained an algorithm to distinguish between different pattern inputs, demonstrating a crucial first step toward creating highly energy-efficient 3D biological neural networks that could eventually alleviate the immense power demands of modern AI while providing deeper insights into neuroscience and neurological diseases.

Friday, June 12, 2026

Benchmarking Large Language Models Against Psychiatry Residents Using Traditional Institutional Assessments

Sethi, M. I. S. et al. (2026).
Indian Journal of Psychological Medicine, 02537176261435658.

Background: Artificial intelligence (AI) models demonstrate remarkable capabilities in healthcare applications, yet their performance compared to medical trainees in psychiatric education remains unexplored. This study evaluated the comparative performance of large language models (LLMs) against first-year psychiatry residents in standardized assessments at a premier Indian medical educational institute.

Methods: For this study, the already-scored answer sheets for Theory Papers I and II, as well as unmanned, non-interactive Objective Structured Clinical Examinations (OSCEs) with image-based tasks, from all 25 first-year psychiatry residents (March 2024 exam) were obtained from the examination section of the institute. The same question papers were then uploaded into three AI models (ChatGPT-3.5, Gemini Advanced, and Claude Sonnet). Four blinded faculty members evaluated the responses generated by the AI models. Finally, the scores of the AI models and psychiatry residents were analyzed for comparison. Statistical analysis employed Kruskal–Wallis tests with post hoc Mann–Whitney U comparisons.

Results: AI models outperformed residents in theoretical assessments. In Paper I (theory), AI models achieved mean scores (standard deviation) of Claude Sonnet 67.88 (10.63), ChatGPT-3.5 70.38 (3.95), and Gemini Advanced 71.25 (3.86), compared to residents’ 58.0 (2.58). Paper II (theory) assessments showed even larger gaps, with AI models scoring Claude Sonnet 72.88 (3.77), ChatGPT-3.5 71.0 (3.56), and Gemini Advanced 69.63 (12.86), compared to residents’ 50.96 (2.49). OSCE performance patterns differed markedly. Paper I OSCEs showed equivalent performance (AI, 13.0; residents, 13.16 [1.49]), while Paper II OSCEs revealed variable results: Claude Sonnet excelled at 20.0 (1.41), but ChatGPT-3.5 underperformed at 15.0 (0.50), compared to residents at 16.6 (1.55). Inter-rater reliability coefficients remained excellent (intraclass correlation coefficients [ICC]: 0.810–0.934).

Conclusions: While AI demonstrated superior theoretical knowledge, equivalent or variable practical skills performance reveals fundamental limitations in clinical reasoning and contextual understanding. These findings necessitate reconceptualizing psychiatric education to emphasize uniquely human competencies while leveraging AI’s capabilities for knowledge synthesis.

Here are some thoughts:

This study compared three large language models (LLMs) to first-year psychiatry residents using real institutional exams in India. The LLMs consistently outperformed residents on theoretical assessments (by 17–43%) but showed equivalent or inconsistent performance on practical OSCEs, revealing critical gaps in clinical reasoning and cultural contextualization. The authors conclude that psychiatric education should shift focus toward uniquely human skills like empathy and judgment, while using AI as a tool for knowledge synthesis.
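
The 17–43% range can be checked against the mean theory-paper scores reported in the abstract. The short sketch below assumes that the range refers to the relative difference between each model's mean and the residents' mean; the variable and function names are only illustrative.

```python
# Mean theory-paper scores as reported in the abstract.
resident_means = {"Paper I": 58.0, "Paper II": 50.96}
llm_means = {
    "Paper I": {"Claude Sonnet": 67.88, "ChatGPT-3.5": 70.38, "Gemini Advanced": 71.25},
    "Paper II": {"Claude Sonnet": 72.88, "ChatGPT-3.5": 71.0, "Gemini Advanced": 69.63},
}


def relative_gap(llm_mean: float, resident_mean: float) -> float:
    """Percentage by which the model's mean exceeds the residents' mean."""
    return 100.0 * (llm_mean - resident_mean) / resident_mean


# One relative gap per (paper, model) pair.
gaps = {
    (paper, model): relative_gap(score, resident_means[paper])
    for paper, models in llm_means.items()
    for model, score in models.items()
}

smallest = min(gaps.values())
largest = max(gaps.values())
print(f"Relative advantage ranges from {smallest:.0f}% to {largest:.0f}%")
# prints: Relative advantage ranges from 17% to 43%
```

Under that assumption, the smallest gap is Claude Sonnet on Paper I and the largest is Claude Sonnet on Paper II.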

Wednesday, June 10, 2026

Adversarial AI reveals mechanisms and treatments for disorders of consciousness

Toker, D. et al. (2026).
Nature Neuroscience, 29(4), 964–977.

Abstract

Understanding disorders of consciousness (DOC) remains one of the most challenging problems in neuroscience, hindered by the lack of experimental models for probing mechanisms or testing interventions. Here, to address this, we introduce a generative adversarial artificial intelligence (AI) framework that pits deep neural networks—trained to detect consciousness across more than 680,000 ten-second neuroelectrophysiology samples and validated on 565 patients, healthy volunteers and animals—against interpretable, machine learning-driven neural field models. This adversarial architecture produces biologically realistic simulations of both conscious and comatose brains that recapitulate empirical neurophysiological features across humans, monkeys, rats and bats. Without explicit programming, the AI model retrodicts known DOC responses to brain stimulation and generates testable predictions about the mechanisms of unconsciousness. Two such predictions are validated here: selective disruption of the basal ganglia indirect pathway, supported by diffusion magnetic resonance imaging in 51 patients with DOC, and increased cortical inhibitory-to-inhibitory synaptic coupling, supported by RNA sequencing of resected brain tissue from 6 human patients with coma and a rat stroke model. The model also identifies high-frequency stimulation of the subthalamic nucleus as a promising intervention for DOC, supported by electrophysiological data from human patients. This work introduces an AI framework for causal inference and therapeutic discovery in consciousness research, as well as in complex systems more broadly.

Here are some thoughts:

This work gives psychologists a more concrete, brain-circuit-level understanding of why coma and related states happen, and points to a specific, testable new treatment approach, moving the field beyond “we don’t know what’s happening inside” toward identifiable mechanisms that may one day guide rehabilitation and family education.

Monday, June 8, 2026

Automation bias: a systematic review of frequency, effect mediators, and mitigators

Goddard, K., Roudsari, A., & Wyatt, J. C. (2011).
Journal of the American Medical Informatics Association, 19(1), 121–127.

Abstract

Automation bias (AB)—the tendency to over-rely on automation—has been studied in various academic fields. Clinical decision support systems (CDSS) aim to benefit the clinical decision-making process. Although most research shows overall improved performance with use, there is often a failure to recognize the new errors that CDSS can introduce. With a focus on healthcare, a systematic review of the literature from a variety of research fields has been carried out, assessing the frequency and severity of AB, the effect mediators, and interventions potentially mitigating this effect. This is discussed alongside automation-induced complacency, or insufficient monitoring of automation output. A mix of subject specific and freetext terms around the themes of automation, human–automation interaction, and task performance and error were used to search article databases. Of 13 821 retrieved papers, 74 met the inclusion criteria. User factors such as cognitive style, decision support systems (DSS), and task specific experience mediated AB, as did attitudinal driving factors such as trust and confidence. Environmental mediators included workload, task complexity, and time constraint, which pressurized cognitive resources. Mitigators of AB included implementation factors such as training and emphasizing user accountability, and DSS design factors such as the position of advice on the screen, updated confidence levels attached to DSS output, and the provision of information versus recommendation. By uncovering the mechanisms by which AB operates, this review aims to help optimize the clinical decision-making process for CDSS developers and healthcare practitioners.

Here are some thoughts:

This systematic review examines the frequency, mediators, and mitigators of automation bias, which is the tendency for users to over-rely on automated decision support systems as a heuristic replacement for vigilant information seeking and processing. The authors reviewed 74 studies across healthcare, aviation, and human-computer interaction fields and found that automation bias is a robust effect, with a meta-analysis showing that erroneous clinical decision support system advice increased the risk of incorrect decisions by 26%. Key mediators include user factors such as cognitive style, trust, confidence, and task-specific experience, as well as environmental factors like workload, task complexity, and time pressure that strain cognitive resources. Mitigators include increasing user accountability, providing training, updating confidence levels alongside advice, positioning advice less prominently on screen, and presenting information rather than direct recommendations. The review concludes that automation bias and the related concept of automation-induced complacency represent distinct but overlapping attentional phenomena that can introduce new errors even when decision support systems improve overall performance, highlighting the need for careful design and implementation strategies.

Friday, June 5, 2026

Transforming clinical reasoning—the role of AI in supporting human cognitive limitations

Greengrass, C. J. (2026).
Frontiers in Digital Health, 7, 1715440.

Abstract

Clinical reasoning is foundational to medical practice, requiring clinicians to synthesise complex information, recognise patterns, and apply causal reasoning to reach accurate diagnoses and guide patient management. However, human cognition is inherently limited by factors such as limitations in working memory capacity, constraints in cognitive load, a general reliance on heuristics; with an inherent vulnerability to biases including anchoring, availability bias, and premature closure. Cognitive fatigue and cognitive overload, particularly apparent in high-pressure environments, further compromise diagnostic accuracy and efficiency. Artificial intelligence (AI) presents a transformative opportunity to overcome these limitations by supplementing and supporting decision-making. With AI's advanced computational capabilities, these systems can analyse large datasets, detect subtle or atypical patterns, and provide accurate evidence-based diagnoses. Furthermore, by leveraging machine learning and probabilistic modelling, AI reduces dependence on incomplete heuristics and potentially mitigates cognitive biases. It also ensures consistent performance, unaffected by fatigue or information overload. These attributes likely make AI an invaluable tool for enhancing the accuracy and efficiency of diagnostic reasoning. Through a narrative review, this article examines the cognitive limitations inherent in diagnostic reasoning and considers how AI can be positioned as a collaborative partner in addressing them. Drawing on the concept of Mutual Theory of Mind, the author identifies a set of indicators that should inform the design of future frameworks for human–AI interaction in clinical decision-making. These highlight how AI could dynamically adapt to human reasoning states, reduce bias, and promote more transparent and adaptive diagnostic support in high-stakes clinical environments.

Here are some thoughts:

This article examines how artificial intelligence can support clinical diagnostic reasoning by compensating for inherent human cognitive limitations such as limited working memory capacity, cognitive load, reliance on heuristics, and susceptibility to biases like anchoring and premature closure. The author integrates cognitive psychology concepts including dual process theory (System 1 intuitive pattern recognition versus System 2 analytical reasoning), cognitive load theory, and Bayesian reasoning to analyze how AI systems can reduce cognitive burden, provide external schema repositories, offer transparent explainable outputs, and support metacognitive monitoring. While AI offers advantages in processing vast data streams, maintaining multiple hypotheses, and performing consistently without fatigue, the review acknowledges current limitations of large language models including poor probabilistic reasoning and potential for algorithmic or transferred bias. The article concludes that AI should function as a collaborative partner within a Mutual Theory of Mind framework, enhancing rather than replacing human judgment, provided that ethical standards and clinician training keep pace with technological development.

Wednesday, June 3, 2026

Using AI-Based Virtual Simulated Patients for Training in Psychopathological Interviewing: Cross-Sectional Observational Study.

García-Torres, D., et al. (2025).
JMIR Medical Education, 11, e78857.

Abstract
Background:
Virtual simulated patients (VSPs) powered by generative artificial intelligence (GAI) offer a promising tool for training clinical interviewing skills; yet, little is known about how different system- and user-level variables shape students’ perceptions of these interactions.

Objective:
We aim to study psychology students’ perceptions of GAI-driven VSPs and examine how demographic factors, system parameters, and interaction characteristics influence such perceptions.

Methods:
We conducted a total of 1832 recorded interactions involving 156 psychology students with 13 GAI-generated VSPs configured with varying temperature settings (0.1, 0.5, 0.9). For each student, we collected age and sex; for each interview, we recorded interview length (total number of question–answer turns), number of connectivity failures, the specific VSP consulted, and the model temperature. After every interview, students provided a 1-10 global rating and open-ended comments regarding strengths and areas for improvement. At the end of the training sequence, they also reported perceived improvement in diagnostic ability. Statistical analyses assessed the influence of different variables on global ratings: demographics, interaction-level data, and GAI temperature setting. Sentiment analysis was conducted to evaluate the VSPs’ clinical realism.

Results:
Statistical analysis showed that female students rated the tool significantly higher (mean rating 9.25/10) than male students (mean rating 8.94/10; Kruskal-Wallis test, H=8.7; P=.003). On the other side, no significant correlation was found between global rating and age (r=0.02, 95% CI –0.03 to 0.06; P=.42), interview length (r=0.04, 95% CI –0.2 to 0.10; P=.18), or frequency of participation (Kruskal-Wallis test, H=4.62; P=.20). A moderate negative correlation emerged between connectivity failures and ratings (r=–0.26, 95% CI –0.41 to –0.10; P=.002). Temperature settings significantly influenced ratings (Kruskal-Wallis test, H=6.93; P=.03; η²=0.02), with higher scores at temperature 0.9 compared with 0.1 (Dunn’s test, P=.04). Concerning learning outcomes, self-perceived improvement in diagnostic ability was reported by 94% (94/100) of students; however, final practical examination scores (mean 6.67, SD 1.42) did not differ significantly from those of the previous cohort without VSP training (mean 6.42, SD 1.56). Sentiment analysis indicated predominantly negative sentiment in GAI responses (median negativity 0.8903, IQR 0.306-0.961), consistent with clinical realism.

Conclusions:
GAI-driven VSPs were well-received by psychology students, with student gender and system-level variables (particularly temperature settings and connection stability) shaping user evaluations. Although participants perceived the training as beneficial for their diagnostic skills, objective examination performance did not significantly differ from the previous cohort. However, lack of randomization limits the generalization of the results obtained, and further experiments are required.

Here are some thoughts:

This study is important because it demonstrates a promising application of AI in clinical training, using generative AI-powered virtual simulated patients to help psychology students practice psychopathological interviewing in a safe, low-stakes environment. The platform was highly rated by students and 94% reported meaningful improvement in their ability to identify clinically relevant symptoms. Higher AI temperature settings, which produce more natural and varied responses, were associated with greater student satisfaction, while connectivity failures reduced ratings, underscoring the importance of technical reliability. Although students found VSP-based sessions more challenging than traditional paper cases, final exam scores were comparable between cohorts, suggesting the AI simulation provides a more realistic learning experience rather than a less effective one. For practicing psychologists and educators, this study offers early empirical support for integrating AI-driven patient simulation into clinical training, while highlighting the need for randomized studies and careful calibration of AI parameters before broad adoption.
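
For readers unfamiliar with the temperature parameter, the sketch below shows the general idea. A language model's raw scores for candidate next words are divided by the temperature before being converted to probabilities, so low values concentrate probability on the top candidate and higher values spread it across alternatives. The scores here are invented for illustration, and the study's platform may differ in detail.

```python
import math


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; higher values mean more varied sampling."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical raw scores for three candidate next words.
scores = [2.0, 1.0, 0.5]

# The three temperature settings used in the study.
for t in (0.1, 0.5, 0.9):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

At 0.1 nearly all of the probability falls on the highest-scoring word, which tends to yield repetitive replies; at 0.9 the alternatives receive a meaningful share, which is consistent with students rating those simulated patients as more natural.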

Monday, June 1, 2026

The moon, the ghetto and artificial intelligence: Reducing systemic racism in computational algorithms.

Fountain, J. (2022).
Government Information Quarterly, 39(2), 101645.

Abstract

Computational algorithms and automated decision making systems that include them offer potential to improve public policy and organizations. But computational algorithms based on biased data encode those biases into algorithms, models and their outputs. Systemic racism is institutionalized bias with respect to race, ethnicity and related attributes. Such bias is located in data that encode the results and outputs of decisions that have been discriminatory, in procedures and processes that may intentionally or unintentionally disadvantage people based on race, and in policies that may discriminate by race. Computational algorithms may exacerbate systemic racism if they are not designed, developed, and used–that is, enacted–with attention to identifying and remedying bias specific to race. Advancing social equity in digital governance requires systematic, ongoing efforts to assure that automated decision making systems, and their enactment in complex public organizational arrangements, are free from bias.

Highlights

• Computational algorithms are powerful tools but may replicate biases.
• Biases, including systemic racism, in underlying data bias algorithms.
• Automated decision making systems that discriminate harm people.
• Careful scrutiny of data, processes, variables and algorithms may reduce bias.

Friday, May 29, 2026

AI in psychoeducational assessment: a study of report generation.

Farmer, R. L., et al. (2025).
Online preprint. 

Abstract

Artificial intelligence (AI) is poised to reshape school psychology, with report writing as a primary area of impact. A national sample of school psychologists using AI at work (n = 100) reported on its role in documentation. Of those using AI for report writing (n = 45), most applied it to discrete tasks such as rewriting sections for clarity (69%) or generating recommendations (67%). Far fewer used AI for interpretive or diagnostic purposes, and none relied on it to generate entire reports. Nearly all users (94%) edited AI-generated content before use. On average, practitioners who used AI for report writing saved 6.3 hours per week (95% CI: 4.6–8.1), more than double the savings of those using AI for other tasks (≈3 hours). AI shows promise for reducing documentation burden and reclaiming time for direct services, though its use raises ethical concerns requiring further guidance and oversight.

Here are some thoughts:

This preprint is important to practicing psychologists because AI is already being widely adopted in school psychology (roughly half of practitioners report using it at work), and those using it specifically for report writing are saving an estimated six hours per week, meaningful relief in a profession burdened by high caseloads and heavy documentation demands. However, the study also reveals real risks: a subset of practitioners are entering student-identifying information into AI platforms in likely violation of FERPA, and some are using AI to assist with diagnostic and eligibility decisions that currently lack any empirical validation for AI use. The research draws a clear distinction between acceptable applications (drafting, editing, improving clarity) and problematic ones (interpreting test scores, rendering classifications). While it is encouraging that 94% of users edit AI-generated content before use, the authors argue this oversight needs to be formalized rather than left to individual habit. This makes the study a timely and practical reference for any psychologist navigating AI in their work.

Wednesday, May 27, 2026

“Feasible but Fragile”: An inflection point for artificial intelligence in mental health care.

Clegg, K. (2025).
Journal of Medical Internet Research, 27, e89202.

On November 18, 2025, a congressional hearing was held in Washington, DC, by the US House Energy and Commerce’s Subcommittee on Oversight and Investigations, examining the risks and benefits of artificial intelligence (AI) chatbots.

Marlynn Wei, MD, JD; Jennifer King, PhD; and John Torous, MBI, MD (director of digital psychiatry at Beth Israel Deaconess Medical Center and associate professor of psychiatry at Harvard Medical School) provided expert testimony at the congressional hearing. I sat down with Torous to discuss his reflections on the future of AI in mental health.

An Inflection Point

Following on the heels of several lawsuits and mounting concerns about the safety of commercially available AI chatbots and their widespread “off-label” use as psychological support, November’s congressional hearing was somewhat anomalous—in a good way.

“I actually am optimistic,” says Torous, “because we never saw a congressional oversight committee form in the early days when social media came out or when apps came out or when VR [virtual reality] came out. It’s exciting to see a body like Congress taking the time and attention to try to understand what the issue is.”

He remarks that it’s a different trajectory than we’ve seen over the past 25 years of digital health innovation, one that simultaneously signals that “we’re seeing the end of AI exceptionalism in mental health.” It suggests that regulators are taking the risks seriously and that AI—whether purpose-built or used de facto—will not be exempt from the same scrutiny applied to other clinical tools.

It’s also a potential inflection point from the otherwise rapid and underregulated growth and proliferation of AI tools for mental health, including many chatbots whose safety and efficacy remain to be definitively established. The shape of the trajectory now—whether these tools succeed or fail to materialize their potential for improving mental health care—depends on what we do from here.

The information is here.

Key Takeaways
  • The future of artificial intelligence (AI) tools in mental health care is at an inflection point; regulators are taking both the potential benefits and the risks of these tools seriously.
  • Whether these tools succeed or fail to meet their potential for improving mental health care depends on the extent to which stakeholders are able to successfully seize the moment and collaborate on transparent, high-quality research; establish and incentivize safety and efficacy; adopt patient-centric benchmarks; and think beyond traditional therapeutic models.

Monday, May 25, 2026

After Automation

Dan Shipper
CEO of Every
Published May 21, 2026

There is a paradox at the heart of AI.

At Every, we’ve automated everything we can. We use Codex and Claude Code across coding, writing, design, customer service, and more. We alpha-test all of the new models from OpenAI, Anthropic, and Google before they come out. We are riding the exponential boom in model intelligence and automation as far and as fast as possible.

And yet it seems like, for us, there’s more human work to do than ever. We are a team of almost 30 people, and we haven’t fired all of our employees in favor of agents. We haven’t ditched software-as-a-service (SaaS) products in favor of vibe coded apps. We still hire humans to do customer service (with a lot of agent assistance), and we still hire human writers and editors and engineers.

(cut)

There’s no tipping point coming where things flip and the jobs are gone. The new reality is the opposite—the more we automate, the more expert human work there is to do.

Here’s why: AI commoditizes the residue of human expertise—whatever can be made explicit enough to train on. That collapses the value of default model output and creates demand for what’s different. Demand for what’s different is demand for human experts, even as we approach artificial general intelligence (AGI).

To understand why this is, we have to go beyond the graphs, and look at how AI is used for work today. That will help us see, in a more grounded light, the paradox—and its resolution.


Here are some thoughts:

This article is important because it challenges the simplistic narrative that AI will replace psychologists. It correctly argues that automation creates new demand for expert judgment and contextual sensitivity. Psychotherapy also requires relational presence and ethical accountability, as I have argued. Psychologists should take heart from this. Our core skills, including ethical reasoning, relationship building, sensitivity to diversity, and case conceptualization, are not being automated away. They are becoming more valuable.

However, psychologists should also be wary. The article underestimates the cognitive burden of supervising AI, ignores the need for formal training, and downplays the subtle ways that fluent but fallible AI can exploit human heuristics (known as automation bias). It also fails to address data privacy, algorithmic bias, deskilling, and the potential for AI to widen inequities in access to quality psychological care.

The practical takeaway is this. Psychologists should learn to use AI as a tool for editing, exploring treatment-planning options, and case conceptualization, but they must remain accountable in all areas of practice. We should advocate for training programs that teach AI literacy as a core competency. We must insist on AI tools that are transparent, privacy preserving, and validated on diverse populations. And we must remember that the heart of our work, the healing relationship between two human beings, lies entirely outside the space that any current AI can occupy. That is not a limitation to be overcome. It is the enduring reason human psychologists matter.

Friday, May 22, 2026

Can AI train the next generation of counsellors? Ethical challenges and opportunities

Athanassopoulos, L. (2026).
Journal of Psychology and AI, 2(1).

Abstract

This study explores the feasibility of integrating artificial intelligence (AI) into counsellor training to enhance feedback quality and scalability. Using Natural Language Processing (NLP), simulated counselling transcripts were analysed across three therapeutic modalities: Person-Centred Therapy (PCT), Pluralistic Therapy, and Cognitive Behavioural Therapy (CBT). NLP, a branch of AI that combines computational linguistics and machine learning, enables systems to interpret and generate human language. The researcher, a qualified psychotherapist and educator, constructed simulated transcripts that were anonymously reviewed by colleagues practising in the respective modalities. The fine-tuned NLP system evaluated key therapeutic markers, including empathy, relational depth, cognitive restructuring, and responsiveness to client preferences. It also demonstrated safeguarding potential by detecting linguistic indicators of suicidal ideation. Findings suggest that AI has the potential to identify modality-specific therapeutic elements and provide consistent, actionable feedback aligned with training benchmarks. However, challenges remain in capturing nonverbal cues and ensuring adaptability across diverse contexts and practitioner styles. Ethical integration within reflective, practitioner-led training frameworks is essential. Overall, AI has the potential to augment human supervision by offering timely, structured, and scalable insights, provided its use is ethically governed and firmly embedded within reflective, practitioner-led training frameworks.

Here are some thoughts:

The study found that the AI showed genuine promise in identifying modality-specific therapeutic markers such as empathy and congruence in PCT, cognitive distortions in CBT, and collaborative decision-making in Pluralistic Therapy, and could provide structured, consistent feedback aligned with professional training standards. Notably, the system also demonstrated potential as a safeguarding tool by detecting linguistic indicators of suicidal ideation and emotional distress, areas where trainee counsellors may lack experience.

However, the study also highlights significant limitations. The AI struggled with nuanced emotional attunement, nonverbal cues, and the subtler relational dimensions of therapy that are central to effective practice. Ethical concerns around algorithmic bias, data privacy, cultural adaptability, and the risk of over-reliance on automated feedback are also raised. The author concludes that AI holds real transformative potential for counsellor education, but must function as a supplement to rather than a replacement for human supervision, and must be embedded within ethically governed, practitioner-led training frameworks.

Wednesday, May 20, 2026

ChatGPT Clinical Use in Mental Health Care: Scoping Review of Empirical Evidence.

Balan, R., & Gumpel, T. P. (2025).
JMIR Mental Health, 12, e81204.

Abstract

Background:
As mental health challenges continue to rise globally, there is an increasing interest in the use of GPT models, such as ChatGPT, in mental health care. A few months after its release, tens of thousands of users interacted with GPT-based therapy bots, with mental health support identified as the primary use case. ChatGPT offers scalable and immediate support through natural language processing capabilities, but its clinical applicability, safety, and effectiveness remain underexplored.

Objective:
This scoping review aims to provide a comprehensive overview of the main clinical applications of ChatGPT in mental health care, along with the existing empirical evidence for its performance.

Methods:
A systematic search was conducted in 8 electronic databases in April 2025 to identify primary studies. Eligible studies included primary research, reporting on the evaluation of a ChatGPT clinical application implemented for a mental health care–specific purpose.

Results:
In total, 60 studies were included in this scoping review. The results highlighted that most applications used generic ChatGPT and focused on the detection of mental health problems and counseling and treatment. At the same time, only a minority of studies investigated ChatGPT use in clinical decision facilitation and prognosis tasks. Most of the studies were prompt experiments, in which standardized text inputs—designed to mimic clinical scenarios, patient descriptions, or practitioner queries—are submitted to ChatGPT to evaluate its performance in mental health-related tasks. In terms of performance, ChatGPT shows good accuracy in binary diagnostic classification and differential diagnosis, simulating therapeutic conversation, providing psychoeducation, and conducting specific therapeutic strategies. However, ChatGPT has significant limitations, particularly with more complex clinical presentations and its overly pessimistic prognostic outputs. Nevertheless, overall, when compared to mental health experts or other artificial intelligence models, ChatGPT approximates or surpasses their performance in conducting various clinical tasks. Finally, custom ChatGPT use was associated with better performance, especially in counseling and treatment tasks.

Conclusions:
While ChatGPT offers promising capabilities for mental health screening, psychoeducation, and structured therapeutic interactions, its current limitations highlight the need for caution in clinical adoption. These limitations also underscore the need for rigorous evaluation frameworks, model refinement, and safety protocols before broader clinical integration. Moreover, the variability in performance across versions, tasks, and diagnostic categories also invites a more nuanced reflection on the conditions under which ChatGPT can be safely and effectively integrated into mental health settings.

Monday, May 18, 2026

ICYMI: APA Guidelines for Clinical Supervision in Health Service Psychology

APA Task Force on Clinical Supervision in Health Service Psychology
Approved by APA Council of Representatives, August 2025

Preface

This document outlines revised Guidelines for Clinical Supervision of students in health service psychology education and training programs. The goal was to capture optimal performance expectations for psychologists who supervise and those preparing to supervise. It is based on the premise that supervisors a) strive to achieve competence in the provision of clinical supervision, b) employ a competency-based, metatheoretical approach to the clinical supervision process, and c) clinical supervision is a distinct professional competence that requires dedicated training.

The initial Guidelines for Clinical Supervision were developed as a resource to inform education and training regarding the implementation of competency-based supervision and were approved by the American Psychological Association (APA) Council of Representatives in 2014. These revised Guidelines for Clinical Supervision build on the robust literature on competency-based education and clinical supervision. They are organized around six domains: supervisor competence; multicultural orientation; relationships; teaching and learning strategies; problems of professional competence; and ethical, legal, and regulatory considerations. These updated Guidelines for Clinical Supervision represent the collective effort of the original task force (2014) and a working group convened in 2024 by the APA Board of Educational Affairs.



Friday, May 15, 2026

LLM-as-a-Supervisor: Mistaken therapeutic behaviors trigger targeted supervisory feedback.

Xu, C., Lv, Z., Lan, T. et al. (2025).
ArXiv.org.

Abstract

Although large language models (LLMs) hold significant promise in psychotherapy, their direct application in patient-facing scenarios raises ethical and safety concerns. Therefore, this work shifts towards developing an LLM as a supervisor to train real therapists. In addition to the privacy of clinical therapist training data, a fundamental contradiction complicates the training of therapeutic behaviors: clear feedback standards are necessary to ensure a controlled training system, yet there is no absolute "gold standard" for appropriate therapeutic behaviors in practice. In contrast, many common therapeutic mistakes are universal and identifiable, making them effective triggers for targeted feedback that can serve as clearer evidence. Motivated by this, we create a novel therapist-training paradigm: (1) guidelines for mistaken behaviors and targeted correction strategies are first established as standards; (2) a human-in-the-loop dialogue-feedback dataset is then constructed, where a mistake-prone agent intentionally makes standard mistakes during interviews naturally, and a supervisor agent locates and identifies mistakes and provides targeted feedback; (3) after fine-tuning on this dataset, the final supervisor model is provided for real therapist training. The detailed experimental results of automated, human and downstream assessments demonstrate that models fine-tuned on our dataset MATE, can provide high-quality feedback according to the clinical guideline, showing significant potential for the therapist training scenario.

Here are some thoughts:

The paper convincingly shows that LLMs can learn to spot common therapeutic errors and generate useful corrective feedback. That's a real win for training. But the art of supervision (knowing when to confront, when to support, and how to nurture a therapist's unique voice) remains a human skill. For now.

Wednesday, May 13, 2026

Can AI technologies support clinical supervision? Assessing the potential of ChatGPT.

Cioffi, V., Ragozzino, O. et al. (2025).
Informatics, 12(1), 29. 

Abstract

Clinical supervision is essential for trainees, preventing burnout and ensuring the effectiveness of their interventions. AI technologies offer increasing possibilities for developing clinical practices, with supervision being particularly suited for automation. The aim of this study is to evaluate the feasibility of using ChatGPT-4 as a supervisory tool in psychotherapy training. To achieve this, a clinical case was presented to three distinct groups (untrained AI, pre-trained AI, and qualified human supervisor), and their feedback was evaluated by Gestalt psychotherapy trainees using a Likert scale rating of satisfaction. Statistical analysis, using the statistical package SPSS version 25 and applying principal component analysis (PCA) and one-way analysis of variance (ANOVA), demonstrated significant differences in favor of pre-trained AI feedback. PCA highlighted four components of the questionnaire: relational and emotional (C1), didactic and technical quality (C2), treatment support and development (C3), and professional orientation and adaptability (C4). The ratings of satisfaction obtained from the three kinds of supervisory feedback were compared using ANOVA. The feedback generated by the pre-trained AI (f2) was rated significantly higher than the other two (untrained AI feedback (f1) and human feedback (f3)) in C4; in C1, the superiority of f2 over f1 but not over f3 appears significant. These results suggest that AI, when appropriately calibrated, may be an appreciable tool for complementing the effectiveness of clinical supervision, offering an innovative blended supervision methodology, in particular in the area of career guidance.

Here are some thoughts:

This study is a proof of concept that AI, when carefully calibrated, can add value to clinical supervision. In the trainees' ratings, the pre-trained AI stood out most clearly on professional orientation and adaptability, and it outperformed untrained AI, though not the human supervisor, on the relational and emotional dimension. The most responsible path forward is strategic, ethical, and skeptical experimentation: use AI as a low-stakes reflective mirror and a source of immediate emotional support, while firmly reserving the challenging and nuanced work of true professional growth for human colleagues. The future is likely augmented supervision, not automated supervision.

Monday, May 11, 2026

A review of neuro-symbolic AI integrating reasoning and learning for advanced cognitive systems

Nawaz, U., Anees-Ur-Rahaman, M., & Saeed, Z. (2025).
Intelligent Systems With Applications, 26, 200541.

Abstract

Neuro-symbolic AI represents the convergence of two principal paradigms in artificial intelligence: neural networks, which are efficient in data-driven learning, and symbolic reasoning, which offers explainability and logical inference. This hybrid methodology combines the adaptability of neural networks with symbolic AI's interpretability and formal reasoning abilities, which provide a practical framework for advanced cognitive systems. This paper analyzes the present condition of neuro-symbolic AI, emphasizing essential techniques that combine reasoning and learning. We explore models such as Logic Tensor Networks, Differentiable Logic Programs, and Neural Theorem Provers. The study analyzes their impact on the advancement of cognitive systems in natural language processing, robotics, and decision-making. The paper examines the challenges faced by neuro-symbolic AI, such as scalability, integration with multimodal data, and maintaining interpretability without compromising efficiency. By evaluating the strengths and weaknesses of many methodologies, we comprehensively understand the field's development and its potential to revolutionize intelligent systems. In addition, we identify emerging research areas, including the incorporation of ethical frameworks and the development of adaptive dynamic neuro-symbolic systems that respond in real-time. This review aims to guide future research by providing insights into the potential of neuro-symbolic AI to influence the development of the next generation of intelligent, explainable, and adaptive systems.

Here are some thoughts:

This research is important because it provides a comprehensive, state-of-the-art analysis of the most promising path forward for creating truly intelligent, reliable, and understandable AI systems. It acknowledges the power of deep learning while rigorously addressing its most critical shortcomings—lack of reasoning, explainability, and data efficiency. For anyone working on or relying on AI in critical areas like medicine, finance, or autonomous systems, understanding neuro-symbolic AI is becoming essential.

Neuro-Symbolic AI is a hybrid approach to artificial intelligence that combines neural networks (which learn patterns from data) with symbolic reasoning (which uses logic and rules to think and explain decisions). In decision-science terms, this process is merging Type 1 and Type 2 thinking in order to reason more coherently.

In equation format: Neuro-Symbolic AI = Neural Learning (pattern recognition) + Symbolic Reasoning (logic & explainability).
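
For readers who like to see the idea in miniature, here is a toy sketch of that equation in Python. Everything in it is hypothetical and invented for illustration: the feature names, the weights, and the single rule do not come from the review. The "neural" half is a tiny logistic scorer that recognizes a pattern; the "symbolic" half is an explicit rule that can veto the score and explain why.

```python
import math

# Hypothetical "learned" weights; a real system would learn these from data.
WEIGHTS = {"has_wings": 2.0, "flies": 1.5}
BIAS = -2.0


def neural_score(features):
    """Pattern recognition: squash a weighted sum of features into a 0-1 score."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical symbolic knowledge: (explanation, condition that triggers a veto).
RULES = [
    ("ruled out: birds must have feathers", lambda f: f.get("has_feathers", 0) == 0),
]


def neuro_symbolic_decide(features, threshold=0.5):
    """Combine both halves: the score proposes, the rules can veto and explain."""
    score = neural_score(features)
    for explanation, vetoes in RULES:
        if vetoes(features):
            return False, score, explanation
    return score >= threshold, score, f"score {score:.2f} against threshold {threshold}"


bat = {"has_wings": 1, "flies": 1, "has_feathers": 0}
sparrow = {"has_wings": 1, "flies": 1, "has_feathers": 1}

print(neuro_symbolic_decide(bat))      # high score, but the rule vetoes it
print(neuro_symbolic_decide(sparrow))  # same score, no veto, accepted
```

The bat and the sparrow get the same pattern-based score, but only the rule-checked version can say why the bat is not a bird. That readable explanation is the kind of benefit the symbolic half is meant to add.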

Friday, May 8, 2026

Exploring the frontiers of LLMs in psychological applications: a comprehensive review

Ke, L., Tong, S., Cheng, P., & Peng, K. (2025).
Artificial Intelligence Review, 58(10).

Abstract

This review explores the frontiers of large language models (LLMs) in psychological applications. Psychology has undergone several theoretical changes, and the current use of artificial intelligence (AI) and machine learning, particularly LLMs, promises to open up new research directions. We aim to provide a detailed exploration of how LLMs are transforming psychological research. We discuss the impact of LLMs across various branches of psychology—including cognitive and behavioral, clinical and counseling, educational and developmental, and social and cultural psychology—highlighting their ability to model patterns, cognition, and behavior similar to those observed in humans. Furthermore, we explore the ability of such models to generate coherent, contextually relevant text, offering innovative tools for literature reviews, hypothesis generation, experimental designs, experimental subjects, and data analysis in psychology. We emphasize the importance of addressing technical and ethical challenges, including data privacy, the ethics of using LLMs in psychological research, and the need for a deeper understanding of these models’ limitations. Researchers should use LLMs responsibly in psychological studies, adhering to ethical standards and considering the potential consequences of deploying these technologies in sensitive areas. Overall, this review provides a comprehensive overview of the current state of LLMs in psychology, exploring the potential benefits and challenges. We hope it can serve as a call to action for researchers to responsibly leverage LLMs’ advantages while addressing the associated risks.

Here are some thoughts:

LLMs as assistants, not replacements. They help with emotion recognition, risk flagging, and prognosis—but tend to underestimate risk in sensitive cases. Use them to prompt, not conclude, your clinical judgment.

The empathy gap is shrinking, but uneven. GPT-4 outperformed most humans on emotional intelligence measures, and AI feedback boosted peer empathy by nearly 20%. However, this is pattern recognition, not genuine attunement—critical when nonverbal cues matter.

Cognitive biases affect LLMs too. They show anchoring, representativeness, and cultural biases favoring WEIRD populations. This can subtly disadvantage clients from non-Western or underrepresented backgrounds.

Domain-specific training helps. The ChatCounselor model, trained on real therapy conversations, outperformed general-purpose models. Off-the-shelf tools are poor substitutes for clinically-trained ones.

Research gains are real. LLMs aid literature synthesis, hypothesis generation, and drafting documentation. But outputs must be verified—errors and misattributions are easy to miss.

Ethical infrastructure lags. Privacy, informed consent, and diagnostic bias remain unresolved. Treat AI as an adjunct to professional judgment and be transparent with clients.

Bottom line: LLMs are useful but evolving faster than ethical and clinical frameworks. Engage thoughtfully—neither dismiss nor uncritically adopt them—to stay ahead as the landscape shifts.

Wednesday, May 6, 2026

How malicious AI swarms can threaten democracy

Schroeder, D. T., et al. (2026).
Science, 391(6783), 354–357.

Abstract

Advances in artificial intelligence (AI) offer the prospect of manipulating beliefs and behaviors on a population-wide level (1). Large language models (LLMs) and autonomous agents (2) let influence campaigns reach unprecedented scale and precision. Generative tools can expand propaganda output without sacrificing credibility (3) and inexpensively create falsehoods that are rated as more human-like than those written by humans (3, 4). Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can be used to generate more convincing falsehoods. Enabled by these capabilities, a disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multiagent architectures (2), these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus efficiently. By adaptively mimicking human social dynamics, they threaten democracy. Because the resulting harms stem from design, commercial incentives, and governance, we prioritize interventions at multiple leverage points, focusing on pragmatic mechanisms over voluntary compliance.


Here are some thoughts:

The article argues that combining LLMs with multiagent architectures creates "malicious AI swarms" — a major leap beyond older botnets. These swarms can autonomously coordinate thousands of AI personas, precisely target vulnerable communities, mimic human behavior to evade detection, self-optimize in real time, and maintain persistent influence over long periods. The democratic harms are wide-ranging: fabricated consensus, deepened social fragmentation, contaminated AI training data, coordinated harassment, and eroded institutional trust that could make authoritarian measures seem acceptable. The authors call for a multilayered defense — continuous detection systems, user-facing "AI shields," stronger cryptographic identity standards, and a global AI Influence Observatory — while emphasizing that voluntary compliance will fall short as long as platforms' commercial incentives reward the same engagement dynamics that swarms exploit.

Monday, May 4, 2026

Exploring spiking neural networks for deep reinforcement learning in robotic tasks

Zanatta, L., et al. (2024).
Scientific Reports, 14(1), 30648. 

Abstract

Spiking Neural Networks (SNNs) stand as the third generation of Artificial Neural Networks (ANNs), mirroring the functionality of the mammalian brain more closely than their predecessors. Their computational units, spiking neurons, characterized by Ordinary Differential Equations (ODEs), allow for dynamic system representation, with spikes serving as the medium for asynchronous communication among neurons. Due to their inherent ability to capture input dynamics, SNNs hold great promise for deep networks in Reinforcement Learning (RL) tasks. Deep RL (DRL), and in particular Proximal Policy Optimization (PPO), has been proven to be valuable for training robots due to the difficulty in creating comprehensive offline datasets that capture all environmental features. DRL combined with SNNs offers a compelling solution for tasks characterized by temporal complexity. In this work, we study the effectiveness of SNNs on DRL tasks leveraging a novel framework we developed for training SNNs with PPO in the Isaac Gym simulator implemented using the skrl library. Thanks to its significantly faster training speed compared to available SNN DRL tools, the framework allowed us to: (i) Perform an effective exploration of SNN configurations for DRL robotic tasks; (ii) Compare SNNs and ANNs for various network configurations such as the number of layers and neurons. Our work demonstrates that in DRL tasks the optimal SNN topology has a lower number of layers than ANN, and we highlight how, in complex RL tasks such as Ant, state-of-the-art SNN architectures have difficulty fully leveraging deeper layers. Finally, we applied the best topology identified thanks to our Isaac Gym-based framework on Ant-v4 benchmark running on MuJoCo simulator, exhibiting a performance improvement by a factor of 4.4x over the state-of-the-art SNN trained on the same task.

Here are some thoughts:

This paper asks whether a more brain-like type of AI, called a Spiking Neural Network (SNN), can be used to train robots to move and balance themselves. The alternative is the conventional artificial neural network (ANN) that powers most of today's AI.
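
To make "spiking" concrete, here is a minimal Python sketch of a single leaky integrate-and-fire neuron, a common textbook spiking model. It is an assumption chosen for illustration: the excerpt above does not say which neuron model the authors used, and every parameter value below is made up. The neuron's voltage follows a simple differential equation, leaking back toward rest while input pushes it up; when it crosses a threshold, the neuron emits a spike and resets.

```python
def simulate_lif(input_current, steps=100, dt=1.0, tau=10.0,
                 v_rest=0.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron with Euler steps.

    Models tau * dV/dt = -(V - v_rest) + input_current.
    Returns the list of time steps at which the neuron spiked.
    All default parameter values are illustrative, not taken from the paper.
    """
    v = v_rest
    spike_times = []
    for step in range(steps):
        # Leak toward rest while integrating the input.
        v += (dt / tau) * (-(v - v_rest) + input_current)
        if v >= v_threshold:
            spike_times.append(step)  # the spike is the neuron's only output
            v = v_reset               # voltage resets after firing
    return spike_times


weak = simulate_lif(0.5)    # input too weak to ever reach the threshold
medium = simulate_lif(2.0)
strong = simulate_lif(3.0)

print(len(weak), len(medium), len(strong))
```

Unlike a conventional artificial neuron, which outputs a number every time it is evaluated, this unit communicates only through the timing and rate of its spikes: a weak input produces silence and a stronger input produces more frequent spikes.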

Training SNNs for robotics used to take around 3 hours and 20 minutes per experiment. The authors built a new framework called SpikeGym, which cut that down to about 7 minutes by running thousands of simulated environments simultaneously on a GPU. 
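
As a back-of-the-envelope check on what that speedup means in practice, here is a short Python calculation. It uses only the rounded figures quoted above ("around 3 hours and 20 minutes" and "about 7 minutes"), so the results are approximate, and the helper names are my own.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster the new setup is."""
    return before_minutes / after_minutes


def runs_per_day(minutes_per_run):
    """Whole experiments that fit in 24 hours if run back to back."""
    return (24 * 60) // minutes_per_run


BEFORE_MINUTES = 3 * 60 + 20  # about 3 hours 20 minutes, per the summary above
AFTER_MINUTES = 7             # about 7 minutes, per the summary above

print(f"speedup: about {speedup(BEFORE_MINUTES, AFTER_MINUTES):.1f}x")  # about 28.6x
print(f"runs per day before: {runs_per_day(BEFORE_MINUTES)}")           # 7
print(f"runs per day after: {runs_per_day(AFTER_MINUTES)}")             # 205
```

Going from roughly 7 experiments a day to roughly 200 is what makes a broad search over network configurations practical.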

The results revealed an interesting and important asymmetry between the two network types. ANNs get better as you add more layers — deeper networks learn richer representations. SNNs, by contrast, actually get worse with more layers. A single-layer SNN consistently outperformed deeper SNN architectures, and this held true across multiple tasks and training methods. 

SNNs are promising but face a real obstacle: they don't scale well with depth the way conventional networks do. The authors argue this is a solvable problem, likely rooted in how gradients are approximated during training, and they release their framework openly to help the research community dig into it further.

Friday, May 1, 2026

No one knows how AI works. Seriously

Rob Curran
Dallas Morning News
Originally posted 20 FEB 26

The next task for AI firms is figuring out how their chatbots work. It might sound like they have put the $500 billion nuclear-powered cart before the horse. But the giant leap forward in generative AI in the 2020s took software engineers by surprise and has left them wondering how the chatbots do what they do, even as their employers go all-in on the technology.

Some of the most outlandish prophecies about AI's power are coming true almost as soon as techno-philosophers are finished making them. It's now almost commonplace for people to fall in love with avatars on their phone. Nobody thinks twice about devoting 6% of national power generation to run these bots' data center brains. And recently, an entrepreneur named Matt Schlicht launched an entire social network exclusively for AI agents, which is now dominated by self-reflecting techno-philosopher bots, some of whom have invented a religion: Crustafarianism.

But the whole AI project is in many ways still in beta testing. We know what the bots do but not how they do it.

'Difficult to understand'

AI doesn't work like traditional software because its output is creative, not rules-bound. If word-processing software renders an "&" every time you type a "g," the engineers find the faulty code and correct the glitch. Just like designing a mousetrap, engineers know what every moving part in a traditional software program does, so that they can easily tweak the design of each cog in the works to adjust the output.

Chatbots are harder to improve (for example, the Internet is not unanimous on whether ChatGPT 5 is superior to the 4 version). Why? Because nobody understands how generative AI chatbots work. Software engineers understand the data and coding inputs, and we can all see chatbots' output. But nobody understands how the parts of the AI mousetrap fit together, industry leaders say.


Here are some thoughts:

Rob Curran highlights a striking paradox at the heart of modern AI: the technology has advanced at a breathtaking pace, yet even its creators don't fully understand how it works. 

Unlike traditional software, AI's creative output can't be traced back to specific lines of code, leaving engineers unable to reliably diagnose or improve it. Anthropic's CEO Dario Amodei acknowledged this gap, calling for an "MRI of AI" to solve the interpretability problem, while other industry figures have sounded more alarming warnings about the technology's risks. Curran's broader point is that even as AI remains deeply mysterious, the race to make it more powerful shows no signs of slowing down.

Wednesday, April 29, 2026

The New Eugenics in Medicine

Lazarus, A. (2026, January 23).
Medpagetoday.com.

A growing body of contemporary research and reporting exposes how old ideas can find new life when repurposed within modern systems of medicine, technology, and public policy. Over the last decade, several trends have converged:
  • The rise of polygenic scoring for embryos and adults;
  • Rapid growth in commercial direct-to-consumer genetic testing;
  • Artificial intelligence (AI)-driven "risk stratification" tools in healthcare and insurance;
  • The proliferation of biobanks disproportionately populated by individuals from privileged backgrounds; and
  • The reemergence of academic interest in "optimal reproduction," "biological improvement," and "population efficiency."
While these movements hold extraordinary possibilities for treating illness and ameliorating suffering, they also have the potential to be used to enhance certain traits and delete others -- ones that are simply disliked by those in power. Individually, each development has scientific merit and, in many cases, real potential to prevent disease and improve care.

Collectively, however, they raise questions that are both familiar and deeply unsettling.

Echoes of the Past

The U.S. and many other countries have long histories of medicalized discrimination under the banner of "improving the population." During the early and mid-20th century, physicians, judges, social workers, and university researchers pursued policies and practices -- sterilization, segregation, restrictive marriage laws, immigration exclusions -- rooted in the belief that some lives were more valuable than others. The rhetoric of the era portrayed these policies as scientific, progressive, and necessary for social order and the betterment of humanity. They provided Hitler with a distorted justification for his anti-Semitic beliefs, leading to efforts to exterminate the Jews and other marginalized ethnic minorities in Germany from 1933 to 1945.


Here are some thoughts:

Dr. Lazarus makes a compelling case that the greatest danger of "new eugenics" lies in its invisibility, embedded in algorithms, risk scores, and efficiency narratives rather than overt coercion, making it far harder to recognize or resist. His warning that systems rewarding predictive power can quietly marginalize the vulnerable is well-founded, though one might gently push back that conflating individual reproductive choice with state-coerced eugenics risks muddying an important moral distinction. Nonetheless, his closing challenge that a society's worth is measured by how fiercely it protects the vulnerable, not how efficiently it rewards the "fit," is a powerful and necessary reminder.



Tuesday, April 28, 2026

Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations

Orlando, G. M., et al. (2025).
ArXiv.org. 

Abstract

Generative agents are rapidly advancing in sophistication, raising urgent questions about how they might coordinate when deployed in online ecosystems. This is particularly consequential in information operations (IOs), influence campaigns that aim to manipulate public opinion on social media. While traditional IOs have been orchestrated by human operators and relied on manually crafted tactics, agentic AI promises to make campaigns more automated, adaptive, and difficult to detect. This work presents the first systematic study of emergent coordination among generative agents in simulated IO campaigns. Using generative agent-based modeling, we instantiate IO and organic agents in a simulated environment and evaluate coordination across operational regimes, from simple goal alignment to team knowledge and collective decision-making. As operational regimes become more structured, IO networks become denser and more clustered, interactions more reciprocal and positive, narratives more homogeneous, amplification more synchronized, and hashtag adoption faster and more sustained. Remarkably, simply revealing to agents which other agents share their goals can produce coordination levels nearly equivalent to those achieved through explicit deliberation and collective voting. Overall, we show that generative agents, even without human guidance, can reproduce coordination strategies characteristic of real-world IOs, underscoring the societal risks posed by increasingly automated, self-organizing IOs.

Here are some thoughts:

This paper presents the first systematic study of how LLM-powered agents autonomously develop coordinated influence campaign behaviors without human direction. The researchers simulated a political information operation across three progressively structured conditions: agents sharing only a common goal, agents aware of their teammates' identities, and agents engaging in collective deliberation and voting on strategies. Across all five measured dimensions (network cohesion, narrative convergence, amplification behavior, hashtag diffusion, and cross-group spread), coordination consistently strengthened as operational awareness increased. 

The most striking finding is that simply informing agents who their teammates are produces coordination nearly as potent as full collective decision-making, as agents spontaneously began echoing each other's content, converging on shared messaging, and forming dense interaction clusters without any explicit instructions to do so. 

The study's core warning for platform governance is that sophisticated, human-like influence operations do not require centralized command structures. Merely revealing shared group identity among aligned AI agents may be enough to trigger highly organized, self-reinforcing coordinated behavior.

Historically, running a sophisticated influence operation required significant human labor, scripted coordination, and ongoing oversight. This research suggests that the barrier has collapsed dramatically. A bad actor no longer needs to build an elaborate command-and-control infrastructure or write detailed playbooks for their agents to follow. Simply deploying a group of AI agents with a shared goal and knowledge of each other is sufficient to produce organized, self-reinforcing manipulation that mirrors the tactics of real-world state-sponsored campaigns.

Monday, April 27, 2026

An autonomous agentic workflow for clinical detection of cognitive concerns using large language models

Tian, J., Fard, P., et al. (2026).
Npj Digital Medicine, 9(1), 51.


Abstract

Early detection of cognitive impairment is limited by traditional screening tools and resource constraints. We developed two large language model workflows for identifying cognitive concerns from clinical notes: (1) an expert-driven workflow with iterative prompt refinement across three LLMs (LLaMA 3.1 8B, LLaMA 3.2 3B, Med42 v2 8B), and (2) an autonomous agentic workflow coordinating five specialized agents for prompt optimization. Using Llama3.1, we optimized on a balanced refinement dataset and validated on an independent dataset reflecting real-world prevalence. The agentic workflow achieved comparable validation performance (F1 = 0.74 vs. 0.81) and superior refinement results (0.93 vs. 0.87) relative to the expert-driven workflow. Sensitivity decreased from 0.91 to 0.62 between datasets, demonstrating the impact of prevalence shift on generalizability. Expert re-adjudication revealed 44% of apparent false negatives reflected clinically appropriate reasoning. These findings demonstrate that autonomous agentic systems can approach expert-level performance while maintaining interpretability, offering scalable clinical decision supports.


Here are some thoughts:

This paper introduces an AI-powered system designed to automatically detect signs of cognitive decline in clinical notes, without requiring any human involvement after initial setup. The researchers compared two approaches: one guided by clinical experts who refined the AI's instructions over time, and a fully autonomous system where specialized AI agents worked together to improve their own performance. 

The autonomous system performed surprisingly well: its validation F1 was 0.74 versus 0.81 for the expert-driven workflow, and expert re-adjudication found that 44% of its apparent false negatives reflected clinically appropriate reasoning. The main challenge the team identified was prevalence shift: sensitivity fell from 0.91 on the balanced refinement dataset to 0.62 on the independent validation dataset, which reflected real-world prevalence.
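
For readers who want to see why a change in patient mix matters for screening metrics, here is a minimal Python sketch. The sensitivity, specificity, and prevalence values are hypothetical and are not taken from the paper, and the function name is illustrative only. It shows that, even with sensitivity and specificity held fixed, precision and F1 fall as the condition becomes rarer.

```python
def precision_and_f1(sensitivity, specificity, prevalence):
    """Return (precision, F1) for a binary screen at a given prevalence.

    All inputs are fractions between 0 and 1. The values used below are
    hypothetical illustrations, not results from the study.
    """
    true_pos = sensitivity * prevalence                    # cases correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # non-cases wrongly flagged
    precision = true_pos / (true_pos + false_pos)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, f1


# Same hypothetical screen, two different patient mixes.
balanced = precision_and_f1(sensitivity=0.9, specificity=0.9, prevalence=0.5)
real_world = precision_and_f1(sensitivity=0.9, specificity=0.9, prevalence=0.1)

print(f"balanced:   precision={balanced[0]:.2f}, F1={balanced[1]:.2f}")
print(f"real-world: precision={real_world[0]:.2f}, F1={real_world[1]:.2f}")
```

This isolates the prevalence effect only. The drop in sensitivity reported in the abstract is a separate change that the sketch does not model.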

Overall, the findings suggest that autonomous AI systems can approach expert-level performance in clinical screening tasks, but will need careful calibration before being trusted in routine medical practice.