Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Wednesday, February 5, 2025

Ethical debates amidst flawed healthcare artificial intelligence metrics

Gallifant, J., et al. (2024).
npj Digital Medicine, 7(1).

Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial. To address these fundamental shortcomings, a paradigm shift in AI assessment is needed, prioritizing actual patient outcomes over conventional benchmarking.

Artificial intelligence (AI) is poised to bridge the deployment gap with increasing capabilities for remote patient monitoring, handling of diverse time series datasets, and progression toward the promise of precision medicine. This proximity also underscores the urgency of confronting the translational risks accompanying this technological evolution and maximizing alignment with fundamental principles of ethical, equitable, and effective deployment. The recent work by Goetz et al. surfaces a critical issue at the intersection of technology and healthcare ethics: the challenge of generalization and fairness in health AI applications [1]. This is a complex issue where equal performance across subgroups can be at odds with overall performance metrics [2].

Specifically, it highlights one potential avenue to navigate variation in model performance among subgroups based on the concept of "selective deployment" [3]. This strategy asserts that limiting the deployment of the technology to the subgroup in which it works well facilitates benefits for that subpopulation. The alternative is not to deploy the technology only in the optimal performance group, but instead to adopt a standard of equity in overall performance to achieve parity among subgroups, an approach that might be termed "equitable deployment". Some view this as a requirement to "level down" performance for the sake of equity, a view that is not unique to AI or healthcare and is the subject of a broader ethical debate [4,5,6]. Proponents of equitable deployment would counter: Can a commitment to fairness justify not deploying a technology that is likely to be effective, but only for a specific subpopulation?

Here are some thoughts:

The article explores the intricate ethical dilemmas surrounding the deployment of AI in healthcare, particularly the tension between selective and equitable deployment. Selective deployment involves using AI in specific cases where it performs best, potentially maximizing benefits for those groups but risking health disparities for others. Equitable deployment, on the other hand, seeks to ensure fairness across all patient groups, which might require accepting lower performance in certain areas to avoid exacerbating inequalities. The challenge lies in balancing these approaches, as what is effective for one group may not be so for another.

Flawed performance metrics are highlighted as a significant issue, as they may not capture real-world complexities and biases. This can lead to premature assertions of AI effectiveness, where systems are deployed based on metrics that look good in tests but fail in practical settings. The article emphasizes the need for improved evaluation practices, such as continuous monitoring and silent evaluation periods, to ensure AI systems perform well in diverse and dynamic healthcare environments.
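The way an aggregate metric can hide a subgroup failure is easy to see with a toy sketch. The numbers below are entirely hypothetical, chosen only to illustrate the arithmetic: a model that is nearly always right for a 90-patient majority group and always wrong for a 10-patient minority group still posts a strong overall accuracy.

```python
# Toy illustration: an aggregate metric can mask poor performance
# in a minority subgroup. All data here are hypothetical.

# Group A: 90 patients, model is correct on 86 of them.
labels_a = [1] * 45 + [0] * 45
preds_a = [1] * 43 + [0] * 2 + [0] * 43 + [1] * 2

# Group B: 10 patients, model is wrong on every one.
labels_b = [1] * 5 + [0] * 5
preds_b = [0] * 5 + [1] * 5

def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

overall = accuracy(labels_a + labels_b, preds_a + preds_b)
print(f"Group A accuracy: {accuracy(labels_a, preds_a):.2f}")  # 0.96
print(f"Group B accuracy: {accuracy(labels_b, preds_b):.2f}")  # 0.00
print(f"Overall accuracy: {overall:.2f}")                      # 0.86
```

An overall score of 0.86 would look respectable on a benchmark leaderboard, even though the model is useless for group B; this is why disaggregated, subgroup-level evaluation and continuous post-deployment monitoring matter.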

A paradigm shift is called for, prioritizing actual patient outcomes over conventional benchmarking. This approach recognizes that patient care is influenced by numerous factors beyond just AI performance. The potential of AI to bridge the deployment gap, through capabilities like remote patient monitoring and precision medicine, is exciting but also underscores the need for caution in addressing ethical risks.

Generalization and fairness are critical challenges for health AI, since a model that is effective in one subgroup may underperform in another. The concept of selective deployment, while beneficial for specific groups, could disadvantage others. Equitable deployment, aiming for parity among subgroups, may require balancing effectiveness and equality, a complex task influenced by social and political factors in healthcare.

The article underscores the importance of addressing "bias exhaust," or residual biases in AI models stemming from systemic healthcare issues, to develop fair AI systems. Distinguishing between acceptable variability in medical conditions and impermissible bias is essential, as is continuous evaluation to monitor AI performance in real-world settings.