Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Showing posts with label reliability.

Saturday, May 25, 2024

AI Chatbots Will Never Stop Hallucinating

Lauren Leffer
Scientific American
Originally published 5 April 2024

Here is an excerpt:

Hallucination is usually framed as a technical problem with AI—one that hardworking developers will eventually solve. But many machine-learning experts don’t view hallucination as fixable because it stems from LLMs doing exactly what they were developed and trained to do: respond, however they can, to user prompts. The real problem, according to some AI researchers, lies in our collective ideas about what these models are and how we’ve decided to use them. To mitigate hallucinations, the researchers say, generative AI tools must be paired with fact-checking systems that leave no chatbot unsupervised.

Many conflicts related to AI hallucinations have roots in marketing and hype. Tech companies have portrayed their LLMs as digital Swiss Army knives, capable of solving myriad problems or replacing human work. But applied in the wrong setting, these tools simply fail. Chatbots have offered users incorrect and potentially harmful medical advice, media outlets have published AI-generated articles that included inaccurate financial guidance, and search engines with AI interfaces have invented fake citations. As more people and businesses rely on chatbots for factual information, their tendency to make things up becomes even more apparent and disruptive.

But today’s LLMs were never designed to be purely accurate. They were created to create—to generate—says Subbarao Kambhampati, a computer science professor who researches artificial intelligence at Arizona State University. “The reality is: there’s no way to guarantee the factuality of what is generated,” he explains, adding that all computer-generated “creativity is hallucination, to some extent.”


Here is my summary:

AI chatbots like ChatGPT and Bing's AI assistant frequently "hallucinate": they generate false or misleading information and present it as fact. This is a major problem as more people turn to these AI tools for information, research, and decision-making.

Hallucinations occur because AI models are trained to predict the most likely next word or phrase, not to reason about truth and accuracy. They simply produce plausible-sounding responses, even if they are completely made up.
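The next-word-prediction point can be made concrete with a toy sketch. This is a minimal, hypothetical bigram model over a tiny made-up corpus, not a real LLM, but the objective is the same: sample a statistically likely continuation, with nothing in the loop checking whether the result is true.

```python
import random

# Toy bigram "language model": map each word to the words observed after it.
# The corpus here is a tiny hypothetical example; real LLMs learn analogous
# statistics from billions of tokens, but the training goal is the same.
corpus = "the model predicts the next word the model generates plausible text".split()

bigrams = {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(prev, []).append(nxt)

def generate(start, n=8, seed=0):
    """Sample a plausible-sounding continuation; nothing verifies its truth."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        choices = bigrams.get(out[-1])
        if not choices:
            break
        out.append(rng.choice(choices))
    return " ".join(out)

print(generate("the"))
```

Every word the sampler emits is individually plausible given the previous one, which is exactly why the output reads fluently whether or not it is accurate.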

This issue is inherent to the current state of large language models and is not easily fixable. Researchers are working on ways to improve accuracy and reliability, but there will likely always be some rate of hallucination.

Hallucinations can have serious consequences when people rely on chatbots for sensitive information related to health, finance, or other high-stakes domains. Experts warn these tools should not be used where factual accuracy is critical.

Monday, November 22, 2021

Revisiting Daubert: Judicial Gatekeeping and Expert Ethics in Court

Young, G., Goodman-Delahunty, J.
Psychol. Inj. and Law (2021). 
https://doi.org/10.1007/s12207-021-09428-8

Abstract

This article calls for pragmatic modifications to legal practices for the admissibility of scientific evidence, including forensic psychological science. We submit that Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993) and the other two cases in the U.S. Supreme Court trilogy on expert evidence have largely failed to accomplish their gatekeeping goals to assure the reliability of scientific evidence admitted in court. Reliability refers to validity in psychological terms. Part of the problem with Daubert’s application in court is the gatekeeping function that it ascribes to judges. Most Daubert admissibility challenges are rejected by judges, who might lack the requisite scientific expertise to make informed decisions; educating judges on science might not be an adequate solution. Like others who have put forth the idea, pursuant to Federal Rule of Evidence (FRE) 706, we suggest that court-appointed impartial experts can help judges to adjudicate competing claims on admissibility. We further recommend that an expert witness ethics code sworn to in legal proceedings should be mandatory in all jurisdictions. The journal Psychological Injury and Law calls for comments and further recommendations on modifying Daubert admissibility challenges and procedures in civil and criminal cases to develop best practices to mitigate adversarial allegiance and other unconscious biases in expert decision-making.

Advantages of an Expert Witness Ethics Code Sworn to in Legal Proceedings

We suggest that in the field of psychological injury, jurisdictions in which courts reinforce expert obligations via an ethics code for expert witnesses will lead to more balanced and impartial testimony. The essential principle guiding a science-based expert witness ethics code sworn to in legal proceedings is that the process of forensic assessment, as well as the subsequent proffer of testimony in court based on those assessments, should account for all the reliable evidence gathered in a particular case as determined by methodologies informed by scientific research in the relevant field. This recommendation is in line with psychological research showing that expert bias is reduced when experts do not focus on a single question or hypothesis, but address a “line up” of competing and alternative conclusions and hypotheses (Dror, 2020). The components of the expert witness oath, like the appointment of a court-appointed expert, encourage experts to adopt a differential diagnosis approach, in which all different conclusions and their probability are presented, rather than one conclusion (Dror, 2020). Opinions, interpretations, and conclusions based on the data, information, and evidence will more likely be impartial, fully scientifically informed, and just.

Sunday, July 26, 2020

The trolley problem problem

James Wilson
aeon.com
Originally posted 20 May 20

Here is an excerpt:

Some philosophers think that ethical thought experiments either are, or have a strong affinity with, scientific experiments. On such a view, thought experiments, like other experiments, when well-designed can allow knowledge to be built via rigorous and unbiased testing of hypotheses. Just as in the randomised controlled trials in which new pharmaceuticals are tested, the circumstances and the types of control in thought experiments could be such as to make the situation very unlike everyday situations, but that is a virtue rather than a vice, insofar as it allows ethical hypotheses to be tested cleanly and rigorously.

If thought experiments are – literally – experiments, this helps to explain how they might provide insights into the way the world is. But it would also mean that thought experiments would inherit the two methodological challenges that attend to experiments more generally, known as internal and external validity. Internal validity relates to the extent to which an experiment succeeds in providing an unbiased test of the variable or hypothesis in question. External validity relates to the extent to which the results in the controlled environment translate to other contexts, and in particular to our own. External validity is a major challenge, as the very features that make an environment controlled and suitable to obtain internal validity often make it problematically different from the uncontrolled environments in which interventions need to be applied.

There are significant challenges with both the internal and the external validity of thought experiments. It is useful to compare the kind of care with which medical researchers or psychologists design experiments – including validation of questionnaires, double-blinding of trials, placebo control, power calculations to determine the cohort size required and so on – with the typically rather more casual approach taken by philosophers. Until recently, there has been little systematic attempt within normative ethics to test variations of different phrasing of thought experiments, or to think about framing effects, or sample sizes; or the extent to which the results from the thought experiment are supposed to be universal or could be affected by variables such as gender, class or culture. A central ambiguity has been whether the implied readers of ethical thought experiments should be just anyone, or other philosophers; and, as a corollary, whether judgments elicited are supposed to be expert judgments, or the judgments of ordinary human beings. As the vast majority of ethical thought experiments in fact remain confined to academic journals, and are tested only informally on other philosophers, de facto they are tested only on those with expertise in the construction of ethical theories, rather than more generally representative samples or those with expertise in the contexts that the thought experiments purport to describe.

The info is here.

Wednesday, December 4, 2019

AI Principles: Recommendations on the Ethical Use of Artificial Intelligence by the Department of Defense

Department of Defense
Defense Innovation Board
Published November 2019

Here is an excerpt:

What DoD is Doing to Establish an Ethical AI Culture

DoD’s “enduring mission is to provide combat-credible military forces needed to deter war and protect the security of our nation.” As such, DoD seeks to responsibly integrate and leverage AI across all domains and mission areas, as well as business administration, cybersecurity, decision support, personnel, maintenance and supply, logistics, healthcare, and humanitarian programs. Notably, many AI use cases are non-lethal in nature. From making battery fuel cells more efficient to predicting kidney disease in our veterans to managing fraud in supply chain management, AI has myriad applications throughout the Department.

DoD is mission-oriented, and to complete its mission, it requires access to cutting edge technologies to support its warfighters at home and abroad. These technologies, however, are only one component to fulfilling its mission. To ensure the safety of its personnel, to comply with the Law of War, and to maintain an exquisite professional force, DoD maintains and abides by myriad processes, procedures, rules, and laws to guide its work. These are buttressed by DoD’s strong commitment to the following values: leadership, professionalism, and technical knowledge through the dedication to duty, integrity, ethics, honor, courage, and loyalty. As DoD utilizes AI in its mission, these values ground, inform, and sustain the AI Ethics Principles.

As DoD continues to comply with existing policies, processes, and procedures, as well as to create new opportunities for responsible research and innovation in AI, there are several cases where DoD is beginning to or already engaging in activities that comport with the calls from the DoD AI Strategy and the AI Ethics Principles enumerated here.

The document is here.

Monday, June 18, 2018

Groundhog Day for Medical Artificial Intelligence

Alex John London
The Hastings Report
Originally published May 26, 2018

Abstract

Following a boom in investment and overinflated expectations in the 1980s, artificial intelligence entered a period of retrenchment known as the “AI winter.” With advances in the field of machine learning and the availability of large datasets for training various types of artificial neural networks, AI is in another cycle of halcyon days. Although medicine is particularly recalcitrant to change, applications of AI in health care have professionals in fields like radiology worried about the future of their careers and have the public tittering about the prospect of soulless machines making life‐and‐death decisions. Medicine thus appears to be at an inflection point—a kind of Groundhog Day on which either AI will bring a springtime of improved diagnostic and predictive practices or the shadow of public and professional fear will lead to six more metaphorical weeks of winter in medical AI.

The brief perspective is here.

Monday, June 11, 2018

Can Morality Be Engineered In Artificial General Intelligence Systems?

Abhijeet Katte
Analytics India Magazine
Originally published May 10, 2018

Here is an excerpt:

This report Engineering Moral Agents – from Human Morality to Artificial Morality discusses challenges in engineering computational ethics and how mathematically oriented approaches to ethics are gaining traction among researchers from a wide background, including philosophy. AGI-focused research is evolving into the formalization of moral theories to act as a base for implementing moral reasoning in machines. For example, Kevin Baum from the University of Saarland talked about a project about teaching formal ethics to computer-science students wherein the group was involved in building a database of moral-dilemma examples from the literature to be used as benchmarks for implementing moral reasoning.

Another study, titled Towards Moral Autonomous Systems, from a group of European researchers, states that there is now a real need for a functional system of ethical reasoning, as AI systems that function as part of our society are ready to be deployed. One of the suggestions is that every assisted-living AI system have a “Why did you do that?” button which, when pressed, causes the robot to explain why it carried out the previous action.

The information is here.

Tuesday, March 14, 2017

“I placed too much faith in underpowered studies:” Nobel Prize winner admits mistakes

Retraction Watch
Originally posted February 21, 2017

Although it’s the right thing to do, it’s never easy to admit error — particularly when you’re an extremely high-profile scientist whose work is being dissected publicly. So while it’s not a retraction, we thought this was worth noting: A Nobel Prize-winning researcher has admitted on a blog that he relied on weak studies in a chapter of his bestselling book.

The blog — by Ulrich Schimmack, Moritz Heene, and Kamini Kesavan — critiqued the citations included in a book by Daniel Kahneman, a psychologist whose research has illuminated our understanding of how humans form judgments and make decisions and earned him half of the 2002 Nobel Prize in Economics.

The article is here.

Friday, July 1, 2016

Predicting Suicide is not Reliable, according to recent study

Matthew Large, M. Kaneson, N. Myles, H. Myles, P. Gunaratne, C. Ryan
PLOS One
Published: June 10, 2016
http://dx.doi.org/10.1371/journal.pone.0156322

Discussion

The pooled estimate from a large and representative body of research conducted over 40 years suggests a statistically strong association between high-risk strata and completed suicide. However the meta-analysis of the sensitivity of suicide risk categorization found that about half of all suicides are likely to occur in lower-risk groups and the meta-analysis of PPV suggests that 95% of high-risk patients will not suicide. Importantly, the pooled odds ratio (and the estimates of the sensitivity and PPV) and any assessment of the overall strength of risk assessment should be interpreted very cautiously in the context of several limitations documented below.
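The low positive predictive value the authors describe follows directly from Bayes' rule when the outcome is rare. The sketch below is a back-of-envelope illustration, not the study's own analysis: the ~50% sensitivity echoes the finding that about half of suicides occur in lower-risk groups, while the specificity and base rate are hypothetical values chosen only to show the effect.

```python
# Back-of-envelope check: with a rare outcome, even a risk test with decent
# sensitivity and specificity yields a tiny positive predictive value.
def ppv(sensitivity, specificity, base_rate):
    """P(outcome | flagged high-risk), computed via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Sensitivity ~0.5 mirrors "about half of all suicides occur in lower-risk
# groups"; the specificity and base rate are illustrative assumptions.
value = ppv(sensitivity=0.5, specificity=0.95, base_rate=0.005)
print(f"{value:.1%}")  # only a few percent: most "high-risk" patients have no event
```

The same function shows why the test looks far better in a 50/50 population: raise `base_rate` toward 0.5 and the PPV climbs above 90%, which is why headline accuracy figures can mislead in low-base-rate settings.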

With respect to our first hypothesis, the statistical estimates of between study heterogeneity and the distribution of the outlying, quartile and median effect sizes values suggests that the statistical strength of suicide risk assessment cannot be considered to be consistent between studies, potentially limiting the generalizability of the pooled estimate.

With respect to our second hypothesis we found no evidence that the statistical strength of suicide risk assessment has improved over time.

The research is here.

Thursday, June 2, 2016

Scientific consent, data, and doubling down on the internet

Oliver Keyes
Originally published May 12, 2016

Here is an excerpt:

The Data

Yesterday morning I woke up to a Twitter friend pointing me to a release of OKCupid data, by Kirkegaard. Having now spent some time exploring the data, and reading both public statements on the work and the associated paper: this is without a doubt one of the most grossly unprofessional, unethical and reprehensible data releases I have ever seen.

There are two reasons for that. The first is very simple; Kirkegaard never asked anyone. He didn't ask OKCupid, he didn't ask the users covered by the dataset - he simply said 'this is public so people should expect it's going to be released'.

The blog post is here.

Sunday, January 18, 2015

Why the Myers-Briggs test is totally meaningless

By Joseph Stromberg
Vox
Published on January 5, 2015

The Myers-Briggs Type Indicator is probably the most widely used personality test in the world.

An estimated 2 million people take it annually, at the behest of corporate HR departments, colleges, and even government agencies. The company that makes and markets the test makes somewhere around $20 million each year.

The only problem? The test is completely meaningless.

"There's just no evidence behind it," says Adam Grant, an organizational psychologist at the University of Pennsylvania who's written about the shortcomings of the Myers-Briggs previously. "The characteristics measured by the test have almost no predictive power on how happy you'll be in a situation, how you'll perform at your job, or how happy you'll be in your marriage."

The entire article is here.

Saturday, December 21, 2013

Ethical Considerations in the Development and Application of Mental and Behavioral Nosologies: Lessons from DSM-5

By Robert M. Gordon and Lisa Cosgrove
Psychological Injury and Law
10.1007/s12207-013-9172-9
December 13, 2013

Abstract

We are not likely to find a diagnostic system as “unethical,” per se, but rather find that it creates ethical concerns in its formulation and application. There is an increased risk of misuse and misunderstanding of the DSM-5 particularly when applied to forensic assessment because of documented problems with reliability and validity. For example, when field tested, the American Psychiatric Association reported diagnostic category kappa levels as acceptable that were far below the standard level of acceptability. The DSM-5 does not offer sensitivity and specificity levels and thus psychologists must keep this in mind when using or teaching this manual. Also, especially in light of concerns about diagnostic inflation, we recommend that psychologists exercise caution when using the DSM-5 in forensic assessments, including civil and criminal cases. Alternatives to the DSM-5, such as the International Classification of Diseases and the Psychodynamic Diagnostic Manual are reviewed.
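The kappa statistic at the center of the field-trial controversy is Cohen's kappa: observed rater agreement discounted by the agreement two raters would reach by chance alone. The sketch below uses invented ratings purely for illustration; it is not the DSM-5 field-trial data.

```python
# Cohen's kappa: chance-corrected agreement between two raters.
def cohens_kappa(a, b):
    """Kappa for two raters' category labels over the same cases."""
    assert len(a) == len(b)
    n = len(a)
    cats = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    p_chance = sum((a.count(c) / n) * (b.count(c) / n)      # agreement expected
                   for c in cats)                           # by chance alone
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical diagnoses from two clinicians rating the same eight cases.
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
print(round(cohens_kappa(rater1, rater2), 2))  # → 0.5
```

Here the raters agree on 6 of 8 cases (75%), yet kappa is only 0.5 once chance agreement is removed, which is why raw percent agreement overstates diagnostic reliability.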

Here is an excerpt:

It should be emphasized that ethical concerns about DSM-5 panel members having commercial ties is not meant in any way to imply that any task force or work group member intentionally made pro-industry decisions. Decades of research have demonstrated that cognitive biases are commonplace and very difficult to eradicate, and more recent studies suggest that disclosure of financial conflicts of interest may actually worsen bias (Dana & Loewenstein, 2003). This is because bias is most often manifested in subtle ways unbeknownst to the researcher or clinician, and thus is usually implicit and unintentional. Physicians—like everyone else—have ethical blind spots. Social scientists have documented the fact that physicians often fail to recognize their vulnerability to commercial interests because they mistakenly believe that they are immune to marketing and industry influence (Sah & Fugh-Berman, 2013).

The entire article is here.

Tuesday, May 21, 2013

DSM-IV Boss Presses Attack on New Revision

By John Gever, Deputy Managing Editor
MedPage Today
Published: May 17, 2013

A new edition of psychiatry's diagnostic guide "will probably lead to substantial false-positive rates and unnecessary treatment," charged the man who led development of the last version.

To be released this weekend at the American Psychiatric Association's annual meeting, the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders, or DSM-5, "introduce[s] several high-prevalence diagnoses at the fuzzy boundary with normality," according to Allen Frances, MD, who chaired the task force responsible for DSM-IV issued in 1994.

Frances, now an emeritus professor at Duke University, wrote online in Annals of Internal Medicine that changes from DSM-IV will apply disease labels to individuals who may be unhappy or offensive but still normal. Such individuals would include those experiencing "the forgetfulness of old age" as well as children with severe, chronic temper tantrums and individuals with physical symptoms with no medical explanation.

He also worried about new marketing pushes from the pharmaceutical industry seeking to exploit what he believes are "loose" diagnostic criteria in the new edition. "Drug companies take marketing advantage of the loose DSM definitions by promoting the misleading idea that everyday life problems are actually undiagnosed psychiatric illness caused by a chemical imbalance and requiring a solution in pill form," he wrote.

The entire article is here.

Wednesday, April 11, 2012

Can Most Cancer Research Be Trusted?

Addressing the problem of "academic risk" in biomedical research

By Ronald Bailey
reason.com
Originally published April 3, 2012

When a cancer study is published in a prestigious peer-reviewed journal, the implication is that the findings are robust, replicable, and point the way toward eventual treatments. Consequently, researchers scour their colleagues' work for clues about promising avenues to explore. Doctors pore over the pages, dreaming of new therapies coming down the pike. Which makes a new finding that nine out of 10 preclinical peer-reviewed cancer research studies cannot be replicated all the more shocking and discouraging.

Last week, the scientific journal Nature published a disturbing commentary claiming that in the area of preclinical research—which involves experiments done on rodents or cells in petri dishes with the goal of identifying possible targets for new treatments in people—independent researchers doing the same experiment cannot get the same result as reported in the scientific literature.

The entire commentary is here.

Thanks to Rich Ievoli for the story.  He could have been a contender.