Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Showing posts with label Accuracy. Show all posts

Saturday, May 25, 2024

AI Chatbots Will Never Stop Hallucinating

Lauren Leffer
Scientific American
Originally published 5 April 24

Here is an excerpt:

Hallucination is usually framed as a technical problem with AI—one that hardworking developers will eventually solve. But many machine-learning experts don’t view hallucination as fixable because it stems from LLMs doing exactly what they were developed and trained to do: respond, however they can, to user prompts. The real problem, according to some AI researchers, lies in our collective ideas about what these models are and how we’ve decided to use them. To mitigate hallucinations, the researchers say, generative AI tools must be paired with fact-checking systems that leave no chatbot unsupervised.

Many conflicts related to AI hallucinations have roots in marketing and hype. Tech companies have portrayed their LLMs as digital Swiss Army knives, capable of solving myriad problems or replacing human work. But applied in the wrong setting, these tools simply fail. Chatbots have offered users incorrect and potentially harmful medical advice, media outlets have published AI-generated articles that included inaccurate financial guidance, and search engines with AI interfaces have invented fake citations. As more people and businesses rely on chatbots for factual information, their tendency to make things up becomes even more apparent and disruptive.

But today’s LLMs were never designed to be purely accurate. They were created to create—to generate—says Subbarao Kambhampati, a computer science professor who researches artificial intelligence at Arizona State University. “The reality is: there’s no way to guarantee the factuality of what is generated,” he explains, adding that all computer-generated “creativity is hallucination, to some extent.”


Here is my summary:

AI chatbots like ChatGPT and Bing's AI assistant frequently "hallucinate": they generate false or misleading information and present it as fact. This is a major problem as more people turn to these AI tools for information, research, and decision-making.

Hallucinations occur because AI models are trained to predict the most likely next word or phrase, not to reason about truth and accuracy. They simply produce plausible-sounding responses, even if they are completely made up.
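Put differently, a language model is a next-word predictor. The toy sketch below (a hand-built bigram table, nothing like a real LLM) illustrates why fluency and truth come apart: both continuations read naturally, and the sampler has no mechanism for preferring the true one.

```python
import random

# Toy illustration only: hypothetical bigram "probabilities". Both
# continuations are fluent; only one is factually true, but a sampler
# that optimizes for plausibility cannot tell the difference.
next_word = {
    ("the", "capital"): [("of", 1.0)],
    ("capital", "of"): [("France", 0.6), ("Australia", 0.4)],
    ("of", "France"): [("is", 1.0)],
    ("of", "Australia"): [("is", 1.0)],
    ("France", "is"): [("Paris", 0.9), ("Lyon", 0.1)],
    ("Australia", "is"): [("Sydney", 0.7), ("Canberra", 0.3)],  # Sydney is wrong
}

def generate(prompt, steps=4):
    """Repeatedly append the sampled most-plausible next word."""
    words = prompt.split()
    for _ in range(steps):
        candidates = next_word.get(tuple(words[-2:]))
        if not candidates:
            break
        tokens, weights = zip(*candidates)
        words.append(random.choices(tokens, weights=weights)[0])
    return " ".join(words)

print(generate("the capital"))
```

Whether the output names Canberra or Sydney, the model's objective is equally satisfied; that is the core of the hallucination problem.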

This issue is inherent to the current state of large language models and is not easily fixable. Researchers are working on ways to improve accuracy and reliability, but there will likely always be some rate of hallucination.

Hallucinations can have serious consequences when people rely on chatbots for sensitive information related to health, finance, or other high-stakes domains. Experts warn these tools should not be used where factual accuracy is critical.

Friday, May 10, 2024

Generative artificial intelligence and scientific publishing: urgent questions, difficult answers

J. Bagenal
The Lancet
March 06, 2024

Abstract

Azeem Azhar describes, in Exponential: Order and Chaos in an Age of Accelerating Technology, how human society finds it hard to imagine or process exponential growth and change and is repeatedly caught out by this phenomenon. Whether it is the exponential spread of a virus or the exponential spread of a new technology, such as the smartphone, people consistently underestimate its impact. Azhar argues that an exponential gap has developed between technological progress and the pace at which institutions are evolving to deal with that progress. This is the case in scientific publishing with generative artificial intelligence (AI) and large language models (LLMs). There is guidance on the use of generative AI from organisations such as the International Committee of Medical Journal Editors. But across scholarly publishing such guidance is inconsistent. For example, one study of the 100 top global academic publishers and scientific journals found only 24% of academic publishers had guidance on the use of generative AI, whereas 87% of scientific journals provided such guidance. For those with guidance, 75% of publishers and 43% of journals had specific criteria for the disclosure of use of generative AI. In their book The Coming Wave, Mustafa Suleyman, co-founder and CEO of Inflection AI, and writer Michael Bhaskar warn that society is unprepared for the changes that AI will bring. They describe a person's or group's reluctance to confront difficult, uncertain change as the “pessimism aversion trap”. For journal editors and scientific publishers today, this is a dangerous trap to fall into. All the signs about generative AI in scientific publishing suggest things are not going to be ok.


From behind the paywall.

In 2023, Springer Nature became the first scientific publisher to create a new academic book by empowering authors to use generative AI. Researchers have shown that scientists found it difficult to distinguish between a human-generated scientific abstract and one created by generative AI. Noam Chomsky has argued that generative AI undermines education and is nothing more than high-tech plagiarism, and many feel similarly about AI models trained on work without upholding copyright. Plagiarism is a problem in scientific publishing, but those concerned with research integrity are also considering a post-plagiarism world, in which hybrid human-AI writing becomes the norm and differentiating between the two becomes pointless. In the ideal scenario, human creativity is enhanced, language barriers disappear, and humans relinquish control but not responsibility. Such a scenario would be welcome. But there are two urgent questions for scientific publishing.

First, how can scientific publishers and journal editors assure themselves that the research they are seeing is real? Researchers have used generative AI to create convincing fake clinical trial datasets to support a false scientific hypothesis that could only be identified when the raw data were scrutinised in detail by an expert. Papermills (nefarious businesses that generate poor or fake scientific studies and sell authorship) are a huge problem and contribute to the escalating number of research articles that are retracted by scientific publishers. The battle thus far has been between papermills becoming more sophisticated in their fabrication and ways of manipulating the editorial process and scientific publishers trying to find ways to detect and prevent these practices. Generative AI will turbocharge that race, but it might also break the papermill business model. When rogue academics use generative AI to fabricate datasets, they will not need to pay a papermill and will generate sham papers themselves. Fake studies will exponentially surge and nobody is doing enough to stop this inevitability.

Wednesday, July 19, 2023

Accuracy and social motivations shape judgements of (mis)information

Rathje, S., Roozenbeek, J., Van Bavel, J.J. et al.
Nat Hum Behav 7, 892–903 (2023).

Abstract

The extent to which belief in (mis)information reflects lack of knowledge versus a lack of motivation to be accurate is unclear. Here, across four experiments (n = 3,364), we motivated US participants to be accurate by providing financial incentives for correct responses about the veracity of true and false political news headlines. Financial incentives improved accuracy and reduced partisan bias in judgements of headlines by about 30%, primarily by increasing the perceived accuracy of true news from the opposing party (d = 0.47). Incentivizing people to identify news that would be liked by their political allies, however, decreased accuracy. Replicating prior work, conservatives were less accurate at discerning true from false headlines than liberals, yet incentives closed the gap in accuracy between conservatives and liberals by 52%. A non-financial accuracy motivation intervention was also effective, suggesting that motivation-based interventions are scalable. Altogether, these results suggest that a substantial portion of people’s judgements of the accuracy of news reflects motivational factors.
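The d = 0.47 reported above is Cohen's d, the standardized mean difference between two groups. A generic sketch of how it is computed (the numbers below are invented for illustration, not taken from the study):

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical perceived-accuracy ratings with and without an incentive:
incentivized = [4.1, 3.8, 4.5, 4.0, 3.9]
control = [3.5, 3.2, 3.9, 3.4, 3.6]
print(round(cohens_d(incentivized, control), 2))
```

By convention, d around 0.2 is considered a small effect, 0.5 medium, and 0.8 large, so the study's d = 0.47 is a roughly medium-sized effect.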

Conclusions

There is a sizeable partisan divide in the kind of news liberals and conservatives believe in, and conservatives tend to believe in and share more false news than liberals. Our research suggests these differences are not immutable. Motivating people to be accurate improves accuracy about the veracity of true (but not false) news headlines, reduces partisan bias and closes a substantial portion of the gap in accuracy between liberals and conservatives. Theoretically, these results identify accuracy and social motivations as key factors in driving news belief and sharing. Practically, these results suggest that shifting motivations may be a useful strategy for creating a shared reality across the political spectrum.

Key findings
  • Accuracy motivations: Participants who were motivated to be accurate were more likely to correctly identify true and false news headlines.
  • Social motivations: Participants who were motivated to identify news that would be liked by their political allies were less likely to correctly identify true and false news headlines.
  • Combination of motivations: Participants who were motivated by both accuracy and social motivations were more likely to correctly identify true news headlines from the opposing political party.

Tuesday, May 30, 2023

Are We Ready for AI to Raise the Dead?

Jack Holmes
Esquire Magazine
Originally posted 4 May 24

Here is an excerpt:

You can see wonderful possibilities here. Some might find comfort in hearing their mom’s voice, particularly if she sounds like she really sounded and gives the kind of advice she really gave. But Sandel told me that when he presents the choice to students in his ethics classes, the reaction is split, even as he asks in two different ways. First, he asks whether they’d be interested in the chatbot if their loved one bequeathed it to them upon their death. Then he asks if they’d be interested in building a model of themselves to bequeath to others. Oh, and what if a chatbot is built without input from the person getting resurrected? The notion that someone chose to be represented posthumously in a digital avatar seems important, but even then, what if the model makes mistakes? What if it misrepresents—slanders, even—the dead?

Soon enough, these questions won’t be theoretical, and there is no broad agreement about whom—or even what—to ask. We’re approaching a more fundamental ethical quandary than we often hear about in discussions around AI: human bias embedded in algorithms, privacy and surveillance concerns, mis- and disinformation, cheating and plagiarism, the displacement of jobs, deepfakes. These issues are really all interconnected—Osama bot Laden might make the real guy seem kinda reasonable or just preach jihad to tweens—and they all need to be confronted. We think a lot about the mundane (kids cheating in AP History) and the extreme (some advanced AI extinguishing the human race), but we’re more likely to careen through the messy corridor in between. We need to think about what’s allowed and how we’ll decide.

(cut)

Our governing troubles are compounded by the fact that, while a few firms are leading the way on building these unprecedented machines, the technology will soon become diffuse. More of the codebase for these models is likely to become publicly available, enabling highly talented computer scientists to build their own in the garage. (Some folks at Stanford have already built a ChatGPT imitator for around $600.) What happens when some entrepreneurial types construct a model of a dead person without the family’s permission? (We got something of a preview in April when a German tabloid ran an AI-generated interview with ex–Formula 1 driver Michael Schumacher, who suffered a traumatic brain injury in 2013. His family threatened to sue.) What if it’s an inaccurate portrayal or it suffers from what computer scientists call “hallucinations,” when chatbots spit out wildly false things? We’ve already got revenge porn. What if an old enemy constructs a false version of your dead wife out of spite? “There’s an important tension between open access and safety concerns,” Reich says. “Nuclear fusion has enormous upside potential,” too, he adds, but in some cases, open access to the flesh and bones of AI models could be like “inviting people around the world to play with plutonium.”


Yes, there was a Black Mirror episode (Be Right Back) about this issue.

Saturday, May 20, 2023

ChatGPT Answers Beat Physicians' on Info, Patient Empathy, Study Finds

Michael DePeau-Wilson
MedPage Today
Originally published 28 April 23

The artificial intelligence (AI) chatbot ChatGPT outperformed physicians when answering patient questions, based on quality of response and empathy, according to a cross-sectional study.

Of 195 exchanges, evaluators preferred ChatGPT responses to physician responses in 78.6% (95% CI 75.0-81.8) of the 585 evaluations, reported John Ayers, PhD, MA, of the Qualcomm Institute at the University of California San Diego in La Jolla, and co-authors.

The AI chatbot responses were given a significantly higher quality rating than physician responses (t=13.3, P<0.001), with the proportion of responses rated as good or very good quality (≥4) higher for ChatGPT (78.5%) than physicians (22.1%), amounting to a 3.6 times higher prevalence of good or very good quality responses for the chatbot, they noted in JAMA Internal Medicine.

Furthermore, ChatGPT's responses were rated as being significantly more empathetic than physician responses (t=18.9, P<0.001), with the proportion of responses rated as empathetic or very empathetic (≥4) higher for ChatGPT (45.1%) than for physicians (4.6%), amounting to a 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
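The "times higher prevalence" figures are simply ratios of the reported proportions, which is easy to check:

```python
# Ratios of the proportions quoted in the study text above.
quality_chatgpt, quality_physician = 78.5, 22.1   # % rated good or very good
empathy_chatgpt, empathy_physician = 45.1, 4.6    # % rated empathetic or very empathetic

print(round(quality_chatgpt / quality_physician, 1))  # ≈ 3.6
print(round(empathy_chatgpt / empathy_physician, 1))  # ≈ 9.8
```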

"ChatGPT provides a better answer," Ayers told MedPage Today. "I think of our study as a phase zero study, and it clearly shows that ChatGPT wins in a landslide compared to physicians, and I wouldn't say we expected that at all."

He said they were trying to figure out how ChatGPT, developed by OpenAI, could potentially help resolve the burden of answering patient messages for physicians, which he noted is a well-documented contributor to burnout.

Ayers said he approached the study with another population in mind as well, pointing out that while the burnout crisis might be affecting roughly 1.1 million providers across the U.S., it is also affecting about 329 million patients who are engaging with overburdened healthcare professionals.

(cut)

"Physicians will need to learn how to integrate these tools into clinical practice, defining clear boundaries between full, supervised, and proscribed autonomy," he added. "And yet, I am cautiously optimistic about a future of improved healthcare system efficiency, better patient outcomes, and reduced burnout."

After seeing the results of this study, Ayers thinks that the research community should be working on randomized controlled trials to study the effects of AI messaging, so that the future development of AI models will be able to account for patient outcomes.

Saturday, December 3, 2022

Public attitudes value interpretability but prioritize accuracy in Artificial Intelligence

Nussberger, A. M., Luo, L., Celis, L. E., 
& Crockett, M. J. (2022). 
Nature Communications, 13(1), 5821.

Abstract

As Artificial Intelligence (AI) proliferates across important social institutions, many of the most powerful AI systems available are difficult to interpret for end-users and engineers alike. Here, we sought to characterize public attitudes towards AI interpretability. Across seven studies (N = 2475), we demonstrate robust and positive attitudes towards interpretable AI among non-experts that generalize across a variety of real-world applications and follow predictable patterns. Participants value interpretability positively across different levels of AI autonomy and accuracy, and rate interpretability as more important for AI decisions involving high stakes and scarce resources. Crucially, when AI interpretability trades off against AI accuracy, participants prioritize accuracy over interpretability under the same conditions driving positive attitudes towards interpretability in the first place: amidst high stakes and scarce resources. These attitudes could drive a proliferation of AI systems making high-impact ethical decisions that are difficult to explain and understand.


Discussion

In recent years, academics, policymakers, and developers have debated whether interpretability is a fundamental prerequisite for trust in AI systems. However, it remains unknown whether non-experts–who may ultimately comprise a significant portion of end-users for AI applications–actually care about AI interpretability, and if so, under what conditions. Here, we characterise public attitudes towards interpretability in AI across seven studies. Our data demonstrates that people consider interpretability in AI to be important. Even though these positive attitudes generalise across a host of AI applications and show systematic patterns of variation, they also seem to be capricious. While people valued interpretability as similarly important for AI systems that directly implemented decisions and AI systems recommending a course of action to a human (Study 1A), they valued interpretability more for applications involving higher (relative to lower) stakes and for applications determining access to scarce (relative to abundant) resources (Studies 1A-C, Study 2). And while participants valued AI interpretability across all levels of AI accuracy when considering the two attributes independently (Study 3A), they sacrificed interpretability for accuracy when these two attributes traded off against one another (Studies 3B–C). Furthermore, participants favoured accuracy over interpretability under the same conditions that drove importance ratings of interpretability in the first place: when stakes are high and resources are scarce.

Our findings highlight that high-stakes applications, such as medical diagnosis, will generally be met with enhanced requirements towards AI interpretability. Notably, this sensitivity to stakes parallels magnitude-sensitivity as a foundational process in the cognitive appraisal of outcomes. The impact of stakes on attitudes towards interpretability was apparent not only in our experiments that manipulated stakes within a given AI application, but also in absolute and relative levels of participants’ valuation of interpretability across applications: take, for instance, ‘hurricane first aid’ and ‘vaccine allocation’ outperforming ‘hiring decisions’, ‘insurance pricing’, and ‘standby seat prioritizing’. Conceivably, this ordering would also emerge if we ranked the applications according to the scope of auditing and control measures imposed on human executives, reflecting interpretability’s essential capacity of verifying appropriate and fair decision processes.

Tuesday, March 29, 2022

Gene editing gets safer thanks to redesigned Cas9 protein

Science Daily
Originally posted 2 MAR 22

Summary:

Scientists have redesigned a key component of a widely used CRISPR-based gene-editing tool, called Cas9, to be thousands of times less likely to target the wrong stretch of DNA while remaining just as efficient as the original version, making it potentially much safer.


One of the grand challenges with using CRISPR-based gene editing on humans is that the molecular machinery sometimes makes changes to the wrong section of a host's genome, creating the possibility that an attempt to repair a genetic mutation in one spot in the genome could accidentally create a dangerous new mutation in another.

But now, scientists at The University of Texas at Austin have redesigned a key component of a widely used CRISPR-based gene-editing tool, called Cas9, to be thousands of times less likely to target the wrong stretch of DNA while remaining just as efficient as the original version, making it potentially much safer. The work is described in a paper published today in the journal Nature.

"This really could be a game changer in terms of a wider application of the CRISPR Cas systems in gene editing," said Kenneth Johnson, a professor of molecular biosciences and co-senior author of the study with David Taylor, an assistant professor of molecular biosciences. The paper's co-first authors are postdoctoral fellows Jack Bravo and Mu-Sen Liu.


Journal Reference:

Jack P. K. Bravo, Mu-Sen Liu, et al. Structural basis for mismatch surveillance by CRISPR–Cas9. Nature, 2022; DOI: 10.1038/s41586-022-04470-1

Wednesday, October 6, 2021

Immoral actors’ meta-perceptions are accurate but overly positive

Lees, J. M., Young, L., & Waytz, A.
(2021, August 16).
https://doi.org/10.31234/osf.io/j24tn

Abstract

We examine how actors think others perceive their immoral behavior (moral meta-perception) across a diverse set of real-world moral violations. Utilizing a novel methodology, we solicit written instances of actors’ immoral behavior (N_total=135), measure motives and meta-perceptions, then provide these accounts to separate samples of third-party observers (N_total=933), using US convenience and representative samples (N_actor-observer pairs=4,615). We find that immoral actors can accurately predict how they are perceived, how they are uniquely perceived relative to the average immoral actor, and how they are misperceived. Actors who are better at judging the motives of other immoral actors also have more accurate meta-perceptions. Yet accuracy is accompanied by two distinct biases: overestimating the positive perceptions others hold, and believing one’s motives are more clearly perceived than they are. These results contribute to a detailed account of the multiple components underlying both accuracy and bias in moral meta-perception.

From the General Discussion

These results collectively suggest that individuals who have engaged in immoral behavior can accurately forecast how others will react to their moral violations.  

Studies 1-4 also found similar evidence for accuracy in observers’ judgments of the unique motives of immoral actors, suggesting that individuals are able to successfully perspective-take with those who have committed moral violations. Observers higher in cognitive ability (Studies 2-3) and empathic concern (Studies 2-4) were consistently more accurate in these judgments, while observers higher in Machiavellianism (Studies 2-4) and the propensity to engage in unethical workplace behaviors (Studies 3-4) were consistently less accurate. This latter result suggests that more frequently engaging in immoral behavior does not grant one insight into the moral minds of others, and in fact is associated with less ability to understand the motives behind others’ immoral behavior.

Despite strong evidence for meta-accuracy (and observer accuracy) across studies, actors’ accuracy in judging how they would be perceived was accompanied by two judgment biases.  Studies 1-4 found evidence for a transparency bias among immoral actors (Gilovich et al., 1998), meaning that actors overestimated how accurately observers would perceive their self-reported moral motives. Similarly, in Study 4 an examination of actors’ meta-perception point estimates found evidence for a positivity bias. Actors systematically overestimate the positive attributions, and underestimate the negative attributions, made of them and their motives. In fact, the single meta-perception found to be the most inaccurate in its average point estimate was the meta-perception of harm caused, which was significantly underestimated.

Friday, October 23, 2020

Ethical Dimensions of Using Artificial Intelligence in Health Care

Michael J. Rigby
AMA Journal of Ethics
February 2019

An artificially intelligent computer program can now diagnose skin cancer more accurately than a board-certified dermatologist. Better yet, the program can do it faster and more efficiently, requiring a training data set rather than a decade of expensive and labor-intensive medical education. While it might appear that it is only a matter of time before physicians are rendered obsolete by this type of technology, a closer look at the role this technology can play in the delivery of health care is warranted to appreciate its current strengths, limitations, and ethical complexities.

Artificial intelligence (AI), which includes the fields of machine learning, natural language processing, and robotics, can be applied to almost any field in medicine, and its potential contributions to biomedical research, medical education, and delivery of health care seem limitless. With its robust ability to integrate and learn from large sets of clinical data, AI can serve roles in diagnosis, clinical decision making, and personalized medicine. For example, AI-based diagnostic algorithms applied to mammograms are assisting in the detection of breast cancer, serving as a “second opinion” for radiologists. In addition, advanced virtual human avatars are capable of engaging in meaningful conversations, which has implications for the diagnosis and treatment of psychiatric disease. AI applications also extend into the physical realm with robotic prostheses, physical task support systems, and mobile manipulators assisting in the delivery of telemedicine.

Nonetheless, this powerful technology creates a novel set of ethical challenges that must be identified and mitigated since AI technology has tremendous capability to threaten patient preference, safety, and privacy. However, current policy and ethical guidelines for AI technology are lagging behind the progress AI has made in the health care field. While some efforts to engage in these ethical conversations have emerged, the medical community remains ill informed of the ethical complexities that budding AI technology can introduce. Accordingly, a rich discussion awaits that would greatly benefit from physician input, as physicians will likely be interfacing with AI in their daily practice in the near future.

Monday, May 25, 2020

How Could the CDC Make That Mistake?

Alexis C. Madrigal & Robinson Meyer
The Atlantic
Originally posted 21 May 20

The Centers for Disease Control and Prevention is conflating the results of two different types of coronavirus tests, distorting several important metrics and providing the country with an inaccurate picture of the state of the pandemic. We’ve learned that the CDC is making, at best, a debilitating mistake: combining test results that diagnose current coronavirus infections with test results that measure whether someone has ever had the virus. The upshot is that the government’s disease-fighting agency is overstating the country’s ability to test people who are sick with COVID-19. The agency confirmed to The Atlantic on Wednesday that it is mixing the results of viral and antibody tests, even though the two tests reveal different information and are used for different reasons.
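To see why pooling the two test types misleads, consider some hypothetical numbers (invented purely for illustration): folding in antibody tests, which typically have a lower positivity rate, both inflates the apparent testing volume and dilutes the positivity rate among the currently sick.

```python
# Hypothetical counts -- illustration only, not real CDC data.
# Viral (PCR) tests detect current infection; antibody tests detect past exposure.
viral_tests, viral_positive = 10_000, 1_500
antibody_tests, antibody_positive = 5_000, 250

viral_rate = viral_positive / viral_tests
combined_rate = (viral_positive + antibody_positive) / (viral_tests + antibody_tests)

print(f"viral-only positivity: {viral_rate:.1%}")     # the relevant figure
print(f"combined positivity:   {combined_rate:.1%}")  # artificially lower
print(f"apparent tests run:    {viral_tests + antibody_tests:,}")  # inflated
```

In this sketch the blended positivity rate looks better than the viral-only rate, and the count of "tests performed" overstates diagnostic capacity by the whole antibody volume, which is exactly the distortion the article describes.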

This is not merely a technical error. States have set quantitative guidelines for reopening their economies based on these flawed data points.

Several states—including Pennsylvania, the site of one of the country’s largest outbreaks, as well as Texas, Georgia, and Vermont—are blending the data in the same way. Virginia likewise mixed viral and antibody test results until last week, but it reversed course and the governor apologized for the practice after it was covered by the Richmond Times-Dispatch and The Atlantic. Maine similarly separated its data on Wednesday; Vermont authorities claimed they didn’t even know they were doing this.

The widespread use of the practice means that it remains difficult to know exactly how much the country’s ability to test people who are actively sick with COVID-19 has improved.


Sunday, April 19, 2020

On the ethics of algorithmic decision-making in healthcare

Grote T, Berens P
Journal of Medical Ethics 
2020;46:205-211.

Abstract

In recent years, a plethora of high-profile scientific publications has been reporting on machine learning algorithms outperforming clinicians in medical diagnosis or treatment recommendations. This has piqued interest in deploying relevant algorithms with the aim of enhancing decision-making in healthcare. In this paper, we argue that instead of straightforwardly enhancing the decision-making capabilities of clinicians and healthcare institutions, deploying machine learning algorithms entails trade-offs at the epistemic and the normative level. Whereas involving machine learning might improve the accuracy of medical diagnosis, it comes at the expense of opacity when trying to assess the reliability of a given diagnosis. Drawing on literature in social epistemology and moral responsibility, we argue that the uncertainty in question potentially undermines the epistemic authority of clinicians. Furthermore, we elucidate potential pitfalls of involving machine learning in healthcare with respect to paternalism, moral responsibility and fairness. Finally, we discuss how the deployment of machine learning algorithms might shift the evidentiary norms of medical diagnosis. In this regard, we hope to lay the grounds for further ethical reflection on the opportunities and pitfalls of machine learning for enhancing decision-making in healthcare.

From the Conclusion

In this paper, we aimed at examining which opportunities and pitfalls machine learning potentially provides for enhancing medical decision-making on epistemic and ethical grounds. As should have become clear, enhancing medical decision-making by deferring to machine learning algorithms requires trade-offs at different levels. Clinicians, or their respective healthcare institutions, are facing a dilemma: while there is plenty of evidence of machine learning algorithms outsmarting their human counterparts, their deployment comes at the cost of high degrees of uncertainty. On epistemic grounds, relevant uncertainty promotes risk-averse decision-making among clinicians, which then might lead to impoverished medical diagnosis. From an ethical perspective, deferring to machine learning algorithms blurs the attribution of accountability and imposes health risks on patients. Furthermore, the deployment of machine learning might also foster a shift of norms within healthcare. It needs to be pointed out, however, that none of the issues we discussed presents a knockout argument against deploying machine learning in medicine, and our article is not intended this way at all. On the contrary, we are convinced that machine learning provides plenty of opportunities to enhance decision-making in medicine.


Friday, March 20, 2020

Flawed science? Two efforts launched to improve scientific validity of psychological test evidence in court

Karen Franklin
forensicpsychologist Blog
Originally posted 15 Feb 20

Here is an excerpt:

New report slams "junk science” psychological assessments

In one of two significant developments, a group of researchers today released evidence of systematic problems with the state of psychological test admissibility in court. The researchers' comprehensive survey found that only about two-thirds of the tools used by clinicians in forensic settings were generally accepted in the field, while even fewer (only about four in ten) were favorably reviewed in authoritative sources such as the Mental Measurements Yearbook.

Despite this, psychological tests are rarely challenged when they are introduced in court, Tess M.S. Neal and her colleagues found. Even when they are, the challenges fail about two-thirds of the time. Worse yet, there is little relationship between a tool’s psychometric quality and the likelihood of it being challenged.

“Some of the weakest tools tend to get a pass from the courts,” write the authors of the newly issued report, “Psychological Assessments in Legal Contexts: Are Courts Keeping ‘Junk Science’ Out of the Courtroom?”

The report, currently in press in the journal Psychological Science in the Public Interest, proposes that standard batteries be developed for forensic use, based on the consensus of experts in the field as to which tests are the most reliable and valid for assessing a given psycho-legal issue. It further cautions against forensic deployment of newly developed tests that are being marketed by for-profit corporations before adequate research or review by independent professionals.

The info is here.

Monday, March 2, 2020

The Dunning-Kruger effect, or why the ignorant think they’re experts

Alexandru Micu
zmescience.com
Originally posted 13 Feb 20

Here is an excerpt:

It’s not specific only to technical skills but plagues all walks of human existence equally. One study found that 80% of drivers rate themselves as above average, which is literally impossible because that’s not how averages work. We tend to gauge our own relative popularity the same way.

It isn’t limited to people with low or nonexistent skills in a certain matter, either — it works on pretty much all of us. In their first study, Dunning and Kruger also found that students who scored in the top quartile (25%) routinely underestimated their own competence.

A fuller definition of the Dunning-Kruger effect would be that it represents a bias in estimating our own ability that stems from our limited perspective. When we have a poor or nonexistent grasp on a topic, we literally know too little of it to understand how little we know. Those who do possess the knowledge or skills, however, have a much better idea of where they sit. But they also think that if a task is clear and simple to them, it must be so for everyone else as well.

A person in the first group and one in the second group are equally liable to use their own experience and background as the baseline and kinda just take it for granted that everyone is near that baseline. They both partake in the “illusion of confidence” — for one, that confidence is in themselves, for the other, in everyone else.

The info is here.

Friday, February 7, 2020

People Who Second-Guess Themselves Make Worse Decisions

Christopher Ingraham
The Washington Post
Originally posted 9 Jan 20

Here is an excerpt:

The researchers specifically wanted to know whether the revisions were more accurate than the originals.

In theory, there are a lot of reasons to believe this might be the case. A person would presumably revise a prediction after obtaining new information, such as an analyst’s match forecast or a team roster change.

In practice, however, the opposite was true: Revised forecasts accurately predicted the final match score 7.7 percent of the time. But the unaltered forecasts were correct 9.3 percent of the time.

In other words, revised forecasts were about 17 percent less accurate than those that had never changed.
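That 17 percent figure is the relative gap between the two reported hit rates, which is easy to check:

```python
# Hit rates reported in the study: revised vs. unaltered forecasts
revised, unaltered = 0.077, 0.093

# Relative accuracy drop of the revised forecasts
relative_drop = (unaltered - revised) / unaltered
print(f"{relative_drop:.0%}")  # 17%
```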

(cut)

So where did the second-guessers go wrong? For starters, the researchers controlled for match-to-match and player-to-player variation — it isn’t likely the case, in other words, that matches receiving more revisions were more difficult to predict, or that bad guessers were more likely to revise their forecasts.

The researchers found that revisions were more likely to go awry when forecasters dialed up the scores — by going, say, from predicting a 2-1 final score to 3-2. Indeed, across the data set, the bettors systematically underestimated the likelihood of a 0-0 draw: an outcome anticipated 1.5 percent of the time that actually occurs in 8.4 percent of matches.

The info is here.

Tuesday, January 7, 2020

AI Is Not Similar To Human Intelligence. Thinking So Could Be Dangerous

Elizabeth Fernandez
forbes.com
Originally posted 30 Nov 19

Here is an excerpt:

No doubt, these algorithms are powerful, but to think that they “think” and “learn” in the same way as humans would be incorrect, Watson says. There are many differences, and he outlines three.

The first - DNNs are easy to fool. For example, imagine you have a picture of a banana. A neural network successfully classifies it as a banana. But it’s possible to create a generative adversarial network that can fool your DNN. By adding a slight amount of noise or another image besides the banana, your DNN might now think the picture of a banana is a toaster. A human could not be fooled by such a trick. Some argue that this is because DNNs can see things humans can’t, but Watson says, “This disconnect between biological and artificial neural networks suggests that the latter lack some crucial component essential to navigating the real world.”

Secondly, DNNs need an enormous amount of data to learn. An image classification DNN might need to “see” thousands of pictures of zebras to identify a zebra in an image. Give the same test to a toddler, and chances are s/he could identify a zebra, even one that’s partially obscured, by only seeing a picture of a zebra a few times. Humans are great “one-shot learners,” says Watson. Teaching a neural network, on the other hand, might be very difficult, especially in instances where data is hard to come by.

Thirdly, neural nets are “myopic”. They can see the trees, so to speak, but not the forest. For example, a DNN could successfully label a picture of Kim Kardashian as a woman, an entertainer, and a starlet. However, switching the position of her mouth and one of her eyes actually improved the confidence of the DNN’s prediction. The DNN didn’t see anything wrong with that image. Obviously, something is wrong here. Another example - a human can say “that cloud looks like a dog”, whereas a DNN would say that the cloud is a dog.

The info is here.

Saturday, April 20, 2019

Cardiologist Eric Topol on How AI Can Bring Humanity Back to Medicine

Alice Park
Time.com
Originally published March 14, 2019

Here is an excerpt:

What are the best examples of how AI can work in medicine?

We’re seeing rapid uptake of algorithms that make radiologists more accurate. The other group already deriving benefit is ophthalmologists. Diabetic retinopathy, which is a terribly underdiagnosed cause of blindness and a complication of diabetes, is now diagnosed by a machine with an algorithm that is approved by the Food and Drug Administration. And we’re seeing it hit at the consumer level with a smart-watch app with a deep learning algorithm to detect atrial fibrillation.

Is that really artificial intelligence, in the sense that the machine has learned about medicine like doctors?

Artificial intelligence is different from human intelligence. It’s really about using machines with software and algorithms to ingest data and come up with the answer, whether that data is what someone says in speech, or reading patterns and classifying or triaging things.

What worries you the most about AI in medicine?

I have lots of worries. First, there’s the issue of privacy and security of the data. And I’m worried about whether the AI algorithms are always proved out with real patients. Finally, I’m worried about how AI might worsen some inequities. Algorithms are not biased, but the data we put into those algorithms, because they are chosen by humans, often are. But I don’t think these are insoluble problems.

The info is here.

Monday, April 15, 2019

Death by a Thousand Clicks: Where Electronic Health Records Went Wrong

Erika Fry and Fred Schulte
Fortune.com
Originally posted on March 18, 2019

Here is an excerpt:

Damning evidence came from a whistleblower claim filed in 2011 against the company. Brendan Delaney, a British cop turned EHR expert, was hired in 2010 by New York City to work on the eCW implementation at Rikers Island, a jail complex that then had more than 100,000 inmates. But soon after he was hired, Delaney noticed scores of troubling problems with the system, which became the basis for his lawsuit. The patient medication lists weren’t reliable; prescribed drugs would not show up, while discontinued drugs would appear as current, according to the complaint. The EHR would sometimes display one patient’s medication profile accompanied by the physician’s note for a different patient, making it easy to misdiagnose or prescribe a drug to the wrong individual. Prescriptions, some 30,000 of them in 2010, lacked proper start and stop dates, introducing the opportunity for under- or overmedication. The eCW system did not reliably track lab results, concluded Delaney, who tallied 1,884 tests for which they had never gotten outcomes.

(cut)

Electronic health records were supposed to do a lot: make medicine safer, bring higher-quality care, empower patients, and yes, even save money. Boosters heralded an age when researchers could harness the big data within to reveal the most effective treatments for disease and sharply reduce medical errors. Patients, in turn, would have truly portable health records, being able to share their medical histories in a flash with doctors and hospitals anywhere in the country—essential when life-and-death decisions are being made in the ER.

But 10 years after President Barack Obama signed a law to accelerate the digitization of medical records—with the federal government, so far, sinking $36 billion into the effort—America has little to show for its investment.

The info is here.

Saturday, March 9, 2019

Can AI Help Reduce Disparities in General Medical and Mental Health Care?

Irene Y. Chen, Peter Szolovits, and Marzyeh Ghassemi
AMA J Ethics. 2019;21(2):E167-179.
doi: 10.1001/amajethics.2019.167.

Abstract

Background: As machine learning becomes increasingly common in health care applications, concerns have been raised about bias in these systems’ data, algorithms, and recommendations. Simply put, as health care improves for some, it might not improve for all.

Methods: Two case studies are examined using a machine learning algorithm on unstructured clinical and psychiatric notes to predict intensive care unit (ICU) mortality and 30-day psychiatric readmission with respect to race, gender, and insurance payer type as a proxy for socioeconomic status.

Results: Clinical note topics and psychiatric note topics were heterogenous with respect to race, gender, and insurance payer type, which reflects known clinical findings. Differences in prediction accuracy and therefore machine bias are shown with respect to gender and insurance type for ICU mortality and with respect to insurance policy for psychiatric 30-day readmission.

Conclusions: This analysis can provide a framework for assessing and identifying disparate impacts of artificial intelligence in health care.
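The kind of disparity the authors report can be illustrated with a minimal, hypothetical sketch (the predictions and payer labels below are invented, not taken from the study): overall accuracy can look reasonable while one payer group fares far worse.

```python
def accuracy(pairs):
    """Fraction of (true, predicted) pairs that agree."""
    return sum(t == p for t, p in pairs) / len(pairs)

# Invented ICU-mortality predictions tagged by insurance payer type
records = [
    ("private", 0, 0), ("private", 1, 1), ("private", 0, 0), ("private", 1, 1),
    ("medicaid", 0, 0), ("medicaid", 1, 0), ("medicaid", 0, 1), ("medicaid", 1, 0),
]

overall = accuracy([(t, p) for _, t, p in records])
print(f"overall: {overall:.2f}")  # overall: 0.62
for group in ("private", "medicaid"):
    acc = accuracy([(t, p) for g, t, p in records if g == group])
    print(f"{group}: {acc:.2f}")
```

Reporting only the overall number would hide exactly the gap the paper's framework is designed to surface.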

Sunday, February 24, 2019

Biased algorithms: here’s a more radical approach to creating fairness

Tom Douglas
theconversation.com
Originally posted January 21, 2019

Here is an excerpt:

What’s fair?

AI researchers concerned about fairness have, for the most part, been focused on developing algorithms that are procedurally fair – fair by virtue of the features of the algorithms themselves, not the effects of their deployment. But what if it’s substantive fairness that really matters?

There is usually a tension between procedural fairness and accuracy – attempts to achieve the most commonly advocated forms of procedural fairness increase the algorithm’s overall error rate. Take the COMPAS algorithm for example. If we equalised the false positive rates between black and white people by ignoring the predictors of recidivism that tended to be disproportionately possessed by black people, the likely result would be a loss in overall accuracy, with more people wrongly predicted to re-offend, or not re-offend.

We could avoid these difficulties if we focused on substantive rather than procedural fairness and simply designed algorithms to maximise accuracy, while simultaneously blocking or compensating for any substantively unfair effects that these algorithms might have. For example, instead of trying to ensure that crime prediction errors affect different racial groups equally – a goal that may in any case be unattainable – we could instead ensure that these algorithms are not used in ways that disadvantage those at high risk. We could offer people deemed “high risk” rehabilitative treatments rather than, say, subjecting them to further incarceration.
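The procedural-fairness criterion discussed here, equal false positive rates across groups, is simple to measure. A minimal sketch with invented predictions (not COMPAS data):

```python
def false_positive_rate(y_true, y_pred):
    """Share of true negatives that were wrongly predicted positive."""
    preds_for_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(preds_for_negatives) / len(preds_for_negatives)

# Invented recidivism predictions for two groups (1 = predicted to re-offend)
y_true_a, y_pred_a = [0, 0, 0, 0, 1, 1], [1, 1, 0, 0, 1, 1]
y_true_b, y_pred_b = [0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 1, 0]

print(false_positive_rate(y_true_a, y_pred_a))  # 0.5
print(false_positive_rate(y_true_b, y_pred_b))  # 0.25
```

Douglas's substantive alternative would leave a gap like this in place and instead constrain how the predictions are used downstream.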

The info is here.

Monday, December 24, 2018

Your Intuition Is Wrong, Unless These 3 Conditions Are Met

Emily Zulz
www.thinkadvisor.com
Originally posted November 16, 2018

Here is an excerpt:

“Intuitions of master chess players when they look at the board [and make a move], they’re accurate,” he said. “Everybody who’s been married could guess their wife’s or their husband’s mood by one word on the telephone. That’s an intuition and it’s generally very good, and very accurate.”

According to Kahneman, who’s studied when one can trust intuition and when one cannot, there are three conditions that need to be met in order to trust one’s intuition.

The first is that there has to be some regularity in the world that someone can pick up and learn.

“So, chess players certainly have it. Married people certainly have it,” Kahneman explained.

However, he added, people who pick stocks in the stock market do not have it.

“Because, the stock market is not sufficiently regular to support developing that kind of expert intuition,” he explained.

The second condition for accurate intuition is “a lot of practice,” according to Kahneman.

And the third condition is immediate feedback. Kahneman said that “you have to know almost immediately whether you got it right or got it wrong.”

The info is here.