Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care


Saturday, September 16, 2023

A Metacognitive Blindspot in Intellectual Humility Measures

Costello, T. H., Newton, C., Lin, H., & Pennycook, G.
(2023, August 6).


Intellectual humility (IH) is commonly defined as recognizing the limits of one’s knowledge and abilities. However, most research has relied entirely on self-report measures of IH, without testing whether these instruments capture the metacognitive core of the construct. Across two studies (Ns = 898; 914), using generalized additive mixed models to detect complex non-linear interactions, we evaluated the correspondence between widely used IH self-reports and performance on calibration and resolution paradigms designed to model the awareness of one’s mental capabilities (and their fallibility). On an overconfidence paradigm (N observations per model = 2,692-2,742), none of five IH measures attenuated the Dunning-Kruger effect, whereby poor performers overestimate their abilities and high performers underestimate them. On a confidence-accuracy paradigm (N observations per model = 7,223-12,706), most IH measures were associated with inflated confidence regardless of accuracy, or were specifically related to confidence when participants were correct but not when they were incorrect. The sole exception was the “Lack of Intellectual Overconfidence” subscale of the Comprehensive Intellectual Humility Scale, which uniquely predicted lower confidence for incorrect responses. Meanwhile, measures of Actively Open-minded Thinking reliably predicted calibration and resolution. These findings reveal substantial discrepancies between IH self-reports and metacognitive abilities, suggesting that most IH measures lack validity. It may not be feasible to assess IH via self-report, as indicating a great deal of humility may, itself, be a sign of a failure in humility.


IH represents the ability to identify the constraints of one’s psychological, epistemic, and cultural perspective— to conduct lay phenomenology, acknowledging that the default human perspective is (literally) self-centered (Wallace, 2009) — and thereby cultivate an awareness of the limits of a single person, theory, or ideology to describe the vast and searingly complex universe. It is a process that presumably involves effortful and vigilant noticing – tallying one’s epistemic track record, and especially one’s fallibility (Ballantyne, 2021).

IH, therefore, manifests dynamically in individuals as a boundary between one’s informational environment and one’s model of reality. This portrait of IH-as-boundary appears repeatedly in philosophical and psychological treatments of IH, which frequently frame awareness of (epistemic) limitations as IH’s conceptual, metacognitive core (Leary et al., 2017; Porter, Elnakouri, et al., 2022). Yet as with a limit in mathematics, epistemic limits are appropriately defined as functions: their value is dependent on inputs (e.g., information environment, access to knowledge) that vary across contexts and individuals. Particularly, measuring IH requires identifying at least two quantities— one’s epistemic capabilities and one’s appraisal of said capabilities— from which a third, IH-qua-metacognition, can be derived as the distance between the two quantities.

Contemporary IH self-reports tend not to account for either parameter, seeming to rest instead on an auxiliary assumption: that people who are attuned to, and “own”, their epistemic limitations will generate characteristic, intellectually humble patterns of thinking and behavior. IH questionnaires then target these patterns, rather than the shared propensity for IH which the patterns ostensibly reflect.

We sought to both test and circumvent this assumption (and mono-method measurement limitation) in the present research. We did so by defining IH’s metacognitive core, functionally and statistically, in terms of calibration and resolution. We operationalized calibration as the convergence between participants’ performance on a series of epistemic tasks, on the one hand, and participants’ estimation of their own performance, on the other. Given that the relation between self-estimation and actual performance is non-linear (i.e., the Dunning-Kruger effect), there were several pathways by which IH might predict calibration: (1) decreased overestimation among low performers, (2) decreased underestimation among high performers, or (3) unilateral weakening of miscalibration among both low and high performers (for a visual representation, refer to Figure 1). Further, we operationalized epistemic resolution by assessing the relation between IH, on the one hand, and individuals’ item-by-item confidence judgments for correct versus incorrect answers, on the other. Thus, resolution represents the capacity to distinguish between one’s correct and incorrect judgments and beliefs (a seemingly necessary prerequisite for building an accurate and calibrated model of one’s knowledge).
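To make the two quantities concrete, here is a minimal toy sketch of calibration and resolution (our own illustration on simulated quiz data; the study itself used generalized additive mixed models, and all numbers below are made up):

```python
import random

random.seed(1)

# Hypothetical data: each participant answers 20 items, reports per-item
# confidence (0-1), and estimates their own overall performance.
participants = []
for _ in range(200):
    skill = random.random()
    answers = [random.random() < skill for _ in range(20)]
    score = sum(answers) / 20
    # Dunning-Kruger-style self-estimate: regressed toward the middle,
    # so low performers overestimate and high performers underestimate.
    self_estimate = 0.5 + 0.4 * (score - 0.5) + random.gauss(0, 0.05)
    confidences = [min(1, max(0, (0.75 if correct else 0.55) + random.gauss(0, 0.1)))
                   for correct in answers]
    participants.append((score, self_estimate, answers, confidences))

# Calibration: the gap between self-estimated and actual performance.
miscalibration = [est - score for score, est, _, _ in participants]

# Resolution: mean confidence on correct items minus incorrect items.
def resolution(answers, confidences):
    correct = [c for a, c in zip(answers, confidences) if a]
    wrong = [c for a, c in zip(answers, confidences) if not a]
    if not correct or not wrong:
        return None  # undefined when all answers are right (or all wrong)
    return sum(correct) / len(correct) - sum(wrong) / len(wrong)

res = [r for _, _, a, c in participants if (r := resolution(a, c)) is not None]
print(round(sum(miscalibration) / len(miscalibration), 3))
print(round(sum(res) / len(res), 3))
```

In this toy data, mean miscalibration is near zero overall (overestimation by low performers cancels underestimation by high performers), while resolution is positive because confidence was generated to track accuracy.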

Sunday, July 26, 2020

The trolley problem problem

James Wilson
Originally posted May 20, 2020

Here is an excerpt:

Some philosophers think that ethical thought experiments either are, or have a strong affinity with, scientific experiments. On such a view, thought experiments, like other experiments, when well-designed can allow knowledge to be built via rigorous and unbiased testing of hypotheses. Just as in the randomised controlled trials in which new pharmaceuticals are tested, the circumstances and the types of control in thought experiments could be such as to make the situation very unlike everyday situations, but that is a virtue rather than a vice, insofar as it allows ethical hypotheses to be tested cleanly and rigorously.

If thought experiments are – literally – experiments, this helps to explain how they might provide insights into the way the world is. But it would also mean that thought experiments would inherit the two methodological challenges that attend to experiments more generally, known as internal and external validity. Internal validity relates to the extent to which an experiment succeeds in providing an unbiased test of the variable or hypothesis in question. External validity relates to the extent to which the results in the controlled environment translate to other contexts, and in particular to our own. External validity is a major challenge, as the very features that make an environment controlled and suitable to obtain internal validity often make it problematically different from the uncontrolled environments in which interventions need to be applied.

There are significant challenges with both the internal and the external validity of thought experiments. It is useful to compare the kind of care with which medical researchers or psychologists design experiments – including validation of questionnaires, double-blinding of trials, placebo control, power calculations to determine the cohort size required and so on – with the typically rather more casual approach taken by philosophers. Until recently, there has been little systematic attempt within normative ethics to test variations of different phrasing of thought experiments, or to think about framing effects, or sample sizes; or the extent to which the results from the thought experiment are supposed to be universal or could be affected by variables such as gender, class or culture. A central ambiguity has been whether the implied readers of ethical thought experiments should be just anyone, or other philosophers; and, as a corollary, whether judgments elicited are supposed to be expert judgments, or the judgments of ordinary human beings. As the vast majority of ethical thought experiments in fact remain confined to academic journals, and are tested only informally on other philosophers, de facto they are tested only on those with expertise in the construction of ethical theories, rather than more generally representative samples or those with expertise in the contexts that the thought experiments purport to describe.

The info is here.

Friday, March 20, 2020

Flawed science? Two efforts launched to improve scientific validity of psychological test evidence in court

Karen Franklin
forensicpsychologist Blog
Originally posted February 15, 2020

Here is an excerpt:

New report slams “junk science” psychological assessments

In one of two significant developments, a group of researchers today released evidence of systematic problems with the state of psychological test admissibility in court. The researchers' comprehensive survey found that only about two-thirds of the tools used by clinicians in forensic settings were generally accepted in the field, while even fewer -- only about four in ten -- were favorably reviewed in authoritative sources such as the Mental Measurements Yearbook.

Despite this, psychological tests are rarely challenged when they are introduced in court, Tess M.S. Neal and her colleagues found. Even when they are, the challenges fail about two-thirds of the time. Worse yet, there is little relationship between a tool’s psychometric quality and the likelihood of it being challenged.

“Some of the weakest tools tend to get a pass from the courts,” write the authors of the newly issued report, “Psychological Assessments in Legal Contexts: Are Courts Keeping ‘Junk Science’ Out of the Courtroom?”

The report, currently in press in the journal Psychological Science in the Public Interest, proposes that standard batteries be developed for forensic use, based on the consensus of experts in the field as to which tests are the most reliable and valid for assessing a given psycho-legal issue. It further cautions against forensic deployment of newly developed tests that are being marketed by for-profit corporations before adequate research or review by independent professionals.

The info is here.

Friday, October 12, 2018

The New Standardized Morality Test. Really.

Peter Greene
Forbes - Education
Originally published September 13, 2018

Here is an excerpt:

Morality is sticky and complicated, and I'm not going to pin it down here. It's one thing to manage your own moral growth and another thing to foster the moral development of family and friends and still quite another thing to have a company hired by a government draft up morality curriculum that will be delivered by yet another wing of the government. And it is yet another other thing to create a standardized test by which to give students morality scores.

But the folks at ACT say they will "leverage the expertise of U.S.-based research and test development teams to create the assessment, which will utilize the latest theory and principles of social and emotional learning (SEL) through the development process." That is quite a pile of jargon to dress up "We're going to cobble together a test to measure how moral a student is. The test will be based on stuff."

ACT Chief Commercial Officer Suzana Delanghe is quoted saying "We are thrilled to be supporting a holistic approach to student success" and promises that they will create a "world class assessment that measures UAE student readiness" because even an ACT manager knows better than to say that they're going to write a standardized test for morality.

The info is here.

Tuesday, March 14, 2017

“I placed too much faith in underpowered studies:” Nobel Prize winner admits mistakes

Retraction Watch
Originally posted February 21, 2017

Although it’s the right thing to do, it’s never easy to admit error — particularly when you’re an extremely high-profile scientist whose work is being dissected publicly. So while it’s not a retraction, we thought this was worth noting: A Nobel Prize-winning researcher has admitted on a blog that he relied on weak studies in a chapter of his bestselling book.

The blog — by Ulrich Schimmack, Moritz Heene, and Kamini Kesavan — critiqued the citations included in a book by Daniel Kahneman, a psychologist whose research has illuminated our understanding of how humans form judgments and make decisions and earned him half of the 2002 Nobel Prize in Economics.

The article is here.

Wednesday, February 22, 2017

Moralized Rationality: Relying on Logic and Evidence in the Formation and Evaluation of Belief Can Be Seen as a Moral Issue

Ståhl T, Zaal MP, Skitka LJ (2016)
PLoS ONE 11(11): e0166332. doi:10.1371/journal.pone.0166332


In the present article we demonstrate stable individual differences in the extent to which a reliance on logic and evidence in the formation and evaluation of beliefs is perceived as a moral virtue, and a reliance on less rational processes is perceived as a vice. We refer to this individual difference variable as moralized rationality. Eight studies are reported in which an instrument to measure individual differences in moralized rationality is validated. Results show that the Moralized Rationality Scale (MRS) is internally consistent, and captures something distinct from the personal importance people attach to being rational (Studies 1–3). Furthermore, the MRS has high test-retest reliability (Study 4), is conceptually distinct from frequently used measures of individual differences in moral values, and it is negatively related to common beliefs that are not supported by scientific evidence (Study 5). We further demonstrate that the MRS predicts morally laden reactions, such as a desire for punishment, of people who rely on irrational (vs. rational) ways of forming and evaluating beliefs (Studies 6 and 7). Finally, we show that the MRS uniquely predicts motivation to contribute to a charity that works to prevent the spread of irrational beliefs (Study 8). We conclude that (1) there are stable individual differences in the extent to which people moralize a reliance on rationality in the formation and evaluation of beliefs, (2) that these individual differences do not reduce to the personal importance attached to rationality, and (3) that individual differences in moralized rationality have important motivational and interpersonal consequences.
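For readers unfamiliar with the internal-consistency claim, the standard statistic behind it is Cronbach’s alpha. Here is a minimal sketch on synthetic data (our own illustration, not the authors’ analysis; the nine items and sample size are made up):

```python
import random

def cronbach_alpha(items):
    """Cronbach's alpha. `items` is a list of per-item score lists,
    one inner list per scale item, all of equal length."""
    k = len(items)
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(var(it) for it in items)
    totals = [sum(it[p] for it in items) for p in range(len(items[0]))]
    return k / (k - 1) * (1 - item_vars / var(totals))

# Synthetic responses: 9 items that all partly reflect one latent trait.
random.seed(0)
trait = [random.gauss(0, 1) for _ in range(300)]
items = [[t + random.gauss(0, 1) for t in trait] for _ in range(9)]

alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

With equal trait and noise variance the expected alpha here is about 0.9, which is the kind of value usually reported as evidence that a scale is internally consistent.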

The article is here.

Tuesday, November 15, 2016

The Inevitable Evolution of Bad Science

Ed Yong
The Atlantic
Originally published September 21, 2016

Here is an excerpt:

In the model, as in real academia, positive results are easier to publish than negative ones, and labs that publish more get more prestige, funding, and students. They also pass their practices on. With every generation, one of the oldest labs dies off, while one of the most productive ones reproduces, creating an offspring that mimics the research style of the parent. That’s the equivalent of a student from a successful team starting a lab of their own.

Over time, and across many simulations, the virtual labs inexorably slid towards less effort, poorer methods, and almost entirely unreliable results. And here’s the important thing: Unlike the hypothetical researcher I conjured up earlier, none of these simulated scientists are actively trying to cheat. They used no strategy, and they behaved with integrity. And yet, the community naturally slid towards poorer methods. What the model shows is that a world that rewards scientists for publications above all else—a world not unlike this one—naturally selects for weak science.
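The selection dynamic described above is easy to caricature in code. Here is a deliberately simplified sketch (our own toy version, not the simulation from the study Yong describes; the effort-to-output relationship and all parameters are assumptions):

```python
import random

random.seed(42)

# Each lab has an "effort" level. Lower effort yields more positive
# findings (many of them unreliable), and output is what gets rewarded.
labs = [{"effort": random.uniform(0.2, 1.0), "output": 0.0, "age": 0}
        for _ in range(50)]

for generation in range(200):
    for lab in labs:
        lab["age"] += 1
        # Lower effort -> more publishable positives, plus noise.
        lab["output"] = max(0.0, (1.0 - 0.5 * lab["effort"]) + random.gauss(0, 0.1))
    # The oldest lab dies; the most productive lab spawns an offspring
    # that inherits its research style, with a little mutation.
    labs.remove(max(labs, key=lambda l: l["age"]))
    parent = max(labs, key=lambda l: l["output"])
    child_effort = min(1.0, max(0.05, parent["effort"] + random.gauss(0, 0.02)))
    labs.append({"effort": child_effort, "output": 0.0, "age": 0})

mean_effort = sum(l["effort"] for l in labs) / len(labs)
print(round(mean_effort, 2))
```

No lab in this sketch "cheats"; selection on output alone drags mean effort well below its starting value, which is the point the model makes.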

“The model may even be optimistic,” says Brian Nosek from the Center for Open Science, because it doesn’t account for our unfortunate tendency to justify and defend the status quo. He notes, for example, that studies in the social and biological sciences are, on average, woefully underpowered—they are too small to find reliable results.

The article is here.

Thursday, June 2, 2016

Scientific consent, data, and doubling down on the internet

Oliver Keyes
Originally published May 12, 2016

Here is an excerpt:

The Data

Yesterday morning I woke up to a Twitter friend pointing me to a release of OKCupid data, by Kirkegaard. Having now spent some time exploring the data, and reading both public statements on the work and the associated paper: this is without a doubt one of the most grossly unprofessional, unethical and reprehensible data releases I have ever seen.

There are two reasons for that. The first is very simple; Kirkegaard never asked anyone. He didn't ask OKCupid, he didn't ask the users covered by the dataset - he simply said 'this is public so people should expect it's going to be released'.

The blog post is here.

Sunday, January 18, 2015

Why the Myers-Briggs test is totally meaningless

By Joseph Stromberg
Published on January 5, 2015

The Myers-Briggs Type Indicator is probably the most widely used personality test in the world.

An estimated 2 million people take it annually, at the behest of corporate HR departments, colleges, and even government agencies. The company that makes and markets the test makes somewhere around $20 million each year.

The only problem? The test is completely meaningless.

"There's just no evidence behind it," says Adam Grant, an organizational psychologist at the University of Pennsylvania who's written about the shortcomings of the Myers-Briggs previously. "The characteristics measured by the test have almost no predictive power on how happy you'll be in a situation, how you'll perform at your job, or how happy you'll be in your marriage."

The entire article is here.

Saturday, December 21, 2013

Ethical Considerations in the Development and Application of Mental and Behavioral Nosologies: Lessons from DSM-5

By Robert M. Gordon and Lisa Cosgrove
Psychological Injury and Law
December 13, 2013


We are not likely to find a diagnostic system “unethical” per se, but rather to find that it creates ethical concerns in its formulation and application. There is an increased risk of misuse and misunderstanding of the DSM-5, particularly when it is applied to forensic assessment, because of documented problems with reliability and validity. For example, when the DSM-5 was field tested, the American Psychiatric Association reported as acceptable diagnostic category kappa levels that were far below the standard level of acceptability. The DSM-5 does not offer sensitivity and specificity levels, and psychologists must keep this in mind when using or teaching this manual. Also, especially in light of concerns about diagnostic inflation, we recommend that psychologists exercise caution when using the DSM-5 in forensic assessments, including civil and criminal cases. Alternatives to the DSM-5, such as the International Classification of Diseases and the Psychodynamic Diagnostic Manual, are reviewed.
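For context, the kappa statistic referenced above is a chance-corrected index of agreement between raters. A minimal sketch of Cohen’s kappa (the two clinicians, ten patients, and diagnostic labels below are entirely made up):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical clinicians diagnosing the same ten patients:
a = ["MDD", "MDD", "GAD", "MDD", "PTSD", "GAD", "MDD", "GAD", "PTSD", "MDD"]
b = ["MDD", "GAD", "GAD", "MDD", "PTSD", "MDD", "MDD", "GAD", "GAD", "MDD"]
print(round(cohens_kappa(a, b), 2))  # → 0.51
```

Here the raters agree on 7 of 10 cases, but after correcting for chance agreement kappa is only about 0.51; values in this range, and lower, are what drew criticism in the DSM-5 field trials.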

Here is an excerpt:

It should be emphasized that ethical concerns about DSM-5 panel members having commercial ties are not meant in any way to imply that any task force or work group member intentionally made pro-industry decisions. Decades of research have demonstrated that cognitive biases are commonplace and very difficult to eradicate, and more recent studies suggest that disclosure of financial conflicts of interest may actually worsen bias (Dana & Loewenstein, 2003). This is because bias is most often manifested in subtle ways unbeknownst to the researcher or clinician, and thus is usually implicit and unintentional. Physicians—like everyone else—have ethical blind spots. Social scientists have documented the fact that physicians often fail to recognize their vulnerability to commercial interests because they mistakenly believe that they are immune to marketing and industry influence (Sah & Fugh-Berman, 2013).

The entire article is here.

Friday, July 26, 2013

Recent Findings Force Scientists To Rethink The Rules Of Neuroimaging

Originally published on July 13, 2013

Here is an excerpt:

Brain mapping experiments all share a basic logic. In the simplest type of experiment, researchers compare brain activity while participants perform an experimental task and a control task. The experimental task might involve showing participants a noun, such as the word "cake," and asking them to say aloud a verb that goes with that noun, for instance "eat." The control task might involve asking participants to simply say the word they see aloud.

"The idea here is that the control task involves some of the same cognitive processes as the experimental task, in this case perceptual and articulatory processes," Jack explained. "But there is at least one process that is different - the act of selecting a semantically appropriate word from a different lexical category."

The entire article is here.

The original paper is here.

Tuesday, May 21, 2013

DSM-IV Boss Presses Attack on New Revision

By John Gever, Deputy Managing Editor
MedPage Today
Published: May 17, 2013

A new edition of psychiatry's diagnostic guide "will probably lead to substantial false-positive rates and unnecessary treatment," charged the man who led development of the last version.

To be released this weekend at the American Psychiatric Association's annual meeting, the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders, or DSM-5, "introduce[s] several high-prevalence diagnoses at the fuzzy boundary with normality," according to Allen Frances, MD, who chaired the task force responsible for DSM-IV issued in 1994.

Frances, now an emeritus professor at Duke University, wrote online in Annals of Internal Medicine that changes from DSM-IV will apply disease labels to individuals who may be unhappy or offensive but still normal. Such individuals would include those experiencing "the forgetfulness of old age" as well as children with severe, chronic temper tantrums and individuals with physical symptoms with no medical explanation.

He also worried about new marketing pushes from the pharmaceutical industry seeking to exploit what he believes are "loose" diagnostic criteria in the new edition. "Drug companies take marketing advantage of the loose DSM definitions by promoting the misleading idea that everyday life problems are actually undiagnosed psychiatric illness caused by a chemical imbalance and requiring a solution in pill form," he wrote.

The entire article is here.

Saturday, May 18, 2013

Psychiatry’s Guide Is Out of Touch With Science, Experts Say

The New York Times
Published: May 6, 2013

Just weeks before the long-awaited publication of a new edition of the so-called bible of mental disorders, the federal government’s most prominent psychiatric expert has said the book suffers from a scientific “lack of validity.”

The expert, Dr. Thomas R. Insel, director of the National Institute of Mental Health, said in an interview Monday that his goal was to reshape the direction of psychiatric research to focus on biology, genetics and neuroscience so that scientists can define disorders by their causes, rather than their symptoms.

While the Diagnostic and Statistical Manual of Mental Disorders, or D.S.M., is the best tool now available for clinicians treating patients and should not be tossed out, he said, it does not reflect the complexity of many disorders, and its way of categorizing mental illnesses should not guide research.

“As long as the research community takes the D.S.M. to be a bible, we’ll never make progress,” Dr. Insel said, adding, “People think that everything has to match D.S.M. criteria, but you know what? Biology never read that book.”

The entire story is here.

Tuesday, April 23, 2013

Most brain science papers are neurotrash

By Andrew Orlowski
The Register
Originally published April 12, 2013

A group of academics from Oxford, Stanford, Virginia and Bristol universities have looked at a range of subfields of neuroscience and concluded that most of the results are statistically worthless.

The researchers found that most structural and volumetric MRI studies are very small and have minimal power to detect differences between compared groups (for example, healthy people versus those with mental health diseases). Their paper also stated that, specifically, a clear excess of "significance bias" (too many results deemed statistically significant) has been demonstrated in studies of brain volume abnormalities, and similar problems appear to exist in fMRI studies of the blood-oxygen-level-dependent response.

The team, researchers at Stanford Medical School, Virginia, Bristol and the Human Genetics department at Oxford, looked at 246 neuroscience articles published in 2011 and excluded papers where the test data was unavailable. They found that the papers' median statistical power - the probability that a study will detect an effect when there is a real effect to be found - was just 21 per cent. What that means in practice is that if you were to run one of the experiments five times, you’d only find the effect once.

A further survey of papers drawn from fMRI brain scanners - and studies using such scanners have long filled the popular media with dramatic claims - found that their statistical power was just 8 per cent.
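What 21 per cent power looks like can be checked by simulation. A rough sketch (our assumptions: two groups, known unit variance, a simple z-test, and a group size and effect size chosen only for illustration):

```python
import math
import random

random.seed(7)

def simulate_power(n_per_group, effect_size, alpha_z=1.96, n_sims=2000):
    """Fraction of simulated two-group studies whose z-test is
    significant, given a true group difference of `effect_size`."""
    hits = 0
    for _ in range(n_sims):
        a = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [random.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        mean_diff = sum(b) / n_per_group - sum(a) / n_per_group
        se = math.sqrt(2 / n_per_group)  # known unit variance, for simplicity
        if abs(mean_diff / se) > alpha_z:
            hits += 1
    return hits / n_sims

# A small study: 12 subjects per group, a medium effect (d = 0.5).
power = simulate_power(12, 0.5)
print(round(power, 2))
```

With these made-up but not unrealistic numbers, power comes out near the 21 per cent median the survey reports: most such studies would miss a real effect of this size.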

The entire story is here.

Thanks to Tom Fink for this story.

Wednesday, April 11, 2012

Can Most Cancer Research Be Trusted?

Addressing the problem of "academic risk" in biomedical research

By Ronald Bailey
Originally published April 3, 2012

When a cancer study is published in a prestigious peer-reviewed journal, the implication is that the findings are robust, replicable, and point the way toward eventual treatments. Consequently, researchers scour their colleagues' work for clues about promising avenues to explore. Doctors pore over the pages, dreaming of new therapies coming down the pike. Which makes a new finding that nine out of 10 preclinical peer-reviewed cancer research studies cannot be replicated all the more shocking and discouraging.

Last week, the scientific journal Nature published a disturbing commentary claiming that in the area of preclinical research—which involves experiments done on rodents or cells in petri dishes with the goal of identifying possible targets for new treatments in people—independent researchers doing the same experiment cannot get the same result as reported in the scientific literature.

The entire commentary is here.

Thanks to Rich Ievoli for the story.  He could have been a contender.