Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Welcome to the nexus of ethics, psychology, morality, technology, health care, and philosophy

Saturday, December 24, 2022

How Stable are Moral Judgments?

Rehren, P., Sinnott-Armstrong, W.
Rev. Phil. Psych. (2022).


Psychologists and philosophers often work hand in hand to investigate many aspects of moral cognition. In this paper, we want to highlight one aspect that has so far been relatively neglected: the stability of moral judgment over time. After explaining why philosophers and psychologists should consider stability and then surveying previous research, we will present the results of an original three-wave longitudinal study. We asked participants to judge the same acts in a series of sacrificial dilemmas three times, 6–8 days apart. In addition to investigating the stability of our participants’ ratings over time, we also explored some potential explanations for instability. Finally, we will discuss these and other potential psychological sources of moral stability (or instability) and highlight possible philosophical implications of our findings.

From the General Discussion

We have argued that the stability of moral judgments over time is an important feature of moral cognition for philosophers and psychologists to consider. We then presented an original empirical study of the stability of moral judgments about acts in sacrificial dilemmas over 6–8 days. Like Helzer et al. (2017, Study 1), we found an overall test-retest correlation of 0.66. Moreover, we observed moderate to large proportions of rating shifts and small to moderate proportions of rating revisions (M = 14%), rejections (M = 5%), and adoptions (M = 6%), that is, cases in which a participant judged p in one wave but did not judge p in the other.

What Explains Instability?

One potential explanation of our results is that the instability they show is not a genuine feature of moral judgments about sacrificial dilemmas but an artifact of measurement error. Measurement error is the difference between the observed and the true value of a variable. If so, most of the rating changes we observed would not show that real-life moral judgments about acts in sacrificial dilemmas are (or would be) unstable over short periods of time. Instead, when people make moral judgments about sacrificial dilemmas in real life, their judgments may remain very stable from one week to the next, and our study (perhaps any study) simply failed to capture this stability.

To the extent that moral psychologists and philosophers are interested in real-life moral judgment, this may point to a problem with the type of study design used in this and many other papers: if there is enough measurement error, it may be very difficult to draw firm conclusions about real-life moral judgments from this research. Other researchers have raised related objections. Most forcefully, Bauman et al. (2014) have argued that participants often do not take the judgment tasks used by moral psychologists seriously enough to engage with them in anything like the way they would if they encountered the same tasks in the real world (see also Ryazanov et al. 2018). In our view, moral psychologists would do well to move their studies out of the (online) lab and into the real world more often (e.g., Bollich et al. 2016; Hofmann et al. 2014).


Alternatively, our findings may reveal a genuine feature of real-life moral judgment. If so, a natural question is what makes moral judgments unstable (or stable) over time. In this paper, we examined three possible explanations but found no evidence for any of them. First, because sacrificial dilemmas are, in a certain sense, designed to be difficult, moral judgments about acts in these scenarios may be much less stable than moral judgments about other scenarios or statements. However, when we compared our test-retest correlations with a sample of test-retest correlations from instruments involving other moral judgments, sacrificial dilemmas did not stand out. Second, we found no evidence that moral judgments change because people are more confident in their judgments the second time around. Third, Study 1b found no evidence that rating changes, when they occurred, were often made in light of reasons and reflection. This does not mean that we can completely rule out any of these explanations for unstable moral judgments. As we point out below, our research was limited in how far it could test each explanation, so one or more of them may still account for some proportion of the rating changes we observed.