Dillion, D., Mondal, D., Tandon, N., & Gray, K. (2025). Scientific Reports, 15(1).
Abstract
People view AI as possessing expertise across various fields, but the perceived quality of AI-generated moral expertise remains uncertain. Recent work suggests that large language models (LLMs) perform well on tasks designed to assess moral alignment, reflecting human moral judgments with relatively high accuracy. As LLMs are increasingly employed in decision-making roles, there is a growing expectation for them not only to offer aligned judgments but also to demonstrate sound moral reasoning. Here, we advance work on the Moral Turing Test and find that Americans rate ethical advice from GPT-4o as slightly more moral, trustworthy, thoughtful, and correct than that of the popular New York Times advice column, The Ethicist. Participants perceived GPT models as surpassing both a representative sample of Americans and a renowned ethicist in delivering moral justifications and advice, suggesting that people may increasingly view LLM outputs as viable sources of moral expertise. This work suggests that people might see LLMs as valuable complements to human expertise in moral guidance and decision-making. It also underscores the importance of carefully programming ethical guidelines into LLMs, given their potential to influence users' moral reasoning.
Some thoughts on the paper:
This research investigates how people perceive AI, particularly large language models (LLMs) such as GPT-4o, as moral experts. The study compares the ethical advice and justifications produced by GPT models with those of "The Ethicist" column in the New York Times and with a representative sample of Americans. Participants rated GPT-4o's advice as slightly more moral, trustworthy, thoughtful, and correct than the renowned ethicist's, and GPT models outperformed average Americans in justifying their moral judgments. This points to a potential shift in how people perceive moral authority, with LLMs increasingly seen as viable sources of moral expertise.
The study underscores the importance of carefully programming ethical guidelines into LLMs, given their potential to influence users' moral reasoning. It also raises questions about the psychology of trust in AI, how AI-generated moral advice interacts with existing moral intuitions and biases, and the impact of moral language on perceptions of credibility. This research highlights the need for interdisciplinary collaboration between ethicists, psychologists, and computer scientists to address the complex ethical and psychological implications of AI moral reasoning and ensure its responsible and beneficial use.