Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Welcome to the nexus of ethics, psychology, morality, technology, health care, and philosophy

Saturday, July 1, 2023

Inducing anxiety in large language models increases exploration and bias

Coda-Forno, J., Witte, K., et al. (2023).
arXiv preprint arXiv:2304.11111.

Abstract

Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

From the Discussion section

What do we make of these results? It seems like GPT-3.5 generally performs best in the neutral condition, so a clear recommendation for prompt-engineering is to try and describe a problem as factually and neutrally as possible. However, if one does use emotive language, then our results show that anxiety-inducing scenarios lead to worse performance and substantially more biases. Of course, the neutral conditions asked GPT-3.5 to talk about something it knows, thereby possibly already contextualizing the prompts further in tasks that require knowledge and measure performance. However, that anxiety-inducing prompts can lead to more biased outputs could have huge consequences in applied scenarios. Large language models are, for example, already used in clinical settings and other high-stake contexts. If they produce higher biases in situations when a user speaks more anxiously, then their outputs could actually become dangerous. We have shown one method, which is to run psychiatric studies, that could capture and prevent such biases before they occur.

In the current work, we intended to show the utility of using computational psychiatry to understand foundation models. We observed that GPT-3.5 produced on average higher anxiety scores than human participants. One possible explanation for these results could be that GPT-3.5’s training data, which consists of a lot of text taken from the internet, could have inherently shown such a bias, i.e. containing more anxious than happy statements. Of course, large language models have just become good enough to perform psychological tasks, and whether or not they intelligently perform them is still a matter of ongoing debate.