Mittelstädt, J. M., et al. (2024). Scientific Reports, 14(1).
Abstract
Large language models (LLMs) have been a catalyst for public interest in artificial intelligence (AI). These technologies perform some knowledge-based tasks better and faster than human beings. However, whether AIs can correctly assess social situations and devise socially appropriate behavior is still unclear. We conducted an established Situational Judgment Test (SJT) with five different chatbots and compared their results with the responses of human participants (N = 276). Claude, Copilot, and you.com’s smart assistant performed significantly better than humans in proposing suitable behaviors in social situations. Moreover, their effectiveness ratings of different behavior options aligned well with expert ratings. These results indicate that LLMs are capable of producing adept social judgments. While this constitutes an important requirement for their use as virtual social assistants, challenges and risks are still associated with their widespread use in social contexts.
Here are some thoughts:
This research assesses the social judgment capabilities of large language models (LLMs) by administering a Situational Judgment Test (SJT), a standardized assessment of decision-making in work-related or critical situations, to five popular chatbots and comparing their performance with that of a human control group. The study found that several LLMs significantly outperformed humans in identifying appropriate behaviors in complex social scenarios. While the LLMs demonstrated high consistency in their responses and strong agreement with expert ratings, the authors note limitations, including potential biases and the need for further investigation into real-world applicability and the mechanisms underlying the models' social judgment. The results suggest LLMs hold considerable potential as social assistants, but they also highlight ethical considerations surrounding their use.