Hundt, A., et al. (2025).
International Journal of Social Robotics
Abstract
Members of the Human-Robot Interaction (HRI) and Machine Learning (ML) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interaction, household and workplace tasks, approximating ‘common sense reasoning’, and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To assess whether such concerns are well placed in the context of HRI, we evaluate several highly-rated LLMs on discrimination and safety criteria. Our evaluation reveals that LLMs are currently unsafe for people across a diverse range of protected identity characteristics, including, but not limited to, race, gender, disability status, nationality, religion, and their intersections. Concretely, we show that LLMs produce directly discriminatory outcomes—e.g., ‘gypsy’ and ‘mute’ people are labeled untrustworthy, but not ‘european’ or ‘able-bodied’ people. We find various such examples of direct discrimination on HRI tasks such as facial expression, proxemics, security, rescue, and task assignment. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions—such as incident-causing misstatements, taking people’s mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. We provide code to reproduce our experiments at https://github.com/rumaisa-azeem/llm-robots-discrimination-safety.
Here are some thoughts:
This research highlights a profound ethical and technological crisis at the intersection of Artificial Intelligence and robotics. The finding that all tested Large Language Models (LLMs) fail basic safety and fairness criteria in Human-Robot Interaction (HRI) scenarios is alarming, as it demonstrates that algorithmic bias is being physically amplified into the real world.
Ethically, this means deploying current LLM-driven robots risks enacting direct discrimination across numerous protected characteristics and approving unlawful, violent, and coercive actions. From a psychological perspective, allowing robots to exhibit behaviors such as suggesting avoidance of specific groups, displaying disgust, or removing a user's mobility aid translates latent biases into socially unjust and physically/psychologically harmful interactions that erode trust and compromise the safety of vulnerable populations.
