García-Torres, D., et al. (2025).
JMIR Medical Education, 11, e78857.
Abstract
Background:
Virtual simulated patients (VSPs) powered by generative artificial intelligence (GAI) offer a promising tool for training clinical interviewing skills; yet, little is known about how different system- and user-level variables shape students’ perceptions of these interactions.
Objective:
We aim to study psychology students’ perceptions of GAI-driven VSPs and examine how demographic factors, system parameters, and interaction characteristics influence such perceptions.
Methods:
We recorded a total of 1832 interactions between 156 psychology students and 13 GAI-generated VSPs configured with varying temperature settings (0.1, 0.5, 0.9). For each student, we collected age and sex; for each interview, we recorded interview length (total number of question–answer turns), number of connectivity failures, the specific VSP consulted, and the model temperature. After every interview, students provided a global rating from 1 to 10 and open-ended comments on strengths and areas for improvement. At the end of the training sequence, they also reported perceived improvement in diagnostic ability. Statistical analyses assessed the influence of demographics, interaction-level data, and GAI temperature setting on global ratings. Sentiment analysis was conducted to evaluate the VSPs’ clinical realism.
Results:
Female students rated the tool significantly higher (mean rating 9.25/10) than male students (mean rating 8.94/10; Kruskal-Wallis test, H=8.7; P=.003). In contrast, no significant association was found between global rating and age (r=0.02, 95% CI –0.03 to 0.06; P=.42), interview length (r=0.04, 95% CI –0.2 to 0.10; P=.18), or frequency of participation (Kruskal-Wallis test, H=4.62; P=.20). A moderate negative correlation emerged between connectivity failures and ratings (r=–0.26, 95% CI –0.41 to –0.10; P=.002). Temperature settings significantly influenced ratings (Kruskal-Wallis test, H=6.93; P=.03; η²=0.02), with higher scores at temperature 0.9 than at 0.1 (Dunn’s test, P=.04). Concerning learning outcomes, self-perceived improvement in diagnostic ability was reported by 94% (94/100) of students; however, final practical examination scores (mean 6.67, SD 1.42) did not differ significantly from those of the previous cohort without VSP training (mean 6.42, SD 1.56). Sentiment analysis indicated predominantly negative sentiment in GAI responses (median negativity 0.8903, IQR 0.306-0.961), which the authors consider consistent with clinical realism.
Conclusions:
GAI-driven VSPs were well received by psychology students, with student gender and system-level variables (particularly temperature settings and connection stability) shaping user evaluations. Although participants perceived the training as beneficial for their diagnostic skills, objective examination performance did not differ significantly from that of the previous cohort. The lack of randomization limits the generalizability of these results, and further experiments are required.
Here are some thoughts:
This study is important because it demonstrates a promising application of AI in clinical training: generative AI-powered virtual simulated patients that let psychology students practice psychopathological interviewing in a safe, low-stakes environment. Students rated the platform highly, and 94% of respondents (94/100) reported that their diagnostic ability had improved. The highest temperature setting (0.9), which yields more varied responses, received higher ratings than the lowest (0.1), although the effect was small (η²=0.02). Connectivity failures were associated with lower ratings, underscoring the importance of technical reliability. Although students found VSP-based sessions more challenging than traditional paper cases, final exam scores did not differ significantly from those of the previous cohort, so the more demanding simulation did not appear to hurt exam performance, but it did not measurably improve it either. For practicing psychologists and educators, this study offers early empirical support for integrating AI-driven patient simulation into clinical training, while highlighting the need for randomized studies and careful calibration of AI parameters before broad adoption.
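
For readers who want to see what the temperature comparison involves, here is a minimal Python sketch of a Kruskal-Wallis test across three groups, plus one common rank-based effect size. It is not the authors' code. The function names are hypothetical, the ratings are made-up numbers for illustration only, and the effect-size estimator η² = (H − k + 1)/(n − k) is an assumption, since the abstract does not say which formula was used. The p-value shortcut is valid only for three groups (2 degrees of freedom) and relies on the usual large-sample chi-square approximation.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based sorted positions
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def eta_squared_h(h, k, n):
    """Assumed effect-size estimator: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


def p_value_three_groups(h):
    """Chi-square survival function for df = 2, which equals exp(-H / 2)."""
    return math.exp(-h / 2)


# Made-up ratings on a 1-10 scale, grouped by temperature setting.
ratings_by_temperature = {
    0.1: [8, 9, 8, 7, 9, 8],
    0.5: [9, 8, 9, 9, 10, 8],
    0.9: [10, 9, 10, 9, 10, 9],
}
samples = list(ratings_by_temperature.values())
h_stat = kruskal_wallis_h(samples)
n_total = sum(len(s) for s in samples)
print(f"H = {h_stat:.2f}")
print(f"p = {p_value_three_groups(h_stat):.4f}")
print(f"eta squared = {eta_squared_h(h_stat, len(samples), n_total):.2f}")
```

A significant H only says that at least one group differs. Identifying which pair differs (0.9 versus 0.1 in the study) requires a post hoc procedure such as Dunn's test, which this sketch does not implement.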








