Bastani, H. et al. (July 15, 2024). Available at SSRN.
Abstract
Generative artificial intelligence (AI) is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. However, a key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs. We study the impact of generative AI, specifically OpenAI's GPT-4, on human learning in the context of math classes at a high school. In a field experiment involving nearly a thousand students, we deployed and evaluated two GPT-based tutors, one that mimics a standard ChatGPT interface (called GPT Base) and one with prompts designed to safeguard learning (called GPT Tutor). These tutors comprise about 15% of the curriculum in each of three grades. Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards included in GPT Tutor. Our results suggest that students attempt to use GPT-4 as a "crutch" during practice problem sessions, and when successful, perform worse on their own. Thus, to maintain long-term productivity, we must be cautious when deploying generative AI to ensure humans continue to learn critical skills.
Here are some thoughts:
The deployment of GPT-based tutors in educational settings presents a cautionary tale. While generative AI tools like ChatGPT can make tasks significantly easier for humans, they also risk eroding our ability to learn essential skills. This phenomenon is not new: earlier technologies such as typing and calculators also reduced the need for certain skills. What makes ChatGPT different is the combination of its broad intellectual capabilities and its propensity for producing incorrect responses.
That unreliability poses a distinct challenge. Students may fail to detect ChatGPT's errors, or may be unwilling to invest the effort required to verify its answers, and this can undermine their learning and mastery of critical skills. The paper suggests that more work is needed to ensure generative AI enhances education rather than diminishes it.
The findings underscore the importance of critical thinking and media literacy in the age of AI. Educators must weigh the risks and benefits of AI-powered tools and design them to augment human capabilities rather than replace them, as the GPT Tutor safeguards illustrate. Accountability and transparency in AI development and deployment are crucial to mitigating these risks. By acknowledging these challenges, we can harness the potential of AI to enhance education and promote meaningful learning.