This study examined the assumption that generative artificial
intelligence (GAI) tools can overcome the cognitive load that humans
experience when solving problems. We compared the performance of ChatGPT and
GPT-4 with that of students on the 2019 NAEP science assessments, stratified
by the cognitive demands of the items. Fifty-four tasks were coded by experts
using a two-dimensional cognitive load framework comprising task cognitive
complexity and dimensionality. ChatGPT and GPT-4 responses were scored using
the NAEP scoring keys. The analysis drew on the average ability scores of
students who answered each item correctly and on the percentage of students
who responded to each item. Results showed that both ChatGPT and GPT-4
consistently outperformed most students who took the NAEP science
assessments. As the cognitive demand of the NAEP tasks increased,
statistically higher average student ability scores were required to answer
the questions correctly; this pattern held for students in grades 4, 8, and
12. However, ChatGPT and GPT-4 were not statistically sensitive to increases
in the cognitive demands of the tasks, except in Grade 4. As the first study
to compare GAI tools with K-12 students on science problem-solving, this work
suggests that educational objectives should be revised to prepare students to
work competently with GAI tools. Education ought to emphasize the cultivation
of advanced cognitive skills rather than relying solely on tasks that demand
cognitive intensity, an approach that would foster critical thinking,
analytical skills, and the application of knowledge in novel contexts. The
findings also point to the need for innovative assessment practices that move
away from cognitively intensive tasks and toward creativity and analytical
skills, mitigating the negative effects of GAI on testing.

In this study, the researchers explored whether generative artificial intelligence (GAI) tools, specifically ChatGPT and GPT-4, could outperform human students on problem-solving tasks. The cognitive demands of the tasks were assessed using a two-dimensional cognitive load framework that considered task cognitive complexity and dimensionality. The performance of ChatGPT and GPT-4 was compared with the average student ability scores for each item on the 2019 NAEP science assessments.

The results showed that both ChatGPT and GPT-4 consistently outperformed most students who took the NAEP science assessments. This finding is significant because it suggests that GAI tools can overcome the cognitive challenges humans often face when solving complex problems. It is important to note, however, that the GAI tools did not show statistically significant sensitivity to increases in the cognitive demands of the tasks, except in Grade 4.

The multidisciplinary nature of this study is noteworthy: it combines insights from artificial intelligence, cognitive psychology, and education. By comparing the performance of GAI tools with that of human students, the researchers shed light on the potential impact of AI on education. The findings imply that educational objectives should be revised to include the cultivation of advanced cognitive skills. Rather than relying solely on tasks that demand cognitive intensity, education should focus on fostering critical thinking, analytical skills, and the application of knowledge in novel contexts.

Furthermore, this study highlights the need for innovative assessment practices in education. Traditional tests that rely heavily on cognitive intensity may not effectively assess students' creativity and analytical skills, which are increasingly important in a world where GAI tools are used more frequently. By moving away from tasks that solely assess cognitive intensity and incorporating more open-ended and creative problem-solving tasks, educators can better prepare students for future environments in which they will work alongside GAI tools.

In conclusion, this study demonstrates that GAI tools have the potential to outperform human students in problem-solving tasks. It emphasizes the importance of revising educational objectives to prioritize the development of higher-order cognitive skills and the need for innovative assessment practices that go beyond cognitive intensity. By embracing these changes, educators can prepare students to navigate a future where AI will play an increasingly prominent role in various domains.
