This study is a pioneering endeavor to investigate the capabilities of Large
Language Models (LLMs) in addressing conceptual questions within the domain of
mechanical engineering with a focus on mechanics. Our examination involves a
manually crafted exam encompassing 126 multiple-choice questions, spanning
various aspects of mechanics courses, including Fluid Mechanics, Mechanical
Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of
Elasticity, and Continuum Mechanics. Three LLMs, ChatGPT (GPT-3.5), ChatGPT
(GPT-4), and Claude (Claude-2.1), were evaluated against engineering faculty
members and students with or without a mechanical engineering background.
background. The findings reveal GPT-4’s superior performance over the other two
LLMs and human cohorts in answering questions across various mechanics topics,
except for Continuum Mechanics. This signals potential for future improvement
in GPT models' handling of symbolic calculation and tensor analysis. The
performance of all LLMs improved significantly when they were prompted to give
explanations before direct answers, underscoring the crucial role of prompt
engineering.
Interestingly, GPT-3.5 demonstrates improved performance with prompts covering
a broader domain, while GPT-4 excels with prompts focusing on specific
subjects. Finally, GPT-4 exhibits notable advancements in mitigating input
bias, as evidenced by comparison with the guessing preferences observed in
human respondents. This study unveils the
substantial potential of LLMs as highly knowledgeable assistants in both
mechanical pedagogy and scientific research.

Analysis: The Potential of Large Language Models in Mechanical Engineering

Large Language Models (LLMs) are a transformative technology that has the potential to revolutionize various fields, including mechanical engineering. In this pioneering study, the capabilities of three LLMs were evaluated in addressing conceptual questions within the domain of mechanics. Spanning a wide range of topics, these questions allowed for a comprehensive assessment of the LLMs’ performance.

One of the key findings of this study is the superior performance of GPT-4 among the three LLMs tested. This indicates the continuous advancements in LLM technology, as each subsequent model builds upon the strengths of its predecessors. Notably, GPT-4 showcased remarkable proficiency in answering questions across various mechanics topics, underscoring its potential as a highly knowledgeable assistant in both mechanical pedagogy and scientific research.

However, it is important to acknowledge that GPT-4’s performance was not superior in the specific area of Continuum Mechanics. Continuum Mechanics, which models materials as continuous media and relies heavily on tensor analysis and symbolic manipulation, presents unique challenges that current LLMs have yet to master. This highlights the need for continued research and development in this area.

Prompt engineering emerged as a critical factor in enhancing the performance of LLMs. The inclusion of explanations prompted prior to direct responses significantly improved the accuracy of the models. This implies that careful crafting of prompts and providing contextual information can enhance the LLMs’ understanding and reasoning capabilities. Prompt engineering is a multidisciplinary skill that draws upon expertise from both natural language processing and mechanical engineering.
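The explain-then-answer strategy described above can be sketched as a simple prompt template. The wording of the templates and the sample question below are illustrative assumptions, not the study's actual exam material:

```python
# Sketch of two prompting strategies for a multiple-choice mechanics question.
# Template wording and the sample question are illustrative, not the study's
# actual prompts.

def direct_prompt(question: str, choices: list[str]) -> str:
    """Ask for the answer letter only."""
    opts = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    return f"{question}\n{opts}\nAnswer with a single letter."

def explain_first_prompt(question: str, choices: list[str]) -> str:
    """Ask the model to reason before committing to an answer."""
    opts = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    return (f"{question}\n{opts}\n"
            "First explain your reasoning step by step, "
            "then state the final answer as a single letter.")

q = "For steady, incompressible flow in a converging nozzle, the fluid velocity:"
choices = ["decreases", "stays constant", "increases", "oscillates"]
print(explain_first_prompt(q, choices))
```

The only difference between the two templates is the final instruction, which in the study's results was enough to significantly change accuracy.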

Interestingly, the study also revealed variations in performance between GPT-3.5 and GPT-4 based on prompt characteristics. GPT-3.5 demonstrated improved performance when prompts covered a broader domain, indicating its ability to handle a wide range of topics. On the other hand, GPT-4 excelled when prompts focused on specific subjects, suggesting its specialization and in-depth knowledge in targeted areas of mechanics. This highlights the importance of tailoring prompts to optimize the performance of LLMs based on the desired outcome.
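The contrast between broad and subject-specific prompts might look like the following in practice. Both role descriptions and the sample question are hypothetical examples, not the study's actual wording:

```python
# Hypothetical prompt framings contrasting broad vs. subject-specific scope.
# The role descriptions are illustrative assumptions.

def framed_prompt(role: str, question: str) -> str:
    """Prepend a role description to a question."""
    return f"{role}\n\n{question}"

broad_role = "You are an expert in mechanical engineering."
specific_role = "You are an expert in fluid mechanics."

question = ("In fully developed laminar pipe flow, "
            "how does pressure drop scale with flow rate?")

# Per the study's findings, a broad framing reportedly favored GPT-3.5,
# while a narrow framing favored GPT-4.
print(framed_prompt(broad_role, question))
print(framed_prompt(specific_role, question))
```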

Furthermore, the study showcased notable advancements in GPT-4’s ability to mitigate input bias, that is, sensitivity to how a question is presented, such as the ordering of answer choices. Whereas human test-takers tend to show guessing preferences for particular options, GPT-4 exhibited much weaker preferences of this kind, a positive step toward fairer and more reliable LLM outputs. This aspect of bias mitigation is crucial in ensuring that LLMs are tools that can be trusted in research settings.
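One simple way to probe this kind of positional bias is to tabulate how often each answer position is chosen across a question set and compare the shares against a uniform distribution. This is a generic sketch with made-up selections, not the study's actual analysis:

```python
from collections import Counter

def position_bias(selected: list[str], options: str = "ABCD") -> dict[str, float]:
    """Return each option letter's share of selections; compare against the
    uniform share (0.25 for four options) to spot positional preferences."""
    counts = Counter(selected)
    total = len(selected)
    return {opt: counts.get(opt, 0) / total for opt in options}

# Hypothetical selections on questions where a respondent had to guess:
human_guesses = ["C", "C", "B", "C", "A", "C", "B", "C"]
print(position_bias(human_guesses))  # "C" well above the uniform 0.25
```

A share far from 0.25 for any option (here "C" at 0.625) would indicate a guessing preference of the kind the study compared between humans and the models.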

In conclusion, this study provides compelling evidence for the substantial potential of LLMs in the field of mechanical engineering. The findings highlight the importance of prompt engineering, prompt customization, and continuous model development in achieving superior performance. While LLMs have already demonstrated their competence as knowledgeable assistants, there are still areas, such as Continuum Mechanics, that require further exploration and refinement. As LLM technology continues to evolve, it holds promise for transforming not only the educational landscape but also the scientific research and development endeavors in mechanical engineering and beyond.
