arXiv:2407.04831v1 Announce Type: new
Abstract: Generative models such as large language models (LLMs) are extensively used as code copilots and for whole-program generation. However, the programs they generate often have questionable correctness, authenticity, and reliability in terms of integration: they may not follow user requirements, may produce incorrect and/or nonsensical outputs, or may even contain semantic or syntactic errors, a phenomenon collectively known as LLM hallucination. In this work, we present several types of code hallucination. We generated such hallucinated code manually using large language models. We also present a technique, HallTrigger, to demonstrate efficient ways of generating arbitrary code hallucinations. Our method leverages three dynamic attributes of LLMs to craft prompts that successfully trigger hallucinations without requiring access to model architecture or parameters. Results from popular black-box models suggest that HallTrigger is indeed effective and that pervasive LLM hallucinations have a substantial impact on software development.
Generative models, particularly large language models (LLMs), have become popular tools for code generation and assistance in software development. However, research has shown that code generated by LLMs is often unreliable or incorrect, a problem known as LLM hallucination.
In this work, the authors aim to explore and understand the various types of code hallucination that can occur with LLMs. They manually generated examples of hallucinated code using LLMs and also introduced a technique called HallTrigger to efficiently generate arbitrary code hallucinations; an illustrative example of such a hallucination is sketched below.
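To make the problem concrete, here is a hypothetical example of one common hallucination type: generated code that calls a plausible-sounding but nonexistent library function. The example is not taken from the paper, and the pandas function it calls (`read_excel_cached`) is deliberately nonexistent.

```python
# Illustrative example of API-hallucinated code (not from the paper):
# the snippet calls a function that does not exist in the real pandas API,
# so it fails at runtime even though it looks plausible.
import pandas as pd

def load_report(path: str) -> "pd.DataFrame":
    # pd.read_excel_cached does not exist in pandas; an LLM may invent
    # such a plausible-sounding helper, which raises AttributeError.
    return pd.read_excel_cached(path, sheet_name="Q3")

if __name__ == "__main__":
    try:
        load_report("report.xlsx")
    except AttributeError as err:
        print(f"Hallucinated API detected: {err}")
```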
HallTrigger leverages three dynamic attributes of LLMs to craft prompts that effectively trigger hallucinations without requiring access to the model architecture or parameters. Because it needs only prompt-level access, the technique applies to popular black-box models, and the authors' results on such models indicate that LLM hallucination is a pervasive issue in software development.
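The abstract does not describe the three dynamic attributes, so the sketch below only illustrates the general black-box workflow rather than HallTrigger itself: vary the prompt sent to a model that is reachable only through a text interface, and flag generations that fail a cheap validity check. The `complete` callable and the prompt templates are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of black-box hallucination probing, assuming only that we
# can send a prompt to a model and read back generated code as text.
import ast
from typing import Callable, List

PROMPT_TEMPLATES = [
    # Hypothetical prompt variations; the paper's three "dynamic attributes"
    # are not named in the abstract, so these are illustrative only.
    "Write a Python function that {task} using the standard library only.",
    "Write a Python one-liner that {task}; do not define helper functions.",
    "Write Python code that {task} and state the exact library version used.",
]

def probe(complete: Callable[[str], str], task: str) -> List[dict]:
    """Query a black-box model with varied prompts and flag outputs that
    fail basic syntactic validation (one cheap hallucination signal)."""
    findings = []
    for template in PROMPT_TEMPLATES:
        prompt = template.format(task=task)
        code = complete(prompt)  # black-box call: no weights, no internals
        try:
            ast.parse(code)
            syntactically_valid = True
        except SyntaxError:
            syntactically_valid = False
        findings.append({
            "prompt": prompt,
            "code": code,
            "syntactically_valid": syntactically_valid,
        })
    return findings
```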
The multi-disciplinary nature of this work is evident in the combination of programming and natural language processing techniques. By exploring the limitations and challenges of LLMs in generating reliable code, the authors highlight the need for further research and development in this field.
Moving forward, it is crucial to address the LLM hallucination problem so that code generated by these models can be reliably integrated and used. This could involve refining the training data and fine-tuning the models to better align with user requirements. Additionally, techniques such as code validation and post-generation analysis can help identify and filter out unreliable code outputs.
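As a rough illustration of what post-generation analysis can look like, the following sketch (not the paper's method) parses a generated snippet and checks that every imported top-level module actually resolves in the local environment, which catches one frequent hallucination pattern: imports of nonexistent packages. The package name in the usage example is made up for demonstration.

```python
# Minimal sketch of post-generation validation, assuming the generated code
# is available as a string. Two cheap hallucination signals are checked:
# the code must parse, and every top-level import must resolve locally.
import ast
import importlib.util

def validate_generated_code(code: str) -> list:
    """Return a list of human-readable issues found in generated code."""
    issues = []
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return [f"syntax error: {err}"]

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # resolve only the top-level package
            if importlib.util.find_spec(root) is None:
                issues.append(f"unresolved import: {name}")
    return issues

# Usage example: flag a snippet importing a (made-up) nonexistent package.
print(validate_generated_code("import fastexcelx\nfastexcelx.load('a.xlsx')"))
```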
Overall, this research sheds light on an important issue in code generation using LLMs and opens the door for further advancements in improving the reliability and authenticity of generated code. The collaboration between programming and natural language processing experts will be essential in finding effective solutions to the problem of LLM hallucination.