arXiv:2504.06514v1
Abstract: We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, drastically increases for ill-posed questions with missing premises (MiP), ending up with redundant and ineffective thinking. This newly introduced scenario greatly exacerbates the general overthinking issue, a failure mode we name MiP-Overthinking. Such failures go against the “test-time scaling law” but have been widely observed on multiple datasets we curated with MiP, indicating the harm of cheap overthinking and a lack of critical thinking. Surprisingly, LLMs not specifically trained for reasoning exhibit much better performance in the MiP scenario, producing much shorter responses that quickly identify ill-posed queries. This implies a critical flaw in the current training recipe for reasoning LLMs, which does not adequately encourage efficient thinking and leads to the abuse of thinking patterns. To further investigate the reasons behind such failures, we conduct fine-grained analyses of reasoning length, overthinking patterns, and the location of critical thinking across different types of LLMs. Moreover, our extended ablation study reveals that overthinking is contagious through the distillation of reasoning models’ responses. These results improve the understanding of overthinking and shed novel light on mitigating the problem.
Analysis of the Effects of Ill-Posed Questions on Reasoning LLMs
The study presented in this article focuses on the response length of reasoning large language models (LLMs) when they are presented with ill-posed questions that contain missing premises (MiP). The authors find that LLMs trained by either reinforcement learning or supervised learning show a significant increase in response length on MiP questions, producing redundant and ineffective thinking. This behavior, which the authors term MiP-Overthinking, deviates from the expected “test-time scaling law” and highlights how prevalent overthinking is in these models.
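To make the MiP setting concrete, the sketch below shows one way an ill-posed variant could be derived from a well-posed arithmetic question by dropping a premise the answer depends on. This is an illustrative construction under assumed example text, not the authors' curation pipeline.

```python
# Illustrative sketch (not the authors' curation pipeline): turning a
# well-posed arithmetic question into a missing-premise (MiP) variant
# by removing one premise the answer depends on.

well_posed = (
    "A store sells apples for $2 each and oranges for $3 each. "
    "If Sam buys 4 apples and 5 oranges, how much does he spend?"
)

# Drop the premise stating the price of oranges; the question now
# cannot be answered, and an ideal model should say so briefly.
missing_premise = (
    "A store sells apples for $2 each. "
    "If Sam buys 4 apples and 5 oranges, how much does he spend?"
)

for label, question in [("well-posed", well_posed), ("MiP", missing_premise)]:
    print(f"[{label}] {question}")
```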
One of the key insights from this study is that LLMs not specifically trained for reasoning perform better in the MiP scenario: they produce much shorter responses that quickly identify the ill-posed nature of the queries. This points to a critical flaw in the current training recipe for reasoning LLMs, which fails to sufficiently encourage efficient thinking and instead promotes thinking patterns that are prone to abuse.
The interdisciplinary nature of this study becomes apparent when considering the implications of overthinking and a lack of critical thinking in LLMs. These models are designed to process and generate human-like language, which is inherently tied to cognitive processes. By investigating why LLMs fail on MiP questions, the authors provide valuable insights into the relationships between language processing, reasoning, and critical thinking.
Fine-Grained Analysis and Ablation Study
To gain a deeper understanding of the phenomena observed, the authors conducted a fine-grained analysis of reasoning length, overthinking patterns, and the location of critical thinking in different types of LLMs. This analysis identifies specific characteristics and patterns associated with overthinking and provides concrete signals for mitigating the problem.
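A minimal sketch of the kind of measurement this analysis involves is given below: response length in tokens and the relative position at which a response first questions the premise. The marker phrases and the crude tokenization are illustrative assumptions, not the authors' actual annotation scheme.

```python
# Hedged sketch: measure response length and locate the first sign of
# "critical thinking" (questioning a missing premise) in a response.
import re

# Hypothetical marker phrases signalling that the model doubts the premise.
DOUBT_MARKERS = [
    "missing", "not specified", "not given", "cannot be determined",
    "insufficient information", "need more information",
]

def analyze_response(response: str) -> dict:
    tokens = re.findall(r"\S+", response)  # crude whitespace tokenization
    lower = response.lower()
    first_doubt = min(
        (lower.find(m) for m in DOUBT_MARKERS if m in lower),
        default=-1,
    )
    return {
        "length_tokens": len(tokens),
        # Relative character position (0.0 = start, 1.0 = end) of the first
        # doubt marker; -1.0 means the premise is never questioned.
        "doubt_position": first_doubt / max(len(response), 1) if first_doubt >= 0 else -1.0,
    }

print(analyze_response(
    "Let me compute the total... wait, the price of oranges is not given, "
    "so the answer cannot be determined."
))
```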
In addition, the authors conducted an extended ablation study, which revealed that overthinking is contagious through the distillation of reasoning models’ responses. This finding has implications for the training and deployment of LLMs, as it suggests that the overthinking behavior of a teacher model can propagate to student models trained on its outputs.
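To make the contagion mechanism concrete, the following sketch shows the standard way responses are distilled: the teacher's outputs become supervised fine-tuning targets for a student, so verbose MiP responses are inherited along with everything else. The dataset contents here are invented for illustration and are not from the paper.

```python
# Hedged sketch of response distillation: (question, teacher_response) pairs
# become SFT targets, so the teacher's overthinking style transfers to the
# student. All example text below is invented.
teacher_outputs = [
    {
        "question": "If Sam buys 4 apples and 5 oranges, how much does he spend?",
        # A long, hedging teacher response to an MiP question.
        "response": "Let me try several interpretations... perhaps the price "
                    "of oranges is $1? Or $2? I will assume $2 and continue...",
    },
]

# Standard SFT-style formatting: the teacher's verbose text is the training
# target, so its length and patterns are what the student learns to imitate.
sft_examples = [
    {"prompt": ex["question"], "completion": ex["response"]}
    for ex in teacher_outputs
]
print(sft_examples[0]["completion"][:60], "...")
```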
Implications and Mitigation Strategies
The findings of this study improve our understanding of overthinking in reasoning LLMs and offer insights into potential mitigation strategies. By shedding light on the flaws in the current training recipe, the authors pave the way for more efficient and effective thinking patterns in LLMs.
One possible mitigation strategy is to encourage efficient thinking explicitly during the training of reasoning LLMs. By rewarding concise and accurate responses, the training recipe could steer models away from overthinking and towards more efficient reasoning strategies.
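One hedged sketch of such reward shaping, stated as an assumption rather than the authors' method, is given below: reward correctness on well-posed prompts, reward abstention when a premise is missing, and apply a linear penalty to tokens beyond a length budget.

```python
# A hedged sketch of length-aware reward shaping (an illustrative assumption,
# not the method proposed in the paper).

def shaped_reward(
    is_correct: bool,
    abstained: bool,          # model explicitly flagged the missing premise
    premise_missing: bool,    # ground-truth label for the prompt
    response_tokens: int,
    length_budget: int = 512, # tokens allowed before the penalty applies
    length_penalty: float = 0.001,
) -> float:
    # Base reward: answer correctly on well-posed prompts, abstain on MiP prompts.
    if premise_missing:
        base = 1.0 if abstained else 0.0
    else:
        base = 1.0 if is_correct else 0.0
    # Linear penalty on tokens beyond the budget discourages overthinking.
    excess = max(0, response_tokens - length_budget)
    return base - length_penalty * excess

# Abstaining on an MiP prompt but taking 900 tokens to do so still costs reward:
print(shaped_reward(is_correct=False, abstained=True, premise_missing=True,
                    response_tokens=900))  # 1.0 - 0.001 * 388 = 0.612
```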
Furthermore, the insights gained from the fine-grained analysis and the ablation study can inform novel architectures or modifications to existing LLMs that better handle ill-posed questions and reduce overthinking tendencies. This multidisciplinary approach, combining insights from cognitive science, natural language processing, and machine learning, holds promise for improving the performance and reliability of reasoning LLMs.