arXiv:2402.17786v1
Abstract: Using Large Language Models for complex mathematical reasoning is difficult, primarily due to the complexity of multi-step reasoning. The main challenges of this process include (1) selecting critical intermediate results to advance the procedure, and (2) limited exploration of potential solutions. To address these issues, we introduce a novel algorithm, namely Stepwise Self-Consistent Chain-of-Thought (SSC-CoT). SSC-CoT employs a strategy of selecting intermediate steps based on the intersection of various reasoning chains. Additionally, SSC-CoT enables the model to discover critical intermediate steps by querying a knowledge graph comprising relevant domain knowledge. To validate SSC-CoT, we present a new dataset, TriMaster100, tailored for complex trigonometry problems. This dataset contains 100 questions, with each solution broken down into scored intermediate steps, facilitating a comprehensive evaluation of the mathematical reasoning process. On TriMaster100, SSC-CoT triples the effectiveness of the state-of-the-art methods. Furthermore, we benchmark SSC-CoT on the widely recognized complex mathematical question dataset, MATH level 5, and it surpasses the second-best method by 7.2% in accuracy. Code and the TriMaster100 dataset can be found at: https://github.com/zhao-zilong/ssc-cot.

Using Large Language Models for complex mathematical reasoning

Using Large Language Models (LLMs) for complex mathematical reasoning is difficult because solutions depend on long chains of multi-step reasoning. This article summarizes SSC-CoT, a new algorithm that targets these difficulties and aims to make LLMs more effective at mathematical problem-solving.

The Challenges of Using LLMs in Mathematical Reasoning

When it comes to complex mathematical reasoning, LLMs face two main challenges:

  1. Selection of Critical Intermediate Results: Multi-step reasoning produces many intermediate results, and the model must pick the ones that actually move the procedure forward. A poor choice at one step can derail everything that follows.
  2. Limited Exploration of Potential Solutions: LLMs tend to explore only a narrow set of candidate solution paths, so a workable approach may never be examined.

The Solution: Stepwise Self-Consistent Chain-of-Thought (SSC-CoT)

To overcome the challenges mentioned above, the authors propose a novel algorithm called SSC-CoT.

SSC-CoT adopts a strategy of selecting intermediate steps based on the intersection of various reasoning chains. This approach allows the model to identify critical intermediate results by considering multiple paths of reasoning.
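The intersection idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `overlapping_steps`, the `min_chains` threshold, and the use of exact string matching to decide that two chains reached the same intermediate result are all assumptions made for illustration. A real system would need a more robust way to compare mathematical expressions.

```python
from collections import Counter


def overlapping_steps(chains, min_chains=2):
    """Return intermediate results that appear in at least `min_chains` chains.

    `chains` is a list of reasoning chains, each a list of intermediate
    results written as strings. Exact string equality stands in for
    "same result" here, which is an illustrative simplification.
    """
    counts = Counter()
    for chain in chains:
        # Count each result once per chain so repeats inside a single
        # chain do not inflate its support.
        counts.update(set(chain))
    return {step: n for step, n in counts.items() if n >= min_chains}


# Three hypothetical reasoning chains for the same problem.
chains = [
    ["sin(2x) = 2 sin(x) cos(x)", "cos(x) = 1/2", "x = pi/3"],
    ["cos(x) = 1/2", "x = pi/3"],
    ["tan(x) = 1", "x = pi/4"],
]

shared = overlapping_steps(chains)
for step, support in sorted(shared.items()):
    print(f"{step}  (found in {support} chains)")
```

Results that several independent chains agree on are kept as candidates for the next round of reasoning, while results seen in only one chain are treated as less reliable.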

In addition to this, SSC-CoT leverages a knowledge graph that contains relevant domain knowledge. By querying this knowledge graph, the model can discover critical intermediate steps, further enhancing its reasoning process.
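As a rough illustration of what querying a knowledge graph for related facts might look like, the sketch below stores a few trigonometric identities in a plain dictionary. The graph contents, the `KNOWLEDGE_GRAPH` name, and the `related_facts` helper are hypothetical; the summary above does not describe how the authors' graph is structured or queried.

```python
# A toy knowledge graph: each expression maps to equivalent forms,
# labeled with the identity that connects them. Contents are illustrative.
KNOWLEDGE_GRAPH = {
    "sin(2x)": [("2 sin(x) cos(x)", "double-angle identity")],
    "cos(2x)": [
        ("1 - 2 sin^2(x)", "double-angle identity"),
        ("2 cos^2(x) - 1", "double-angle identity"),
    ],
    "sin^2(x)": [("1 - cos^2(x)", "Pythagorean identity")],
}


def related_facts(terms, graph=KNOWLEDGE_GRAPH):
    """Look up each term and return the identities attached to it as hint strings."""
    hints = []
    for term in terms:
        # Terms missing from the graph simply contribute no hints.
        for equivalent, identity in graph.get(term, []):
            hints.append(f"{term} = {equivalent} ({identity})")
    return hints


# Terms extracted from a hypothetical intermediate result.
hints = related_facts(["sin(2x)", "sec(x)"])
for hint in hints:
    print(hint)
```

Hints retrieved this way could be appended to the model's prompt so that it considers identities it might not have recalled on its own.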

Evaluation and Results

To evaluate the effectiveness of SSC-CoT, the authors introduce a new dataset called TriMaster100. This dataset focuses on complex trigonometry problems and includes 100 questions, with each solution broken down into scored intermediate steps.
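Scoring intermediate steps means a method can earn partial credit even when it does not reach the final answer. The sketch below shows one way such scoring could work. The point values, the `partial_credit` helper, and the exact-match comparison are assumptions for illustration and do not reproduce the dataset's actual scoring scheme.

```python
def partial_credit(reference_steps, reached):
    """Sum the points of every reference step the model reached.

    `reference_steps` is a list of (result, points) pairs for one question;
    `reached` is the collection of results the model produced.
    """
    reached = set(reached)
    return sum(points for result, points in reference_steps if result in reached)


# Hypothetical reference solution with per-step points.
reference = [
    ("sin(2x) = 2 sin(x) cos(x)", 1),
    ("cos(x) = 1/2", 2),
    ("x = pi/3", 3),
]

# The model found the first two steps but not the final answer.
model_output = ["sin(2x) = 2 sin(x) cos(x)", "cos(x) = 1/2", "x = pi/6"]

score = partial_credit(reference, model_output)
max_score = sum(points for _, points in reference)
print(f"{score} / {max_score}")
```

Under final-answer-only grading this attempt would score zero, whereas step-level scoring records how far the reasoning progressed.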

On the TriMaster100 dataset, the authors report that SSC-CoT triples the effectiveness of state-of-the-art methods. Because each solution is scored step by step, this gain reflects progress through the reasoning process and not only the number of correct final answers.

Furthermore, SSC-CoT is benchmarked on MATH level 5, a widely recognized dataset of complex mathematical questions. In this benchmark, SSC-CoT surpasses the second-best method by 7.2% in accuracy, which suggests that its gains are not limited to trigonometry.

Conclusion

SSC-CoT is a promising step forward in using LLMs for complex mathematical reasoning. It targets two specific weaknesses, the selection of critical intermediate results and limited exploration of potential solutions, by intersecting multiple reasoning chains and querying a knowledge graph of domain knowledge. The reported results on TriMaster100 and MATH level 5 point to its potential for practical mathematical problem-solving. Future research may explore extending SSC-CoT to other domains.

The code and the TriMaster100 dataset are available at: https://github.com/zhao-zilong/ssc-cot.
