A new study from researchers at Tsinghua University and Microsoft presents ToRA (Tool-integrated Reasoning Agents), a series of reasoning agents that achieve remarkable improvements in solving challenging mathematical problems. The key innovation of ToRA is the seamless integration of natural language reasoning with external tools such as computation libraries and symbolic solvers.
The ToRA models are trained to interleave natural language rationales with executable code blocks, leveraging the complementary strengths of semantic reasoning and efficient computation. The ToRA training pipeline uses GPT-4 to generate high-quality demonstrations of tool-integrated reasoning on two popular math datasets, GSM8K and MATH. The resulting interactive trajectories are then used to train the open-source LLaMA series of foundation models through imitation learning and output space shaping techniques.
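To make the interleaved format concrete, here is a minimal, hypothetical sketch of such a trajectory: natural-language rationales alternate with executable code, and each code block's output is captured so it can inform the next reasoning step. The `run_trajectory` helper, the toy problem, and the use of plain Python as the "tool" are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch (not the paper's implementation): a trajectory is a
# list of (rationale, code) pairs; code blocks run in a shared namespace
# and their printed output is collected, mimicking execution feedback.
import io
from contextlib import redirect_stdout

def run_trajectory(steps):
    """Execute the code parts of an interleaved (rationale, code) trajectory."""
    namespace, outputs = {}, []
    for rationale, code in steps:
        print(f"rationale: {rationale}")
        buf = io.StringIO()
        with redirect_stdout(buf):
            exec(code, namespace)  # run the tool call (here: plain Python)
        outputs.append(buf.getvalue().strip())
        print(f"output: {outputs[-1]}")
    return outputs

# Toy GSM8K-style problem: "A $25 shirt is discounted 20%, then 8% sales
# tax is added. What is the final price?"
trajectory = [
    ("Apply the 20% discount to the $25 shirt.",
     "discounted = 25 * (1 - 0.20)\nprint(discounted)"),
    ("Add 8% sales tax to the discounted price.",
     "final = discounted * 1.08\nprint(round(final, 2))"),
]
outs = run_trajectory(trajectory)  # outs == ["20.0", "21.6"]
```

In the real system, the code blocks would call libraries like SymPy and the model would condition its next rationale on the returned output; this sketch only illustrates the alternating structure.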
Evaluation across 10 diverse mathematical reasoning tasks shows ToRA substantially outperforming prior state-of-the-art models. On the competition-level MATH benchmark, ToRA-7B attains 44.6% accuracy, surpassing the top existing open-source model by 22 absolute percentage points. More remarkably, ToRA-Code-34B reaches 50.8% on MATH, comparable to GPT-4 solving problems with code and significantly higher than GPT-4's chain-of-thought (CoT) prompting result of 42.5%.
The researchers posit that the tool-integrated reasoning format can potentially unlock even greater gains as models scale up in size and training techniques continue to improve. By combining linguistic and logical analysis with efficient computation, more advanced systems could attain deeper mathematical understanding and human-like problem-solving abilities.
Beyond pure math, the ToRA results demonstrate how integrating external knowledge sources like APIs and databases with foundation models can overcome inherent limitations of current systems. The tool-augmented approach may generalize to other challenging domains involving both language and symbolic reasoning, including computer programming, scientific research, and strategic decision making.
Overall, this research highlights the exciting potential of hybrid methods that draw on the strengths of diverse AI techniques. As models continue to advance, integrating reasoning, learning, knowledge, and action in a fluid, synergistic manner may pave the way toward more flexible and broadly capable AI.