Qwen2-Math and its instruction-tuned variant, Qwen2-Math-Instruct, have emerged as leaders in mathematical problem solving among large language models (LLMs). Built on the Qwen2 series and specialized for arithmetic and mathematical reasoning, these models outperform both open-source and closed-source competitors, including GPT-4o and Claude 3.5, on a range of math benchmarks.
At the core of Qwen2-Math are base models initialized from the Qwen2 series and pre-trained on a mathematics-specific corpus, which spans a broad array of mathematical data, from web texts and books to exam questions. Evaluated on prominent English and Chinese mathematical benchmarks, these base models consistently deliver superior performance.
To refine their capabilities further, the Qwen2-Math-Instruct models are trained with a math-specific reward model built on top of Qwen2-Math-72B: the reward model is used to select high-quality solutions via Rejection Sampling and to provide the reward signal for reinforcement learning with Group Relative Policy Optimization (GRPO). This specialized training yields strong performance across several challenging mathematical benchmarks, including problems from the Chinese Gaokao and competitions such as AIME and AMC.
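To make the rejection-sampling step concrete, here is a minimal Python sketch of the general idea: sample several candidate solutions per problem, score them with a reward model, and keep only the top-ranked ones as fine-tuning data. The function names and the random scoring are placeholders for illustration, not the team's actual pipeline.

```python
import random

def reward_score(problem: str, solution: str) -> float:
    """Placeholder for a learned reward model; returns a scalar quality score."""
    # In practice this would be a forward pass through a trained reward model.
    return random.random()

def generate_candidates(problem: str, n: int = 8) -> list[str]:
    """Placeholder for sampling n candidate solutions from the policy model."""
    return [f"candidate solution {i} for: {problem}" for i in range(n)]

def rejection_sample(problems: list[str], n: int = 8, keep_top: int = 1) -> list[tuple[str, str]]:
    """Keep only the highest-reward candidate(s) per problem as SFT training pairs."""
    sft_pairs = []
    for problem in problems:
        candidates = generate_candidates(problem, n)
        ranked = sorted(candidates, key=lambda s: reward_score(problem, s), reverse=True)
        for solution in ranked[:keep_top]:
            sft_pairs.append((problem, solution))
    return sft_pairs

if __name__ == "__main__":
    pairs = rejection_sample(["Compute 12 * 34."], n=8, keep_top=1)
    print(pairs[0][0], "->", pairs[0][1])
```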
Qwen2-Math-Instruct’s robustness is evident across multiple evaluation settings. When eight candidate solutions are sampled per problem, selecting the answer ranked highest by the reward model (RM@8) generally yields better results than majority voting over the eight final answers (Maj@8), a gap that is especially pronounced for the 1.5B and 7B models and on multiple-choice questions.
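The difference between the two metrics is easy to illustrate. The sketch below contrasts Maj@8 (majority vote over eight sampled final answers) with RM@8 (pick the sample the reward model scores highest); the answers and scores are made-up values for illustration only.

```python
from collections import Counter

def maj_at_k(answers: list[str]) -> str:
    """Majority voting: return the most frequent final answer among k samples."""
    return Counter(answers).most_common(1)[0][0]

def rm_at_k(answers: list[str], scores: list[float]) -> str:
    """Reward-model selection: return the answer from the highest-scoring sample."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]

# Eight sampled final answers for one problem, with hypothetical reward scores.
answers = ["42", "41", "42", "40", "42", "41", "40", "41"]
scores  = [0.31, 0.87, 0.45, 0.12, 0.40, 0.91, 0.05, 0.66]

print("Maj@8 pick:", maj_at_k(answers))        # most common answer
print("RM@8 pick:", rm_at_k(answers, scores))  # answer from the top-scoring sample
```

As the example shows, the two criteria can disagree: majority voting picks the most frequent answer, while reward-model selection can favor a less common answer that the reward model rates more highly.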
The integrity of training and evaluation data is maintained through rigorous decontamination, including exact-match removal and 13-gram deduplication, ensuring that benchmark results reflect the models’ genuine ability to solve new, unseen problems rather than memorization of test items.
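As a rough illustration of what 13-gram filtering means, the sketch below flags a training sample when it exactly matches a test item or shares any 13-word window with one. The real pipeline's criteria are more involved; this only shows the basic mechanism, and the sample strings are invented.

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of n-grams (as word tuples) contained in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_sample: str, test_samples: list[str], n: int = 13) -> bool:
    """Flag a training sample that exactly matches a test item
    or shares at least one n-gram with it."""
    if any(train_sample.strip() == t.strip() for t in test_samples):
        return True
    train_grams = ngrams(train_sample, n)
    return any(train_grams & ngrams(t, n) for t in test_samples)

train = ["Find the sum of the first 100 positive integers and explain your reasoning step by step carefully."]
test = ["Find the sum of the first 100 positive integers and explain your reasoning step by step carefully please."]
clean = [s for s in train if not is_contaminated(s, test)]
print(len(clean), "training samples remain after decontamination")
```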
While bilingual models supporting both English and Chinese are still to come, Qwen2-Math already sets a high bar for LLMs focused on mathematical reasoning and problem solving. For more details on the model series, see Qwen2-Math’s official blog.