Microsoft researchers have developed a new technique called “Cautious Reasoning” that allows smaller AI models to match the reasoning capabilities of much larger models. The technique was implemented in a new model called Orca 2, built by fine-tuning LLaMA 2 base models.
The key innovation is training the smaller models not just to mimic the outputs of larger models, but to learn which reasoning strategy is most effective for each type of task. Strategies include reasoning step by step, generating an explanation before answering, and answering directly. During training, Orca 2 is shown outputs from larger models like GPT-4 demonstrating these strategies, but the detailed prompts that elicited each strategy are erased, forcing Orca 2 to learn on its own when to apply each technique.
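To make the idea concrete, here is a minimal sketch of what prompt erasing could look like at the data-construction stage, assuming a hypothetical teacher_generate helper that wraps the larger model’s API; the strategy prompts and record format are illustrative stand-ins, not the paper’s actual instructions. The teacher is conditioned on a detailed, strategy-specific prompt, but only a generic prompt is stored next to its demonstration.

```python
# Illustrative sketch of "Prompt Erasing" during training-data construction.
# The strategy prompts, the teacher_generate helper, and the record format
# are hypothetical stand-ins, not the actual Orca 2 pipeline.

import json

# Detailed, strategy-specific instructions shown ONLY to the teacher model.
STRATEGY_PROMPTS = {
    "step_by_step": "Solve the problem by reasoning step by step, then state the answer.",
    "explain_then_answer": "Explain the relevant facts first, then give a concise answer.",
    "direct_answer": "Answer directly and concisely without showing your reasoning.",
}

# Generic prompt the student (Orca 2) is trained with instead.
GENERIC_PROMPT = "You are a helpful assistant. Answer the user's question."


def teacher_generate(system_prompt: str, question: str) -> str:
    """Stub for querying the larger teacher model (e.g. GPT-4) with the
    detailed prompt. Replace with a real API call in practice."""
    return f"[teacher demonstration conditioned on: {system_prompt!r}]"


def build_erased_example(question: str, strategy: str) -> dict:
    """Build one training record with the strategy prompt erased."""
    # The teacher sees the detailed, strategy-specific instruction...
    demonstration = teacher_generate(STRATEGY_PROMPTS[strategy], question)
    # ...but the stored example pairs that demonstration with only the generic
    # prompt, so the student must infer when each strategy is appropriate.
    return {
        "system": GENERIC_PROMPT,      # detailed prompt erased
        "user": question,
        "assistant": demonstration,    # still reflects the teacher's strategy
    }


if __name__ == "__main__":
    record = build_erased_example(
        "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
        strategy="step_by_step",
    )
    print(json.dumps(record, indent=2))
```

Records like these can then be used for ordinary supervised fine-tuning; because the strategy cue never appears in the student’s input, the choice of strategy has to be absorbed into the model’s weights rather than read off the prompt.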
This “Prompt Erasing” results in more flexible reasoning, allowing Orca 2 to choose the best approach for a given task rather than blindly imitating larger models. Across 15 diverse benchmarks, Orca 2 significantly outperformed models of similar size and matched or exceeded models with 5-10x more parameters.
For example, on zero-shot reasoning tasks like AGIEval, BigBench, and GSM8K, the 13-billion-parameter Orca 2 matched or exceeded the performance of 70-billion-parameter models like LLaMA 2-Chat and WizardLM. It also exceeded these larger models on language understanding benchmarks like MMLU and ARC.
Researchers believe teaching reasoning strategies, rather than just mimicking outputs, is key to unlocking the potential of smaller models. While not yet matching the very largest models, such as the roughly 175-billion-parameter GPT-3.5, Orca 2 demonstrates that smaller models can reach impressive reasoning abilities given the right training approach.
The researchers now aim to continue improving reasoning across diverse tasks while working to align the models for safety. They plan to open-source Orca 2 to enable further research into optimizing and evaluating smaller but capable AI models. If techniques like Cautious Reasoning succeed, they could enable a new wave of specialized and efficient AI applications.