WizardLM-2: A New Contender in the Language Model Arena Surpasses Many, Nears GPT-4

WizardLM-2, a new language model, was evaluated against a range of baselines using both human and automatic assessments. In a human preference evaluation on complex real-world instructions, the 8x22B variant trails the proprietary GPT-4-1106-preview only slightly and outperforms Command R Plus and GPT4-0314. The 70B variant surpasses GPT4-0613, Mistral-Large, and Qwen1.5-72B-Chat, while the 7B variant is on par with Qwen1.5-32B-Chat and exceeds Qwen1.5-14B-Chat and Starling-LM-7B-beta. Taken together, these results place WizardLM-2 close to the leading proprietary models and ahead of open-source models of comparable size.