DeepSeek Coder V2 Outperforms GPT-4 Turbo in Coding and Math Benchmarks


Chinese AI startup DeepSeek recently unveiled DeepSeek Coder V2, an open-source code language model that outperforms leading closed-source models such as GPT-4 Turbo, as well as open models such as Llama 3 70B, on coding and math benchmarks. Supporting more than 300 programming languages and a 128K-token context window, the model builds on the earlier 33-billion-parameter DeepSeek Coder and broadens its potential for handling complex coding scenarios.

DeepSeek Coder V2 posted strong results across multiple evaluations, scoring 76.2 on MBPP+, 90.2 on HumanEval, and 73.7 on Aider. That puts it ahead of not only GPT-4 Turbo but also other notable competitors such as Claude 3 Opus and Gemini 1.5 Pro. Even in general language and reasoning, the model scored a solid 79.2 on the MMLU benchmark, keeping pace with other high-performing models, though still trailing GPT-4o's 88.7.

The success of DeepSeek Coder V2 signals a broader shift toward high-performing, accessible AI tools, underscored by its availability under the MIT license for both commercial and research use, with weights published on Hugging Face and access offered through the company's API. Users can also explore the model's capabilities through a chatbot the company provides.
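Because the weights are published on Hugging Face, one common way to try the model locally is through the transformers library. The sketch below is a minimal example rather than an official recipe: the repo ID ("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"), the chat-template usage, and the precision settings are assumptions to verify against the model card.

```python
# Minimal sketch: running a DeepSeek Coder V2 checkpoint locally with
# Hugging Face transformers. Repo ID and settings are assumptions;
# check the model card for the exact ID and recommended configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,
)

# Pose a small coding task through the instruct model's chat template.
messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

For hosted access, DeepSeek's API follows an OpenAI-compatible chat-completions format; the endpoint URL and model name in the sketch below are likewise assumptions to check against the API documentation.

```python
# Minimal sketch of hosted access via DeepSeek's OpenAI-compatible API.
# Endpoint and model name are assumptions; consult DeepSeek's API docs.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-coder",  # assumed model name for the coder endpoint
    messages=[{"role": "user", "content": "Write a palindrome check in Python."}],
)
print(resp.choices[0].message.content)
```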

For more details on DeepSeek Coder V2, visit VentureBeat.