Claude 3 Opus, developed by Anthropic, has surpassed OpenAI’s GPT-4 in the LMSYS Chatbot Arena, a leaderboard that ranks large language models by blind human votes. This marks the first time GPT-4 has been dethroned since its launch. The Chatbot Arena, which opened in May 2023, has collected over 400,000 votes and features models from Anthropic, OpenAI, Google, and newer entrants such as Mistral and Alibaba.
Claude 3 Opus’s victory is significant, but the margin is narrow, and with OpenAI’s GPT-5 on the horizon, Anthropic’s lead may be short-lived. The arena ranks chatbots with the Elo rating system, the same scheme used to rate chess players: a model’s rating rises or falls after each head-to-head vote. Despite some limitations, such as missing models and occasional loading issues, the leaderboard remains a fiercely competitive space for AI models.
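For intuition, here is a minimal Python sketch of how an Elo update works for a single blind vote between two models. The K-factor of 32 and the 400-point scale are standard textbook defaults, not necessarily the exact parameters LMSYS uses.

```python
# Minimal sketch of an Elo update for one head-to-head vote.
# Assumes a standard K-factor of 32 and a 400-point scale; the real
# Chatbot Arena pipeline and its parameters may differ.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32) -> tuple[float, float]:
    """Return updated ratings after one vote (score_a: 1 win, 0.5 tie, 0 loss)."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b

# Example: a 1250-rated model wins a blind vote against a 1200-rated model.
# The winner gains a few points and the loser drops by the same amount.
print(elo_update(1250, 1200, 1.0))
```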
Interestingly, even Anthropic’s smallest model, Claude 3 Haiku, has shown impressive performance, rivaling larger models in blind tests. All three Claude 3 variants sit in the top ten, with Sonnet and Haiku ranking high alongside Opus. The leaderboard is dominated by proprietary models, underscoring how much ground open-source AI still has to make up. However, developments in open-source and decentralized AI, such as Meta’s upcoming Llama 3 and initiatives from Stability AI, point to a dynamic future for AI competition.
Read more at Tom’s Guide…