Introducing Qwen2.5-Turbo: A Leap in Long-Context Language Processing
Qwen2.5-Turbo marks a significant advancement in language model capabilities, addressing the community's demand for longer contexts. This upgrade extends the model's context length from 128k tokens to 1 million tokens, enough to hold roughly 10 full-length novels or 1.5 million Chinese characters at once. The model achieves 100% accuracy on the 1M-token Passkey Retrieval task and scores 93.1 on the long-text evaluation benchmark RULER, outperforming GPT-4 and other models.
Qwen2.5-Turbo also brings a marked improvement in inference speed: thanks to sparse attention mechanisms, the time to first token for a 1M-token context drops from 4.9 minutes to 68 seconds. The speedup comes at no extra cost, with the price unchanged at ¥0.3 per 1M tokens, so at the same spend Qwen2.5-Turbo processes 3.6 times as many tokens as GPT-4o-mini.
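As a quick sanity check of the quoted figures, the speedup factor and the per-yuan token budget can be worked out from the numbers above; this is a minimal sketch using only those numbers (the 3.6x comparison with GPT-4o-mini is not recomputed here, since it depends on that model's pricing and the exchange rate):

```python
# Back-of-the-envelope check using the figures quoted in the post above.
ttft_before_s = 4.9 * 60   # ~294 s time-to-first-token for a 1M-token prompt, dense attention
ttft_after_s = 68          # 68 s with sparse attention
print(f"TTFT speedup: {ttft_before_s / ttft_after_s:.1f}x")  # ~4.3x

# Token budget implied by the quoted price of ¥0.3 per 1M tokens.
tokens_per_yuan = 1_000_000 / 0.3
print(f"Tokens per ¥1: {tokens_per_yuan:,.0f}")  # ~3.3M tokens
```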
The model's versatility is showcased through demos covering deep understanding of long novels, repository-level code assistance, and reading multiple papers at once, illustrating its potential across a wide range of applications. Qwen2.5-Turbo is accessible through Alibaba Cloud Model Studio, the HuggingFace Demo, and the ModelScope Demo, making it readily available to developers and researchers.
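Since the model is served through Alibaba Cloud Model Studio, a typical way to try it is via the service's OpenAI-compatible interface. The sketch below assumes that interface; the base URL, the environment variable, and the model name "qwen-turbo-latest" are illustrative assumptions and should be confirmed against the Model Studio documentation:

```python
# Minimal sketch: calling Qwen2.5-Turbo through an OpenAI-compatible endpoint.
# The endpoint URL and model identifier below are assumptions, not confirmed values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # hypothetical env var holding your key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

completion = client.chat.completions.create(
    model="qwen-turbo-latest",  # assumed identifier for Qwen2.5-Turbo
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # A long document (up to ~1M tokens) can be placed directly in the user turn.
        {"role": "user", "content": "Summarize the following novel: ..."},
    ],
)
print(completion.choices[0].message.content)
```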
This release not only sets a new standard for long-context processing in language models but also maintains strong performance on short-sequence tasks, ensuring broad applicability. With ongoing work to further improve long-context performance and reduce inference costs, Qwen2.5-Turbo represents a significant step forward in natural language processing.
Read more at Qwen…