Together AI, an AI research company, published a post detailing their work on extending the context length of large language models (LLMs) to 32,000 tokens. The release includes LLaMA-2-7B-32K, an open-source 32K-context model built on top of the 4K-context LLaMA-2 base model.
The training of LLaMA-2-7B-32K required significant computational resources. According to the publication, pretraining was performed using 512 TPU v4 chips. The continued pretraining with linear interpolation used to extend the context length to 32K tokens took around 1.5 billion steps, corresponding to approximately three days of training on that hardware.
The key innovation was applying linear interpolation to position indices during continued pretraining, allowing the model to transition smoothly to much longer contexts. They also emphasized the importance of a diverse data recipe, mixing long-form texts such as books and abstracts with instruction data, to encourage the model to make use of the full context.
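To make the idea concrete, the sketch below shows how linear position interpolation can be applied to rotary position embeddings (RoPE): position indices are scaled down so that a 32K-token sequence falls within the positional range the 4K model was trained on. The function name and the `scale` parameter are illustrative assumptions, not Together AI's actual implementation.

```python
import torch

def rope_tables(dim: int, seq_len: int, scale: float = 1.0, base: float = 10000.0):
    """Build RoPE cos/sin tables with linear position interpolation.

    A `scale` below 1.0 compresses position indices, e.g. scale = 4096 / 32768
    maps 32K positions into the range a 4K-context model was trained on.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Linear interpolation: scale the positions before taking the outer product.
    positions = torch.arange(seq_len).float() * scale
    angles = torch.outer(positions, inv_freq)  # shape: (seq_len, dim / 2)
    return torch.cos(angles), torch.sin(angles)

# Example: extend a 4K-context model to a 32K context.
cos, sin = rope_tables(dim=128, seq_len=32768, scale=4096 / 32768)
```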
Through two demonstration tasks – multi-document question answering and long-form summarization – Together AI showed substantial gains from finetuning LLaMA-2-7B-32K compared to the base LLaMA-2 model with its 4K context. For example, on the summarization task, the ROUGE-1 score increased from 0.063 to 0.355.
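The reported figures come from Together AI's own evaluation; the snippet below only illustrates how a ROUGE-1 score is typically computed, using the open-source `rouge-score` package rather than their evaluation harness, and with toy strings standing in for real long-form summaries.

```python
from rouge_score import rouge_scorer

# Hypothetical reference summary and model output for illustration only.
reference = "The report outlines the company's third-quarter earnings and guidance."
prediction = "The company reported its third-quarter earnings and future guidance."

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
scores = scorer.score(reference, prediction)

# ROUGE-1 F1 measures unigram overlap between the reference and the prediction.
print(scores["rouge1"].fmeasure)
```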
To enable efficient training and inference at such long context lengths, Together AI integrated FlashAttention-2, an optimized attention implementation that computes exact attention while minimizing memory traffic. This provided up to 3x speedups compared to previous optimized implementations.
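As a rough illustration of how such a model can be run with FlashAttention-2 kernels, the sketch below uses the Hugging Face `transformers` API. It assumes a recent `transformers` release that supports the `attn_implementation` flag, an installed `flash-attn` package, and the `togethercomputer/LLaMA-2-7B-32K` model ID; the published checkpoint may instead require its bundled attention code via `trust_remote_code=True`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID on the Hugging Face Hub.
model_id = "togethercomputer/LLaMA-2-7B-32K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # request FlashAttention-2 kernels
    device_map="auto",                        # requires the accelerate package
)

prompt = "Summarize the following document:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```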
The ability to handle lengthy contexts opens up new possibilities for document-level understanding and generation tasks. As models continue to push past 32K tokens, we may see AI systems that can read and synthesize information from entire books and papers. This could enable applications like automatic literature reviews, book summarization tools, and assistants that can answer questions with evidence aggregated across documents.
Of course, longer contexts also introduce new challenges around bias, safety and computational requirements that will need to be addressed. But Together AI’s work represents an important step towards more capable and generalizable LLMs. Their commitment to releasing these models openly will empower researchers worldwide to build upon these foundations.