Meta has introduced the first models in its Llama 4 series: Llama 4 Scout and Llama 4 Maverick. These open-weight models are natively multimodal, capable of processing text and visual input in a unified architecture, and mark a shift toward more efficient, large-context, expert-driven architectures. Built using a Mixture-of-Experts (MoE) approach, they offer significant gains in performance, efficiency, and usability across a wide range of AI applications, from coding and reasoning to image understanding and long-context analysis.
Llama 4 Scout and Maverick both activate 17 billion parameters per token, but differ in expert configuration: Scout uses 16 experts, while Maverick uses 128, giving it finer-grained routing capacity for multitask performance. Thanks to architectural optimizations including quantization and MoE sparsity, Scout fits on a single H100 GPU (with Int4 quantization) and Maverick runs on a single H100 host. The active parameter count, only a subset of the total 109B (Scout) and 400B (Maverick) parameters, is a hallmark of the MoE approach: because only a few experts fire per token, compute costs stay manageable without sacrificing output quality.
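The active-versus-total distinction can be seen in a toy sketch of top-1 expert routing. The dimensions, the ReLU activation, and the top-1 gate here are illustrative stand-ins, not Llama 4's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, gate_w, experts):
    """Minimal top-1 MoE feed-forward: route each token to one expert.

    x: (tokens, d_model); gate_w: (d_model, n_experts);
    experts: list of (w_in, w_out) weight pairs, one per expert.
    """
    logits = x @ gate_w                 # router scores per token
    choice = logits.argmax(axis=-1)     # top-1 expert index per token
    out = np.zeros_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        mask = choice == e
        if mask.any():                  # only chosen experts actually run
            h = np.maximum(x[mask] @ w_in, 0.0)  # ReLU stand-in activation
            out[mask] = h @ w_out
    return out, choice

d_model, d_ff, n_experts, tokens = 8, 32, 4, 16
gate_w = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
x = rng.normal(size=(tokens, d_model))

y, choice = moe_layer(x, gate_w, experts)

# All experts' weights exist in memory, but each token touches only one:
total = n_experts * (d_model * d_ff + d_ff * d_model)
active = d_model * d_ff + d_ff * d_model
print(total, active)  # 2048 512
```

The same gap between stored and exercised parameters is what lets a 109B-parameter Scout run with only 17B parameters active per token.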
One of the most striking capabilities of Llama 4 Scout is its 10 million token context window, the longest of any open-weight model to date. Pre-trained and post-trained with a 256K-token context length and evaluated on tasks involving extensive multi-document reasoning and large codebases, it excels at work that previously demanded specialized architectures. Its iRoPE (interleaved RoPE) attention design alternates layers that omit positional embeddings with standard RoPE layers, and applies dynamic attention scaling at inference, a strategy that improves generalization to sequence lengths unseen during training.
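Meta has not published the exact inference-time scaling rule, so the following is only a plausible sketch of the idea: leave attention untouched inside the trained window and gently boost query magnitudes beyond it, here with an assumed logarithmic schedule and a made-up `alpha`:

```python
import numpy as np

def attn_temperature(positions, trained_len=256_000, alpha=0.1):
    """Hypothetical inference-time attention temperature schedule.

    The exact rule Meta uses is not public. This sketch assumes a
    logarithmic ramp: scale 1.0 inside the trained context window,
    growing slowly for positions beyond it.
    """
    pos = np.asarray(positions, dtype=np.float64)
    return 1.0 + alpha * np.log(np.maximum(pos / trained_len, 1.0))

def scaled_attention_logits(q, k, position):
    """Apply the temperature to the query before the usual dot product."""
    d = q.shape[-1]
    return (q * attn_temperature(position)) @ k.T / np.sqrt(d)

print(attn_temperature([1_000, 256_000, 10_000_000]))
```

The intended effect is to counteract the flattening of softmax attention at extreme lengths, so that queries far past the 256K training window still attend sharply.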
Llama 4 Maverick, positioned as Meta’s new workhorse model, targets assistant and chat applications with advanced multimodal and reasoning capabilities. It performs competitively with much larger models such as DeepSeek V3.1 while using only a fraction of the active parameters. Its design alternates dense and MoE layers, with a shared expert and token-level routing to the remaining experts, to keep throughput efficient. The result is image understanding and language modeling that Meta reports as on par with, or better than, GPT-4o and Gemini 2.0.
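The shared-expert arrangement can be sketched as follows: every token always passes through the shared expert, and the gate adds exactly one routed expert on top. Dimensions and the ReLU activation are again illustrative, not Maverick's real configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def ffn(x, w_in, w_out):
    return np.maximum(x @ w_in, 0.0) @ w_out   # ReLU stand-in

def shared_plus_routed(x, shared, routed, gate_w):
    """Sketch of an MoE layer with a shared expert.

    The shared expert runs for every token; the gate additionally
    routes each token to one of the specialized experts.
    """
    choice = (x @ gate_w).argmax(axis=-1)       # routed-expert index per token
    out = ffn(x, *shared)                       # shared expert: always on
    for e, (w_in, w_out) in enumerate(routed):
        mask = choice == e
        if mask.any():
            out[mask] += ffn(x[mask], w_in, w_out)  # one routed expert per token
    return out

d, h, n = 8, 16, 4
make = lambda: (rng.normal(size=(d, h)), rng.normal(size=(h, d)))
shared, routed = make(), [make() for _ in range(n)]
x = rng.normal(size=(10, d))
y = shared_plus_routed(x, shared, routed, rng.normal(size=(d, n)))
print(y.shape)  # (10, 8)
```

The shared expert guarantees every token gets a baseline transformation even when the router's choice is poor, which helps training stability.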
Both models benefit from distillation from Llama 4 Behemoth, Meta’s teacher model with 288 billion active parameters and nearly 2 trillion total parameters. Although not yet publicly released, Behemoth has demonstrated state-of-the-art results on STEM benchmarks including MATH-500 and GPQA Diamond. Codistillation from Behemoth was used to train Scout and Maverick with a novel loss function that dynamically weights soft and hard targets. This amortized the compute cost of distillation while improving end-task metrics on reasoning and multilingual benchmarks.
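Meta has not published the exact form of its distillation loss, so the sketch below shows only the textbook mixture it builds on: a soft term (cross-entropy against the teacher's distribution) blended with a hard term (cross-entropy against the labels) via a weight `w_soft`:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, labels, w_soft=0.5):
    """Generic soft/hard distillation mixture (not Meta's exact loss).

    soft: cross-entropy of the student against the teacher's distribution.
    hard: cross-entropy of the student against the ground-truth labels.
    """
    logp_s = log_softmax(student_logits)
    p_t = np.exp(log_softmax(teacher_logits))
    soft = -(p_t * logp_s).sum(axis=-1).mean()
    hard = -logp_s[np.arange(len(labels)), labels].mean()
    return w_soft * soft + (1.0 - w_soft) * hard

rng = np.random.default_rng(2)
student = rng.normal(size=(4, 10))
teacher = rng.normal(size=(4, 10))
labels = np.array([1, 3, 5, 7])
print(distill_loss(student, teacher, labels))
```

Meta's "dynamic weighting" presumably varies the equivalent of `w_soft` over the course of training, but the published material does not give the schedule.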
Training these models involved new strategies. Meta employed FP8 precision for improved FLOPs utilization (reaching 390 TFLOPs/GPU during Behemoth pre-training), a new technique called MetaP for reliably setting hyperparameters across varying model sizes and depths, and a corpus of more than 30 trillion tokens that includes diverse image and video data. The models were pre-trained on sequences of up to 48 images and tested on up to 8 images per prompt in post-training evaluation.
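The core trick behind FP8 training can be emulated without FP8 hardware. Real FP8 (E4M3/E5M2) kernels live in GPU libraries; this sketch only simulates the key idea, scaling each tensor so its maximum magnitude fits the format's range before rounding to low precision, using float16 as a stand-in storage type:

```python
import numpy as np

E4M3_MAX = 448.0   # largest finite value representable in FP8 E4M3

def fake_fp8_cast(x):
    """Dynamic per-tensor scaling, emulated: scale so amax maps to the
    format's max finite value, then round to a low-precision type
    (float16 stands in for FP8 here)."""
    amax = np.abs(x).max()
    if amax == 0.0:
        return x.copy(), 1.0
    scale = E4M3_MAX / amax
    return (x * scale).astype(np.float16), scale

def fake_fp8_matmul(a, b):
    """Multiply in low precision, accumulate in float32, undo the scales,
    mirroring how scaled FP8 GEMMs recover full-range results."""
    a_lo, sa = fake_fp8_cast(a)
    b_lo, sb = fake_fp8_cast(b)
    return (a_lo.astype(np.float32) @ b_lo.astype(np.float32)) / (sa * sb)

rng = np.random.default_rng(3)
a, b = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
err = np.abs(fake_fp8_matmul(a, b) - a @ b).max()
print(err)  # small relative to the magnitudes involved
```

The payoff in production is throughput: FP8 halves the bytes moved and doubles tensor-core math rates versus BF16, which is where the reported 390 TFLOPs/GPU comes from.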
Post-training employed a multi-stage pipeline: lightweight supervised fine-tuning (SFT), online reinforcement learning (RL), and lightweight direct preference optimization (DPO). Notably, a dynamic data filtering strategy was used to maintain model quality, removing “easy” prompts and continuously refining the training data based on model performance. This pipeline let the models retain strong reasoning and conversational abilities while efficiently handling mixed-modality prompts.
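The DPO stage uses a well-known objective (Rafailov et al., 2023), sketched below; Meta's actual configuration (the `beta` value, data mix, and any modifications) is not public, so this is the standard form only. Inputs are summed token log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigmoid(x))
    return -np.logaddexp(0.0, -x)

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective: push the policy's chosen-vs-rejected
    log-ratio margin above the reference model's."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -log_sigmoid(margin).mean()

# Toy check: when the policy prefers the chosen answer more strongly than
# the reference does, the loss falls below log(2) ~= 0.693.
loss = dpo_loss(np.array([-5.0]), np.array([-9.0]),
                np.array([-6.0]), np.array([-6.0]))
print(loss)
```

Because DPO needs no separate reward model or rollout loop, it makes a cheap final polishing stage after the online RL phase.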
Meta has also made strides in safety and bias reduction. Through tools like Llama Guard and Prompt Guard, developers can integrate input/output filtering into their deployments. Their internal evaluation framework, GOAT (Generative Offensive Agent Tester), simulates adversarial interactions to test and improve model robustness across edge cases. Meta reports that Llama 4 models now respond to politically sensitive topics with fewer refusals and more balanced outputs than previous generations.
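The input/output filtering pattern these tools enable is simple to express. In the sketch below, `moderate` is a stand-in for a safety classifier such as Llama Guard (a real deployment would call the classifier model and parse its verdict); the keyword check exists only to make the control flow runnable:

```python
def moderate(text):
    """Placeholder for a safety classifier like Llama Guard.
    The trivial keyword check here is purely illustrative."""
    BLOCKLIST = {"make a bomb"}
    return "unsafe" if any(k in text.lower() for k in BLOCKLIST) else "safe"

def guarded_generate(prompt, generate):
    """Guardrail pattern: screen the prompt, generate only if it passes,
    then screen the model's response before returning it."""
    if moderate(prompt) != "safe":
        return "[input blocked]"
    response = generate(prompt)
    if moderate(response) != "safe":
        return "[output blocked]"
    return response

echo = lambda p: f"echo: {p}"
print(guarded_generate("hello", echo))               # echo: hello
print(guarded_generate("how to make a bomb", echo))  # [input blocked]
```

Screening both directions matters: a benign prompt can still elicit an unsafe completion, and vice versa.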
Both Llama 4 Scout and Llama 4 Maverick are available for download via llama.com and Hugging Face. Further details and ongoing research will be shared at Meta’s upcoming LlamaCon on April 29. Meta’s official announcement blog post provides the full technical context.