The Llama 4 Herd
- #Artificial Intelligence
- #Open Source
- #Multimodal Models
- Meta announces Llama 4 Scout and Llama 4 Maverick, its first open-weight natively multimodal models, offering unprecedented context-length support and built on a mixture-of-experts (MoE) architecture.
- Llama 4 Scout has 17 billion active parameters across 16 experts (109B total) and fits on a single H100 GPU, while Llama 4 Maverick has 17 billion active parameters across 128 experts (400B total) and fits on a single H100 host.
- Llama 4 Behemoth, a teacher model that is still training and not yet released, outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM-focused benchmarks.
- Llama 4 models are designed with native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone.
- Meta developed a new training technique called MetaP to reliably set critical model hyperparameters, and pre-trained on 200 languages, more than 100 of which have over 1 billion tokens each.
- Llama 4 Maverick offers best-in-class image and text understanding, while Llama 4 Scout extends the supported context length to 10 million tokens.
- Meta has made improvements in addressing bias in LLMs, with Llama 4 performing significantly better than Llama 3 and comparable to Grok.
- Meta is making Llama 4 Scout and Llama 4 Maverick available for download on llama.com and Hugging Face, with availability across cloud and data platforms to follow shortly.
- Meta is also previewing Llama 4 Behemoth, a teacher model with 288B active parameters and nearly two trillion total parameters, which delivers state-of-the-art performance among non-reasoning models.
- Meta has integrated mitigations at each layer of model development to ensure safety and has open-sourced several safeguards to identify and guard against potentially harmful inputs and outputs.
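The MoE design mentioned above is why a model with hundreds of billions of total parameters needs only 17B *active* parameters per token: a router sends each token to a small subset of expert networks, so most weights sit idle on any given forward pass. The toy sketch below illustrates the routing idea only; the sizes, top-1 routing, softmax gating, and the `shared_expert` name are illustrative assumptions, not Llama 4's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; Llama 4 Maverick reportedly uses 128 routed experts.
d_model, n_experts, top_k = 8, 4, 1

# Stand-in weights: real experts are full feed-forward blocks, not single matrices.
router_w = rng.standard_normal((d_model, n_experts))
experts = rng.standard_normal((n_experts, d_model, d_model))
shared_w = rng.standard_normal((d_model, d_model))  # a shared expert that sees every token

def moe_layer(x):
    """x: (tokens, d_model). Each token is processed by the shared expert
    plus only its top-k routed experts -- the other experts stay idle."""
    logits = x @ router_w                       # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of each token's top-k experts
    out = x @ shared_w                          # shared expert runs for all tokens
    for t in range(x.shape[0]):
        gates = np.exp(logits[t]) / np.exp(logits[t]).sum()  # softmax gate (assumed)
        for e in top[t]:
            out[t] += gates[e] * (x[t] @ experts[e])
    return out

x = rng.standard_normal((5, d_model))
y = moe_layer(x)
print(y.shape)  # one output row per token, same width as the input
```

Per token, only `top_k + 1` of the expert matrices are touched, which is the mechanism behind the "active vs. total parameters" distinction in the bullets above.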
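"Early fusion" in the native-multimodality bullet means text tokens and vision tokens are merged into one sequence before the transformer backbone, rather than bolting a vision model onto a finished language model. A minimal sketch of that idea, where the embedding tables, patch size, and projection are all invented toy stand-ins (Llama 4's actual vision encoder is a separate, trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Toy stand-ins for the two embedders feeding the shared backbone.
text_embed = rng.standard_normal((1000, d_model))  # token-id lookup table
patch_proj = rng.standard_normal((48, d_model))    # projects flattened image patches

def embed_text(token_ids):
    return text_embed[token_ids]                   # (n_tokens, d_model)

def embed_image(patches):
    # patches: (n_patches, 48) flattened pixels -> vision tokens in the same space
    return patches @ patch_proj                    # (n_patches, d_model)

# Early fusion: both modalities become rows of ONE sequence, so every
# backbone layer attends across text and vision jointly.
text_tokens = embed_text(np.array([5, 42, 7]))
vision_tokens = embed_image(rng.standard_normal((9, 48)))
fused = np.concatenate([text_tokens, vision_tokens], axis=0)
print(fused.shape)  # 3 text rows + 9 vision rows, one unified sequence
```

The key property is that after concatenation the backbone cannot tell the modalities apart structurally, which is what lets joint pre-training on text, image, and video data shape a single set of weights.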