The Llama 4 Herd
- #Artificial Intelligence
- #Open Source
- #Multimodal Models
- Meta announces Llama 4 Scout and Llama 4 Maverick, its first open-weight natively multimodal models, offering unprecedented context-length support and built on a mixture-of-experts (MoE) architecture.
- Llama 4 Scout has 17 billion active parameters across 16 experts (109B total) and fits on a single H100 GPU, while Llama 4 Maverick has 17 billion active parameters across 128 experts (400B total) and fits on a single H100 host.
- Llama 4 Behemoth, a teacher model that is still training and not yet released, outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM-focused benchmarks.
- Llama 4 models are designed with native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone.
- Meta developed a new training technique called MetaP to reliably set critical model hyperparameters, and pre-trained on 200 languages, more than 100 of which have over 1 billion tokens each.
- Llama 4 Maverick offers best-in-class image and text understanding, while Llama 4 Scout extends the supported context length to 10 million tokens.
- Meta has made improvements in addressing bias in LLMs, with Llama 4 performing significantly better than Llama 3 and comparable to Grok.
- Meta is making Llama 4 Scout and Llama 4 Maverick available for download on llama.com and Hugging Face, with availability across cloud and data platforms to follow shortly.
- Meta is also previewing Llama 4 Behemoth, a teacher model with 288B active parameters and nearly two trillion total parameters, which delivers state-of-the-art performance among non-reasoning models.
- Meta has integrated mitigations at each layer of model development to ensure safety and has open-sourced several safeguards to identify and guard against potentially harmful inputs and outputs.
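The MoE design mentioned above is why a model with hundreds of billions of total parameters needs only 17B *active* parameters per token: a router sends each token to a small subset of expert networks, so most weights sit idle on any given forward pass. The toy sketch below illustrates the routing idea only; the sizes, top-1 routing, softmax gating, and the `shared_expert` name are illustrative assumptions, not Llama 4's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; Llama 4 Maverick reportedly uses 128 routed experts.
d_model, n_experts, top_k = 8, 4, 1

# Stand-in weights: real experts are full feed-forward blocks, not single matrices.
router_w = rng.standard_normal((d_model, n_experts))
experts = rng.standard_normal((n_experts, d_model, d_model))
shared_w = rng.standard_normal((d_model, d_model))  # a shared expert that sees every token

def moe_layer(x):
    """x: (tokens, d_model). Each token is processed by the shared expert
    plus only its top-k routed experts -- the other experts stay idle."""
    logits = x @ router_w                       # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of each token's top-k experts
    out = x @ shared_w                          # shared expert runs for all tokens
    for t in range(x.shape[0]):
        gates = np.exp(logits[t]) / np.exp(logits[t]).sum()  # softmax gate (assumed)
        for e in top[t]:
            out[t] += gates[e] * (x[t] @ experts[e])
    return out

x = rng.standard_normal((5, d_model))
y = moe_layer(x)
print(y.shape)  # one output row per token, same width as the input
```

Per token, only `top_k + 1` of the expert matrices are touched, which is the mechanism behind the "active vs. total parameters" distinction in the bullets above.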
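"Early fusion" in the native-multimodality bullet means text tokens and vision tokens are merged into one sequence before the transformer backbone, rather than bolting a vision model onto a finished language model. A minimal sketch of that idea, where the embedding tables, patch size, and projection are all invented toy stand-ins (Llama 4's actual vision encoder is a separate, trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Toy stand-ins for the two embedders feeding the shared backbone.
text_embed = rng.standard_normal((1000, d_model))  # token-id lookup table
patch_proj = rng.standard_normal((48, d_model))    # projects flattened image patches

def embed_text(token_ids):
    return text_embed[token_ids]                   # (n_tokens, d_model)

def embed_image(patches):
    # patches: (n_patches, 48) flattened pixels -> vision tokens in the same space
    return patches @ patch_proj                    # (n_patches, d_model)

# Early fusion: both modalities become rows of ONE sequence, so every
# backbone layer attends across text and vision jointly.
text_tokens = embed_text(np.array([5, 42, 7]))
vision_tokens = embed_image(rng.standard_normal((9, 48)))
fused = np.concatenate([text_tokens, vision_tokens], axis=0)
print(fused.shape)  # 3 text rows + 9 vision rows, one unified sequence
```

The key property is that after concatenation the backbone cannot tell the modalities apart structurally, which is what lets joint pre-training on text, image, and video data shape a single set of weights.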