Cerebras achieves 2,500 T/s on Llama 4 Maverick (400B)
a year ago
- #AI
- #Inference
- #LLM
- Cerebras sets a world record for LLM inference speed with over 2,500 tokens per second (TPS) on the 400B parameter Llama 4 Maverick model.
- Cerebras more than doubles NVIDIA Blackwell's result on the same model, which achieved 1,038 TPS.
- Independent benchmark firm Artificial Analysis confirmed Cerebras' results, with other vendors like SambaNova, Amazon, Groq, Google, and Microsoft Azure lagging behind.
- Cerebras CEO Andrew Feldman highlights the importance of inference speed for enterprise AI applications like agents, code generation, and complex reasoning.
- Cerebras' hardware and API are generally available today, whereas NVIDIA's record relied on custom software optimizations that are not accessible to most users.
- Cerebras' performance is achieved without special kernel optimizations and will be available through Meta’s upcoming API service.
- Speed is critical for AI applications like reasoning, voice, and agentic workflows, as slower responses can drive customers to competitors.
- Cerebras positions itself as the best choice for developers and enterprise AI users globally due to its record-breaking performance.
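To put the headline numbers in perspective, here is a back-of-the-envelope sketch of what the quoted throughputs mean for response latency. The figures are the ones cited above (a flat 2,500 TPS floor is assumed for Cerebras, and the 1,000-token response length is an illustrative assumption, not from the article):

```python
# Illustrative arithmetic using the throughput figures quoted above.
CEREBRAS_TPS = 2500  # "over 2,500 tokens per second" (lower bound)
NVIDIA_TPS = 1038    # NVIDIA Blackwell's reported result

def seconds_per_response(tokens: int, tps: float) -> float:
    """Time to generate a `tokens`-long response at a given tokens/sec rate."""
    return tokens / tps

# Relative speedup: ~2.4x, i.e. "more than doubling" NVIDIA's performance.
speedup = CEREBRAS_TPS / NVIDIA_TPS

# Hypothetical 1,000-token response (e.g. a code-generation or reasoning turn):
cerebras_latency = seconds_per_response(1000, CEREBRAS_TPS)  # 0.4 s
nvidia_latency = seconds_per_response(1000, NVIDIA_TPS)      # ~0.96 s
```

For multi-step agentic workflows, where one user request can chain many such generations back to back, this per-turn gap compounds, which is why the article frames raw inference speed as a competitive differentiator.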