Cerebras achieves 2,500 T/s on Llama 4 Maverick (400B)
a year ago
- #AI
- #Inference
- #LLM
- Cerebras sets a world record for LLM inference speed with over 2,500 tokens per second (TPS) on the 400B parameter Llama 4 Maverick model.
- Cerebras more than doubles NVIDIA Blackwell's result on the same model, which achieved 1,038 TPS.
- Independent benchmark firm Artificial Analysis confirmed Cerebras' results, with other vendors like SambaNova, Amazon, Groq, Google, and Microsoft Azure lagging behind.
- Cerebras CEO Andrew Feldman highlights the importance of inference speed for enterprise AI applications like agents, code generation, and complex reasoning.
- Cerebras' hardware and API are generally available today, whereas NVIDIA's record relied on custom software optimizations that are not accessible to most users.
- Cerebras' performance is achieved without special kernel optimizations and will be available through Meta’s upcoming API service.
- Speed is critical for AI applications like reasoning, voice, and agentic workflows, as slower responses can drive customers to competitors.
- Cerebras positions itself as the best choice for developers and enterprise AI users globally due to its record-breaking performance.
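To put the headline numbers in perspective, here is a back-of-the-envelope sketch of what the quoted throughputs mean for response latency. The figures are the ones cited above (a flat 2,500 TPS floor is assumed for Cerebras, and the 1,000-token response length is an illustrative assumption, not from the article):

```python
# Illustrative arithmetic using the throughput figures quoted above.
CEREBRAS_TPS = 2500  # "over 2,500 tokens per second" (lower bound)
NVIDIA_TPS = 1038    # NVIDIA Blackwell's reported result

def seconds_per_response(tokens: int, tps: float) -> float:
    """Time to generate a `tokens`-long response at a given tokens/sec rate."""
    return tokens / tps

# Relative speedup: ~2.4x, i.e. "more than doubling" NVIDIA's performance.
speedup = CEREBRAS_TPS / NVIDIA_TPS

# Hypothetical 1,000-token response (e.g. a code-generation or reasoning turn):
cerebras_latency = seconds_per_response(1000, CEREBRAS_TPS)  # 0.4 s
nvidia_latency = seconds_per_response(1000, NVIDIA_TPS)      # ~0.96 s
```

For multi-step agentic workflows, where one user request can chain many such generations back to back, this per-turn gap compounds, which is why the article frames raw inference speed as a competitive differentiator.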