Hasty Briefs (beta)

Cerebras achieves 2,500T/s on Llama 4 Maverick (400B)

a year ago
  • #AI
  • #Inference
  • #LLM
  • Cerebras sets a world record for LLM inference speed with over 2,500 tokens per second (TPS) on the 400B parameter Llama 4 Maverick model.
  • Cerebras more than doubles the performance of NVIDIA Blackwell, which achieved 1,038 TPS on the same model.
  • Independent benchmark firm Artificial Analysis confirmed Cerebras' results, with other vendors like SambaNova, Amazon, Groq, Google, and Microsoft Azure lagging behind.
  • Cerebras CEO Andrew Feldman highlights the importance of inference speed for enterprise AI applications like agents, code generation, and complex reasoning.
  • Cerebras' hardware and API are currently available, unlike NVIDIA's custom software optimizations, which are not accessible to most users.
  • Cerebras' performance is achieved without special kernel optimizations and will be available through Meta’s upcoming API service.
  • Speed is critical for AI applications like reasoning, voice, and agentic workflows, as slower responses can drive customers to competitors.
  • Cerebras positions itself as the best choice for developers and enterprise AI users globally due to its record-breaking performance.
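To make the gap concrete, here is a quick back-of-the-envelope calculation using only the throughput figures quoted above. The 10,000-token output size is an illustrative assumption, and real latency would also include network and queuing overhead, which this sketch ignores:

```python
# Back-of-the-envelope: time to generate a long output at each reported throughput.
# TPS figures are the ones quoted above; OUTPUT_TOKENS is an illustrative size.

def generation_time(tokens: int, tps: float) -> float:
    """Seconds to stream `tokens` output tokens at `tps` tokens/second."""
    return tokens / tps

OUTPUT_TOKENS = 10_000  # e.g. a long reasoning trace or generated code file

cerebras = generation_time(OUTPUT_TOKENS, 2500)   # 4.0 s
blackwell = generation_time(OUTPUT_TOKENS, 1038)  # ~9.6 s

print(f"Cerebras:  {cerebras:.1f} s")
print(f"Blackwell: {blackwell:.1f} s")
print(f"Speedup:   {2500 / 1038:.2f}x")
```

At these rates, a long agentic or reasoning response that streams in about 4 seconds on Cerebras takes roughly 9.6 seconds on Blackwell, which is the kind of difference users notice in interactive workflows.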