Cerebras Now Supports OpenAI gpt-oss-120B at 3,000 Tokens per Second
9 months ago
- #Cerebras
- #AI
- #OpenAI
- Cerebras Systems announced inference support for OpenAI's gpt-oss-120B model, running at a record-breaking 3,000 tokens per second.
- The gpt-oss-120B model delivers performance comparable to top proprietary models such as Gemini 2.5 Flash and Claude Opus 4, while adding speed, cost efficiency, and openness.
- Cerebras' wafer-scale AI infrastructure eliminates GPU bottlenecks, enabling full-model inference at unprecedented speeds.
- Developers can switch to Cerebras-hosted gpt-oss-120B with no refactoring, gaining instant access to high-performance inference.
- The model's Apache 2.0 license lets users fine-tune it, deploy it on-prem, or move it across clouds freely.
- Cerebras Cloud offers free API access to gpt-oss-120B, enabling live coding assistants, document Q&A, and fast research chains.
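The "no refactoring" claim above rests on Cerebras exposing an OpenAI-compatible API, so switching is typically just a matter of changing the base URL, API key, and model id. A minimal sketch using the OpenAI Python SDK follows; the endpoint URL and model identifier are assumptions here, so check the Cerebras documentation for the current values.

```python
import os

# Assumed Cerebras endpoint and model id -- verify against Cerebras docs.
BASE_URL = "https://api.cerebras.ai/v1"
MODEL = "gpt-oss-120b"


def make_client(api_key: str):
    # The standard OpenAI SDK is reused unchanged; only base_url
    # and api_key point at Cerebras instead of OpenAI.
    from openai import OpenAI  # pip install openai
    return OpenAI(base_url=BASE_URL, api_key=api_key)


# Only issue a live request when a key is actually configured.
if os.environ.get("CEREBRAS_API_KEY"):
    client = make_client(os.environ["CEREBRAS_API_KEY"])
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain wafer-scale inference in one sentence."}],
    )
    print(reply.choices[0].message.content)
```

Because only the connection details change, existing prompts, streaming logic, and tool-calling code written against the OpenAI API should carry over as-is.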