Cerebras Now Supports OpenAI gpt-oss-120B at 3,000 Tokens per Second
9 months ago
- #Cerebras
- #AI
- #OpenAI
- Cerebras Systems announced inference support for OpenAI's gpt-oss-120B model, running at a record-breaking 3,000 tokens per second.
- The gpt-oss-120B model delivers performance comparable to top proprietary models such as Gemini 2.5 Flash and Claude Opus 4, while adding speed, cost efficiency, and openness.
- Cerebras' wafer-scale AI infrastructure eliminates GPU bottlenecks, enabling full-model inference at unprecedented speeds.
- Developers can switch to Cerebras-hosted gpt-oss-120B with no refactoring, gaining instant access to high-performance inference.
- The model's Apache 2.0 license lets users fine-tune it, deploy it on-prem, or move it across clouds freely.
- Cerebras Cloud offers free API access to gpt-oss-120B, enabling live coding assistants, document Q&A, and fast research chains.
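The "no refactoring" claim above rests on Cerebras exposing an OpenAI-compatible API, so switching is typically just a matter of changing the base URL, API key, and model id. A minimal sketch using the OpenAI Python SDK follows; the endpoint URL and model identifier are assumptions here, so check the Cerebras documentation for the current values.

```python
import os

# Assumed Cerebras endpoint and model id -- verify against Cerebras docs.
BASE_URL = "https://api.cerebras.ai/v1"
MODEL = "gpt-oss-120b"


def make_client(api_key: str):
    # The standard OpenAI SDK is reused unchanged; only base_url
    # and api_key point at Cerebras instead of OpenAI.
    from openai import OpenAI  # pip install openai
    return OpenAI(base_url=BASE_URL, api_key=api_key)


# Only issue a live request when a key is actually configured.
if os.environ.get("CEREBRAS_API_KEY"):
    client = make_client(os.environ["CEREBRAS_API_KEY"])
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain wafer-scale inference in one sentence."}],
    )
    print(reply.choices[0].message.content)
```

Because only the connection details change, existing prompts, streaming logic, and tool-calling code written against the OpenAI API should carry over as-is.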