GPT-OSS 120B Runs at 3000 tokens/sec on Cerebras

7 months ago

OpenAI's GPT-OSS 120B model is now available on Cerebras, offering open-weight reasoning with high accuracy.
The model runs at up to 3,000 tokens per second on the Cerebras Inference Cloud, significantly faster than on GPUs.
GPT-OSS 120B excels at chain-of-thought tasks, coding, mathematical reasoning, and health-related queries.
Cerebras supports the model from launch day, at speeds 15x faster than leading GPU clouds and with low latency.
The model is cost-effective on Cerebras: it runs at 16x the speed of the median GPU cloud for less than twice the cost.
GPT-OSS 120B is the most capable U.S.-trained open-weight reasoning model available today.
It is available on Cerebras Cloud and through partners such as Hugging Face, OpenRouter, and Vercel.
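
The speed and cost figures above can be combined into a rough throughput-per-dollar estimate. The following is a minimal sketch, not a benchmark: the function names are hypothetical, the cost ratio of 2.0 is the upper bound implied by "less than twice the cost", and dividing the peak 3,000 tokens/sec by 16 to get a median GPU-cloud speed is an assumption made only for illustration.

```python
def speed_per_cost_gain(speed_ratio: float, cost_ratio: float) -> float:
    """Relative tokens/sec per unit cost compared with the baseline."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return speed_ratio / cost_ratio


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady output rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Figures taken from the summary: 16x the speed, under 2x the cost.
# Using 2.0 as the cost ratio gives a lower bound on the gain.
gain_lower_bound = speed_per_cost_gain(16.0, 2.0)

# Assumption: apply the 16x ratio to the peak rate of 3,000 tokens/sec.
cerebras_tps = 3000.0
implied_gpu_tps = cerebras_tps / 16.0

# Hypothetical 6,000-token reasoning trace at each rate.
cerebras_time = generation_seconds(6000, cerebras_tps)
gpu_time = generation_seconds(6000, implied_gpu_tps)

if __name__ == "__main__":
    print(f"speed per unit cost: at least {gain_lower_bound:.1f}x")
    print(f"implied median GPU rate: {implied_gpu_tps:.1f} tokens/sec")
    print(f"6,000 tokens: {cerebras_time:.1f}s vs {gpu_time:.1f}s")
```

Under these assumptions, the quoted ratios imply more than 8x the output speed per unit cost, since any cost ratio below 2.0 only raises the result.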
