GPT-OSS 120B Runs at 3000 tokens/sec on Cerebras
15 days ago
- #AI
- #Machine Learning
- #OpenAI
- OpenAI's GPT-OSS 120B, an open-weight reasoning model, is now available on Cerebras, offering high-accuracy reasoning with open weights.
- The model runs at up to 3,000 tokens per second on Cerebras Inference Cloud, significantly faster than GPUs.
- GPT-OSS 120B excels in chain-of-thought tasks, coding, mathematical reasoning, and health-related queries.
- Cerebras provides launch-day support, with speeds 15x faster than leading GPU clouds and low latency.
- The model is cost-effective, offering 16x the speed of median GPU clouds for less than twice the cost.
- GPT-OSS 120B is the most capable U.S.-trained open-weight reasoning model available today.
- Available on Cerebras Cloud and through partners such as Hugging Face, OpenRouter, and Vercel.
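For readers who want to try the model, the sketch below shows one plausible way to query it, assuming Cerebras exposes an OpenAI-compatible Chat Completions endpoint. The base URL, model identifier, and `CEREBRAS_API_KEY` environment variable are assumptions for illustration, not details confirmed by this announcement; check the Cerebras documentation for the actual values.

```python
import os

# Assumed values (hypothetical; verify against Cerebras docs):
BASE_URL = "https://api.cerebras.ai/v1"  # assumed OpenAI-compatible endpoint
MODEL = "gpt-oss-120b"                   # assumed model identifier

# Standard Chat Completions request body.
payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Explain chain-of-thought prompting in two sentences."}
    ],
    "max_tokens": 512,
}

api_key = os.environ.get("CEREBRAS_API_KEY")
if api_key:
    # The openai SDK (pip install openai) can target any compatible endpoint
    # by overriding base_url.
    from openai import OpenAI

    client = OpenAI(base_url=BASE_URL, api_key=api_key)
    resp = client.chat.completions.create(**payload)
    print(resp.choices[0].message.content)
else:
    # No key set: just show the request that would be sent.
    print("Set CEREBRAS_API_KEY to send the request. Payload:")
    print(payload)
```

Because the endpoint is OpenAI-compatible in this sketch, the same payload shape works with other providers listed above (such as OpenRouter) by swapping the base URL and model name.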