Hasty Briefs (beta)

GPT-OSS 120B Runs at 3000 tokens/sec on Cerebras

15 days ago
  • #AI
  • #Machine Learning
  • #OpenAI
  • OpenAI's GPT-OSS 120B model is now available on Cerebras, offering open-weight reasoning with high accuracy.
  • The model runs at up to 3,000 tokens per second on Cerebras Inference Cloud, significantly faster than GPUs.
  • GPT-OSS 120B excels at chain-of-thought tasks, coding, mathematical reasoning, and health-related queries.
  • Cerebras supports the model from launch day, with low latency and speeds 15x faster than leading GPU clouds.
  • The model is cost-effective, offering 16x the speed of median GPU clouds for less than twice the cost.
  • GPT-OSS 120B is the most capable U.S.-trained open-weight reasoning model available today.
  • The model is available on Cerebras Cloud and through partners such as Hugging Face, OpenRouter, and Vercel.
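
To put the speed and cost figures above in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in the summary (up to 3,000 tokens per second, 15x faster than leading GPU clouds, 16x the speed of median GPU clouds at under twice the cost). The 6,000-token response length and the helper names are hypothetical. Dividing the peak rate by the 15x speedup to get a GPU baseline is an assumption, not a published figure, and the estimate ignores time to first token.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate (ignores time to first token)."""
    return num_tokens / tokens_per_second


def speed_per_cost(speedup: float, cost_ratio: float) -> float:
    """Relative speed delivered per unit of cost, versus a baseline of 1.0."""
    return speedup / cost_ratio


# Figures quoted in the summary.
CEREBRAS_TOKENS_PER_SEC = 3000.0   # "up to" figure, so treat it as a peak rate
SPEEDUP_VS_LEADING_GPU = 15.0
SPEEDUP_VS_MEDIAN_GPU = 16.0
COST_RATIO_UPPER_BOUND = 2.0       # "less than twice the cost"

# Assumption: the GPU baseline is the peak rate divided by the quoted speedup.
implied_gpu_tokens_per_sec = CEREBRAS_TOKENS_PER_SEC / SPEEDUP_VS_LEADING_GPU

# Hypothetical response length for a long reasoning answer.
response_tokens = 6000

cerebras_time = generation_seconds(response_tokens, CEREBRAS_TOKENS_PER_SEC)
gpu_time = generation_seconds(response_tokens, implied_gpu_tokens_per_sec)

# Because cost is below 2x, this is a lower bound on speed per unit of cost.
min_speed_per_cost = speed_per_cost(SPEEDUP_VS_MEDIAN_GPU, COST_RATIO_UPPER_BOUND)

print(f"Cerebras: {cerebras_time:.1f} s for {response_tokens} tokens")
print(f"Implied GPU baseline: {gpu_time:.1f} s for {response_tokens} tokens")
print(f"Speed per unit of cost: more than {min_speed_per_cost:.0f}x the median GPU cloud")
```

Under these assumptions, a 6,000-token answer takes about 2 seconds at the peak Cerebras rate versus about 30 seconds at the implied GPU rate, and the "16x for less than twice the cost" claim works out to more than 8x the speed per unit of cost.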