Hasty Briefs (beta)

Gemma 4 on Cerebras - The Fastest Inference Is Now Multimodal

8 hours ago
  • #Multimodal AI
  • #Open-Weight Models
  • #Fast Inference
  • Gemma 4 31B runs at over 1,800 tokens per second on Cerebras Inference, making it the fastest multimodal model for applications like computer use and image-driven workflows.
  • The Cerebras platform delivers record output speed (1,851 tokens/sec) with a time to first token of 1.5 seconds, which enables real-time use and outperforms typical GPU endpoints and models such as Claude Haiku.
  • Gemma 4 31B is an open-weight model released under Apache 2.0. It is comparable in intelligence to Claude Haiku 4.5 and is positioned as a reference medium-size model for teams looking for an alternative to Haiku, GPT-OSS, or Llama.
  • It supports image understanding (e.g., screenshots, charts, UI states), unlocking new product experiences like real-time insight generation, long-context summarization, and UI patching.
  • It is available now in public preview on the Cerebras Inference Cloud for workloads requiring multimodal reasoning, fast document processing, or real-time audio/video.
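
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch of how long a response of a given length might take. It assumes a deliberately simple model that is not from the source: total time equals time to first token plus output tokens divided by output throughput. The function name and the default values (taken from the 1,851 tokens/sec and 1.5 second figures above) are illustrative only; real latency varies with prompt size, load, and network conditions.

```python
def estimated_response_seconds(
    output_tokens: int,
    tokens_per_second: float = 1851.0,  # output speed quoted in the brief
    first_token_seconds: float = 1.5,   # time to first token quoted in the brief
) -> float:
    """Estimate end-to-end response time under a simplified linear model.

    Assumption (not from the source): generation proceeds at a constant
    rate once the first token has arrived.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_seconds + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Compare a short answer, a medium summary, and a long report.
    for n in (200, 1851, 10_000):
        print(f"{n:>6} tokens: ~{estimated_response_seconds(n):.2f} s")
```

Under this model the fixed time to first token dominates short responses, so the throughput advantage matters most for long outputs such as summaries or generated UI patches.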