Hasty Briefs (beta)


Mercury 2: Diffusion Reasoning Model

8 hours ago
  • #AI
  • #LLM
  • #Diffusion
  • Mercury 2 is introduced as the world's fastest reasoning language model, designed for instant production AI.
  • Speed is critical in production AI due to compounding latency in loops like agents and retrieval pipelines.
  • Mercury 2 uses diffusion-based parallel refinement for faster generation, producing multiple tokens simultaneously.
  • It offers >5x faster generation than comparable autoregressive models, shifting the usual speed-vs-quality trade-off for reasoning models.
  • Key features include 1,009 tokens/sec speed, competitive quality, 128K context, and native tool use.
  • Optimized for real-time responsiveness with low p95 latency under high concurrency.
  • NVIDIA highlights Mercury 2's performance on their GPUs, surpassing 1,000 tokens/sec.
  • Excels in latency-sensitive applications like coding, agentic loops, real-time voice, and search pipelines.
  • Partners and customers praise its speed, quality, and impact on workflows.
  • Mercury 2 is OpenAI API compatible and available now for enterprise evaluations.
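The parallel-refinement idea can be illustrated with a toy sketch. This is a deliberately simplified, hypothetical illustration, not Mercury 2's actual algorithm: a diffusion-style decoder starts from a fully masked sequence and refines all positions over a small, fixed number of steps, while an autoregressive decoder spends one model call per token.

```python
# Toy illustration of diffusion-style parallel refinement vs. autoregressive
# decoding. Purely schematic: "toy_model" is a stand-in denoiser that always
# proposes the right token; real diffusion LMs learn this denoising step.

MASK = "_"

def toy_model(draft, target):
    # Stand-in for a denoising model: proposes a token for every position.
    return list(target)

def diffusion_decode(target, steps=4):
    """Refine ALL positions in parallel over a fixed number of steps."""
    draft = [MASK] * len(target)
    for step in range(steps):
        proposal = toy_model(draft, target)
        # Commit a growing fraction of positions each step (parallel update).
        k = (step + 1) * len(target) // steps
        for i in range(k):
            draft[i] = proposal[i]
    return "".join(draft), steps  # total "model calls" == steps

def autoregressive_decode(target):
    """Emit exactly one token per model call."""
    draft = []
    for i in range(len(target)):
        draft.append(target[i])  # one model call per token
    return "".join(draft), len(target)

text = "hello world, this is a test"
d_out, d_calls = diffusion_decode(text)
a_out, a_calls = autoregressive_decode(text)
print(d_out == a_out, d_calls, a_calls)
```

The point of the sketch is the call count: the parallel refiner touches the whole sequence each step, so its number of model calls is fixed by the step budget rather than by the output length — the structural reason diffusion decoding can outpace token-by-token generation.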
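OpenAI API compatibility generally means the server accepts the same chat-completions request shape as OpenAI's endpoint, so existing tooling can be pointed at it unchanged. A minimal sketch of that request body — the base URL and model identifier below are placeholders, not values confirmed by the announcement:

```python
import json

# Hypothetical endpoint and model name; check the provider's docs for the
# real values before use.
BASE_URL = "https://api.example.com/v1"  # placeholder Mercury endpoint
MODEL = "mercury-2"                      # placeholder model identifier

def build_chat_request(prompt, max_tokens=256):
    """Return the JSON body an OpenAI-compatible chat endpoint expects."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Write a haiku about latency.")
print(json.dumps(body, indent=2))
```

Because the wire format matches OpenAI's, official SDKs can typically be reused by overriding the base URL (e.g. `OpenAI(base_url=..., api_key=...)` in the OpenAI Python client) rather than adopting a new client library.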