Hasty Briefs (beta)

MiMo-V2.5-Pro-UltraSpeed: a 1T-parameter model decoding at over 1,000 tokens per second

7 hours ago
  • #Ultra-Low-Latency Inference
  • #AI Speed Breakthrough
  • #Model-System Codesign
  • Xiaomi, in collaboration with TileRT, releases MiMo-V2.5-Pro-UltraSpeed, reported as the first 1-trillion-parameter model to decode at over 1,000 tokens/s.
  • The UltraSpeed API is offered at a promotional price from June 9 to June 23, 2026. Access is limited-time and by application, with priority given to enterprises and professional developers.
  • High-speed inference enables new AI paradigms: parallel reasoning paths for improved quality, enhanced coding agent productivity, and real-time decision-making in critical applications like trading, fraud detection, and medical analysis.
  • Key innovations include FP4 quantization of the MoE experts, which reduces model size and memory overhead, and DFlash speculative decoding, which predicts blocks of tokens in parallel and raises acceptance rates, especially in coding and reasoning scenarios.
  • TileRT's execution model eliminates microsecond-level execution gaps, optimizing hardware performance through custom compilation and compute kernels tailored to the model's algorithmic characteristics.
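
The draft-and-verify idea behind speculative decoding, mentioned in the summary above, can be sketched in a few lines of Python. This is a generic greedy illustration under stated assumptions, not DFlash or TileRT code: the names `verify_block`, `speculative_decode`, `greedy_decode`, `toy_target_next`, and `toy_draft_block` are hypothetical, the "models" are toy integer rules, and the target is queried one position at a time for clarity, where a real system would score the whole drafted block in a single forward pass.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
TargetFn = Callable[[Sequence[Token]], Token]            # greedy next token of the large model
DraftFn = Callable[[Sequence[Token], int], List[Token]]  # cheap drafter proposing a block


def verify_block(context: Sequence[Token], draft: Sequence[Token],
                 target_next: TargetFn) -> List[Token]:
    """Accept the longest draft prefix that matches the target's greedy choices.

    On the first mismatch, the target's own token replaces the rejected one.
    If the whole block is accepted, one extra target token is appended.
    """
    ctx = list(context)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token after a fully accepted block
    return accepted


def speculative_decode(prompt: Sequence[Token], draft_block: DraftFn,
                       target_next: TargetFn, block_size: int,
                       max_new_tokens: int) -> Tuple[List[Token], int]:
    """Return the generated sequence and the number of verification steps used."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_block(out, block_size)
        out.extend(verify_block(out, draft, target_next))
        steps += 1
    return out[: len(prompt) + max_new_tokens], steps


def greedy_decode(prompt: Sequence[Token], target_next: TargetFn,
                  max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def toy_target_next(ctx: Sequence[Token]) -> Token:
    """Toy 'large model' (assumption): count upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: Sequence[Token], k: int) -> List[Token]:
    """Toy drafter (assumption): same rule, but it guesses wrong right after a 7."""
    block: List[Token] = []
    last = ctx[-1]
    for _ in range(k):
        last = 0 if last == 7 else (last + 1) % 10
        block.append(last)
    return block


if __name__ == "__main__":
    tokens, steps = speculative_decode([0], toy_draft_block, toy_target_next,
                                       block_size=4, max_new_tokens=12)
    print(tokens, steps)
```

In this toy setup the speculative loop produces the same 12 tokens as `greedy_decode` while needing only 3 verification steps instead of 12 sequential ones. The more draft tokens the target accepts per block, the fewer steps are needed, which is why a higher acceptance rate translates into higher decode throughput.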