MiMo-V2.5-Pro-UltraSpeed: a 1T-parameter model at over 1000 tokens per second

5 hours ago

Xiaomi releases MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, achieving over 1000 tokens/s decode speed for a 1-trillion-parameter model for the first time.
From June 9 to June 23, 2026, the UltraSpeed API is offered at a promotional price through limited-time, application-based access for approved users, with priority given to enterprises and professional developers.
High-speed inference enables new AI paradigms: parallel reasoning paths for improved quality, enhanced coding agent productivity, and real-time decision-making in critical applications like trading, fraud detection, and medical analysis.
Key innovations include FP4 quantization of the MoE experts, which reduces model size and memory overhead, and DFlash speculative decoding, which predicts blocks of tokens in parallel and raises acceptance rates, especially in coding and reasoning scenarios.
TileRT's execution model eliminates microsecond-level execution gaps, optimizing hardware performance through custom compilation and compute kernels tailored to the model's algorithmic characteristics.
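
To make the speculative decoding point concrete, here is a minimal Python sketch of the general idea: a cheap draft proposes a block of tokens, the large model verifies the block, and every accepted token saves one sequential decode step. This is the generic greedy-verification variant, not DFlash itself, whose internals are not described here. All names (`speculative_step`, `generate`, `toy_target_next`, `toy_draft_block`) and the toy integer "models" are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]
DraftBlock = Callable[[List[Token], int], List[Token]]


def speculative_step(context: List[Token], draft_block: List[Token],
                     target_next: NextToken) -> List[Token]:
    """Verify one drafted block against the target model (greedy variant).

    Returns the accepted draft prefix plus exactly one token chosen by the
    target. A real system would score all block positions in a single
    forward pass; the loop below only simulates that check.
    """
    accepted: List[Token] = []
    for tok in draft_block:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
    # Whole block accepted: the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[Token], draft_block_fn: DraftBlock,
             target_next: NextToken, max_new_tokens: int,
             block_size: int) -> Tuple[List[Token], int]:
    """Decode until max_new_tokens are produced; also count verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        block = draft_block_fn(out, block_size)
        out.extend(speculative_step(out, block, target_next))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


# Hypothetical toy models: the target counts upward modulo 10, and the
# draft agrees except that it predicts 0 instead of 7 after a 6.
def toy_target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: List[Token], k: int) -> List[Token]:
    block: List[Token] = []
    last = ctx[-1]
    for _ in range(k):
        last = 0 if last == 6 else (last + 1) % 10
        block.append(last)
    return block


if __name__ == "__main__":
    tokens, steps = generate([0], toy_draft_block, toy_target_next,
                             max_new_tokens=12, block_size=4)
    print(tokens, steps)
```

With greedy verification the output is identical to decoding with the target alone; the gain is that the number of verification steps is smaller than the number of generated tokens whenever the draft is frequently right. That is why a higher acceptance rate translates directly into higher decode speed.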
