MiMo-V2.5-Pro-UltraSpeed: a 1T-parameter model at 1000 tokens per second
5 hours ago
- #Ultra-Low-Latency Inference
- #AI Speed Breakthrough
- #Model-System Codesign
- Xiaomi releases MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, achieving over 1000 tokens/s decode speed for a 1-trillion-parameter model for the first time.
- From June 9 to June 23, 2026, the UltraSpeed API is offered at a promotional price through application-based, limited-time access. Approved users are admitted with priority given to enterprises and professional developers.
- High-speed inference opens up new usage patterns: running several reasoning paths in parallel to improve answer quality, making coding agents more productive, and supporting real-time decisions in latency-critical applications such as trading, fraud detection, and medical analysis.
- Key innovations include FP4 quantization of the MoE experts, which reduces model size and memory overhead, and DFlash speculative decoding, which predicts blocks of tokens in parallel and raises acceptance rates, especially in coding and reasoning scenarios.
- TileRT's execution model eliminates microsecond-level execution gaps and extracts more hardware performance through custom compilation and compute kernels tailored to the model's algorithmic characteristics.
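
To make the speculative-decoding point above more concrete, here is a minimal sketch of generic block-level draft-and-verify decoding with greedy acceptance. It is not DFlash itself, whose internals are not described here. The toy "models" and every name in the code (`toy_target`, `toy_draft`, `verify_block`, `speculative_decode`) are hypothetical stand-ins chosen for illustration. The sketch shows why a higher acceptance rate means fewer passes through the large model for the same output.

```python
from typing import Callable, List, Sequence, Tuple

Token = int


def toy_target(ctx: Sequence[Token]) -> Token:
    """Hypothetical stand-in for the large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[Token], block_size: int) -> List[Token]:
    """Hypothetical stand-in for a cheap drafter that proposes a whole block.

    It agrees with the target except that it deliberately mispredicts
    whenever the correct next token would be 7.
    """
    block: List[Token] = []
    last = ctx[-1]
    for _ in range(block_size):
        nxt = (last + 1) % 10
        if nxt == 7:
            nxt = 0  # deliberate drafting error
        block.append(nxt)
        last = nxt
    return block


def verify_block(
    ctx: Sequence[Token],
    draft: Sequence[Token],
    target: Callable[[Sequence[Token]], Token],
) -> List[Token]:
    """Keep draft tokens while they match the target's greedy choice.

    On the first mismatch, emit the target's token instead and stop.
    If the whole block matches, the target contributes one extra token.
    A real system would score all positions in a single batched forward
    pass; the loop here only simulates that check.
    """
    accepted: List[Token] = []
    running = list(ctx)
    for tok in draft:
        expected = target(running)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        running.append(tok)
    accepted.append(target(running))
    return accepted


def speculative_decode(
    prefix: Sequence[Token], n_new: int, block_size: int
) -> Tuple[List[Token], int]:
    """Return (generated tokens, number of target verification passes)."""
    tokens = list(prefix)
    target_passes = 0
    while len(tokens) - len(prefix) < n_new:
        draft = toy_draft(tokens, block_size)
        tokens.extend(verify_block(tokens, draft, toy_target))
        target_passes += 1
    start = len(prefix)
    return tokens[start:start + n_new], target_passes


def greedy_decode(prefix: Sequence[Token], n_new: int) -> List[Token]:
    """Baseline: one target pass per generated token."""
    tokens = list(prefix)
    for _ in range(n_new):
        tokens.append(toy_target(tokens))
    return tokens[len(prefix):]
```

Calling `speculative_decode([0], 20, 4)` yields the same 20 tokens as `greedy_decode([0], 20)`, but in this toy setup it needs only 5 target passes instead of 20. The more often the drafted block matches the target, the more tokens each pass commits, which is the effect a higher acceptance rate has on decode speed.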