Xiaomi unveils open-source AI reasoning model MiMo
- #reasoning
- #language-models
- #machine-learning
- MiMo-7B is a series of 7B-parameter models designed for reasoning tasks, outperforming even much larger 32B models on reasoning benchmarks.
- The models are trained with a focus on both pre-training and post-training strategies to enhance reasoning capabilities.
- Pre-training optimizations include enhanced data preprocessing, multi-dimensional data filtering, and a three-stage data mixture strategy.
- Post-training involves curated RL training data with rule-based accuracy rewards, plus a test-difficulty-driven code reward that mitigates sparse rewards on hard problems.
- A Seamless Rollout Engine accelerates RL training and validation by reducing GPU idle time, yielding significant speedups.
- MiMo-7B series includes base, SFT, and RL models, with the RL model matching the performance of OpenAI o1-mini.
- Benchmark results show MiMo-7B-RL excels in mathematics and code reasoning tasks.
- The models are open-sourced and available on Hugging Face, with inference supported through Xiaomi's fork of vLLM.
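To make the reward design concrete, here is a minimal sketch of the two reward ideas described above: a rule-based accuracy reward (exact match against a reference, no learned reward model) and a difficulty-weighted code reward that gives partial credit per passed test group instead of all-or-nothing. This is an illustration of the concept, not Xiaomi's implementation; the `TestCase` structure and weighting scheme are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    passed: bool
    difficulty: float  # assumed weight, e.g. derived from how hard the test is


def accuracy_reward(model_answer: str, reference: str) -> float:
    # Rule-based: binary exact-match reward for math-style tasks.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def code_reward(tests: list[TestCase]) -> float:
    # Difficulty-weighted partial credit: a solution that passes only the
    # easier tests still receives a dense, nonzero signal, mitigating the
    # sparse-reward problem on hard coding tasks.
    total = sum(t.difficulty for t in tests)
    if total == 0:
        return 0.0
    return sum(t.difficulty for t in tests if t.passed) / total
```

The point of the weighting is that an RL policy gets gradient signal from partial progress on hard problems rather than a flat zero until the full test suite passes.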
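Since the checkpoints are on Hugging Face, a plain `transformers` fallback is a quick way to try the RL model (the vLLM fork is the recommended serving path). A minimal sketch, assuming the model ID `XiaomiMiMo/MiMo-7B-RL`; adjust to the base or SFT checkpoint as needed.

```python
MODEL_ID = "XiaomiMiMo/MiMo-7B-RL"  # assumed model ID from the release


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch is inspectable without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; return only the completion.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("What is 17 * 24? Think step by step."))
```

Reasoning models tend to emit long chains of thought, so budget a generous `max_new_tokens` for nontrivial problems.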