Xiaomi unveils open-source AI reasoning model MiMo
- #reasoning
- #language-models
- #machine-learning
- MiMo-7B is a series of 7B-parameter models designed for reasoning tasks, outperforming even much larger 32B models on reasoning benchmarks.
- The models are trained with a focus on both pre-training and post-training strategies to enhance reasoning capabilities.
- Pre-training optimizations include enhanced data preprocessing, multi-dimensional data filtering, and a three-stage data mixture strategy.
- Post-training involves curated RL training data with rule-based accuracy rewards, plus a test-difficulty-driven code reward that mitigates sparse rewards on hard problems.
- A Seamless Rollout Engine accelerates RL training and validation by reducing GPU idle time, yielding significant speedups.
- MiMo-7B series includes base, SFT, and RL models, with the RL model matching the performance of OpenAI o1-mini.
- Benchmark results show MiMo-7B-RL excels in mathematics and code reasoning tasks.
- The models are open-sourced and available on Hugging Face, with inference supported through Xiaomi's fork of vLLM.
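To make the reward design concrete, here is a minimal sketch of the two reward ideas described above: a rule-based accuracy reward (exact match against a reference, no learned reward model) and a difficulty-weighted code reward that gives partial credit per passed test group instead of all-or-nothing. This is an illustration of the concept, not Xiaomi's implementation; the `TestCase` structure and weighting scheme are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    passed: bool
    difficulty: float  # assumed weight, e.g. derived from how hard the test is


def accuracy_reward(model_answer: str, reference: str) -> float:
    # Rule-based: binary exact-match reward for math-style tasks.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def code_reward(tests: list[TestCase]) -> float:
    # Difficulty-weighted partial credit: a solution that passes only the
    # easier tests still receives a dense, nonzero signal, mitigating the
    # sparse-reward problem on hard coding tasks.
    total = sum(t.difficulty for t in tests)
    if total == 0:
        return 0.0
    return sum(t.difficulty for t in tests if t.passed) / total
```

The point of the weighting is that an RL policy gets gradient signal from partial progress on hard problems rather than a flat zero until the full test suite passes.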
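Since the checkpoints are on Hugging Face, a plain `transformers` fallback is a quick way to try the RL model (the vLLM fork is the recommended serving path). A minimal sketch, assuming the model ID `XiaomiMiMo/MiMo-7B-RL`; adjust to the base or SFT checkpoint as needed.

```python
MODEL_ID = "XiaomiMiMo/MiMo-7B-RL"  # assumed model ID from the release


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch is inspectable without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; return only the completion.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("What is 17 * 24? Think step by step."))
```

Reasoning models tend to emit long chains of thought, so budget a generous `max_new_tokens` for nontrivial problems.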