M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
- #Mamba Architecture
- #Reasoning Models
- #Machine Learning
- Introduces M1, a hybrid linear RNN reasoning model based on the Mamba architecture for memory-efficient inference.
- Leverages distillation from existing reasoning models and RL training to enhance performance.
- Outperforms previous linear RNN models and matches state-of-the-art DeepSeek-R1-distilled reasoning models on the AIME and MATH benchmarks.
- Achieves more than a 3x inference speedup over same-size transformers when served with vLLM, enabling higher accuracy under a fixed generation-time budget.
- Proposes an effective approach to scaling test-time generation via self-consistency (majority voting over multiple sampled answers) or long chain-of-thought reasoning.
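The self-consistency scaling mentioned above can be sketched as majority voting over independently sampled answers. This is a minimal illustration, not M1's actual code: `generate` is a hypothetical callable standing in for any model inference call that returns one final answer string per sample.

```python
from collections import Counter

def self_consistency(generate, prompt, n_samples=8):
    """Sample n_samples answers and return the majority vote.

    generate: hypothetical callable (prompt -> answer string),
    standing in for a model inference call.
    Returns the most common answer and its agreement fraction.
    """
    answers = [generate(prompt) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples

# Toy usage with a deterministic stub sampler.
samples = iter(["42", "42", "41", "42"])
def toy_generate(prompt):
    return next(samples)

answer, agreement = self_consistency(toy_generate, "What is 6*7?", n_samples=4)
# answer == "42", agreement == 0.75
```

Because a linear RNN like M1 generates tokens cheaply at long sequence lengths, raising `n_samples` trades extra compute for accuracy within the same wall-clock budget.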