M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
- #Mamba Architecture
- #Reasoning Models
- #Machine Learning
- Introduces M1, a hybrid linear RNN reasoning model based on the Mamba architecture for memory-efficient inference.
- Leverages distillation from existing reasoning models and RL training to enhance performance.
- Outperforms previous linear RNN models and matches state-of-the-art DeepSeek-R1-distilled reasoning models on the AIME and MATH benchmarks.
- Achieves more than a 3x inference speedup over same-size transformers when served with vLLM, enabling higher accuracy under a fixed generation-time budget.
- Proposes an effective approach to scaling test-time generation via self-consistency (majority voting over multiple sampled answers) or long chain-of-thought reasoning.
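The self-consistency scaling mentioned above can be sketched as majority voting over independently sampled answers. This is a minimal illustration, not M1's actual code: `generate` is a hypothetical callable standing in for any model inference call that returns one final answer string per sample.

```python
from collections import Counter

def self_consistency(generate, prompt, n_samples=8):
    """Sample n_samples answers and return the majority vote.

    generate: hypothetical callable (prompt -> answer string),
    standing in for a model inference call.
    Returns the most common answer and its agreement fraction.
    """
    answers = [generate(prompt) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples

# Toy usage with a deterministic stub sampler.
samples = iter(["42", "42", "41", "42"])
def toy_generate(prompt):
    return next(samples)

answer, agreement = self_consistency(toy_generate, "What is 6*7?", n_samples=4)
# answer == "42", agreement == 0.75
```

Because a linear RNN like M1 generates tokens cheaply at long sequence lengths, raising `n_samples` trades extra compute for accuracy within the same wall-clock budget.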