Mamba-3
- #State Space Models
- #Inference Efficiency
- #Mamba-3
- Mamba-3 is a new state space model (SSM) designed for inference efficiency, whereas Mamba-2 focused on training speed.
- Key upgrades: a more expressive recurrence, complex-valued state tracking, and a MIMO variant that improves accuracy without slowing decoding.
- At the 1.5B scale, Mamba-3 SISO achieves lower prefill+decode latency than Mamba-2, Gated DeltaNet, and Llama-3.2-1B.
- The team open-sourced kernels built with Triton, TileLang, and CuTe DSL for maximum hardware performance.
- At the SSM level, the design combines the more expressive recurrence with a complex-valued state update and MIMO projections; at the block level, it adds QKNorm, applies RoPE, and removes the short convolution.
- Empirical results show Mamba-3 outperforms Mamba-2 and other linear alternatives on language modeling and retrieval tasks.
- Mamba-3's kernels are optimized for speed, with lower prefill and decode latencies than competing implementations.
- Kernel components are split across Triton, TileLang, and CuTe DSL, trading performance against ease of use per component.
- Future directions include further exploration of Mamba-3's core improvements and their SSM foundations.
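The complex-valued state tracking mentioned above can be illustrated with a minimal scalar-input (SISO) decode step. This is a generic linear-SSM sketch, not Mamba-3's actual parameterization; all names (`siso_decode_step`, the shapes, the `0.95` decay) are illustrative assumptions:

```python
import numpy as np

def siso_decode_step(h, x_t, a_t, b_t, c_t):
    """One decode step of a scalar-input linear SSM with complex state:
        h_t = a_t * h_{t-1} + b_t * x_t,   y_t = Re(conj(c_t) . h_t)
    A complex a_t with |a_t| <= 1 lets the state rotate (oscillate)
    rather than only decay, enabling richer state tracking."""
    h = a_t * h + b_t * x_t            # elementwise complex recurrence
    y = np.real(np.conj(c_t) @ h)      # project state back to a real output
    return h, y

# Toy usage: a 4-dim complex state driven by a scalar input stream.
rng = np.random.default_rng(0)
h = np.zeros(4, dtype=np.complex128)
theta = rng.uniform(0, np.pi, 4)
a = 0.95 * np.exp(1j * theta)          # decaying rotations, |a| < 1 for stability
b = rng.standard_normal(4).astype(np.complex128)
c = rng.standard_normal(4).astype(np.complex128)
for x_t in [1.0, 0.5, -0.25]:
    h, y = siso_decode_step(h, x_t, a, b, c)
```

Because the state is a fixed-size vector, each decode step is O(state size) regardless of sequence length, which is the source of the SSM decoding advantage over attention.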
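The MIMO variant can be sketched similarly: the state becomes a matrix that receives a rank-r input update instead of a rank-1 one. This is a generic MIMO linear-SSM sketch under assumed shapes, not the paper's exact formulation:

```python
import numpy as np

def mimo_decode_step(H, X_t, a_t, B_t, C_t):
    """One decode step of a MIMO linear SSM.  The (n, d) state matrix is
    updated with a rank-r input instead of rank-1:
        H_t = diag(a_t) @ H_{t-1} + B_t @ X_t,   Y_t = C_t @ H_t
    with B_t: (n, r), X_t: (r, d), C_t: (m, n).  Each step does more
    FLOPs than SISO but reads/writes the same state memory, so decoding
    (which is memory-bound) stays roughly as fast."""
    H = a_t[:, None] * H + B_t @ X_t   # diagonal decay + rank-r injection
    Y = C_t @ H                        # read out m output channels
    return H, Y

# Toy shapes: state n=8, feature dim d=16, input rank r=2, outputs m=16.
rng = np.random.default_rng(1)
n, d, r, m = 8, 16, 2, 16
H = np.zeros((n, d))
H, Y = mimo_decode_step(H, rng.standard_normal((r, d)),
                        rng.uniform(0.8, 1.0, n),
                        rng.standard_normal((n, r)),
                        rng.standard_normal((m, n)))
```

This is one way to see how MIMO can add accuracy "without slowing down decoding": the extra work is compute, not extra state traffic.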
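QKNorm, borrowed from attention, RMS-normalizes the query- and key-like projections before they enter the mixing step. A minimal RMSNorm sketch (the function name and its application to the SSM's B/C projections are assumptions for illustration):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm over the last axis: x / sqrt(mean(x^2) + eps) * gamma.
    QKNorm applies this to the query- and key-like tensors (for an SSM,
    the C and B projections) so their scale stays controlled regardless
    of what the input projections produce."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

x = np.array([[3.0, 4.0]])
out = rms_norm(x, gamma=np.ones(2))
```

After normalization the mean square of each row is ~1, which stabilizes training at scale.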