Mamba-3
- #State Space Models
- #Inference Efficiency
- #Mamba-3
- Mamba-3 is a new state space model (SSM) designed for inference efficiency, whereas Mamba-2 focused on training speed.
- Key upgrades: a more expressive recurrence, complex-valued state tracking, and a MIMO variant that improves accuracy without slowing decoding.
- At the 1.5B scale, Mamba-3 SISO achieves lower prefill+decode latency than Mamba-2, Gated DeltaNet, and Llama-3.2-1B.
- The team open-sourced kernels built with Triton, TileLang, and CuTe DSL for maximum hardware performance.
- At the SSM level, the design combines the more expressive recurrence with a complex-valued state update and MIMO projections; at the block level, it adds QKNorm, applies RoPE, and removes the short convolution.
- Empirical results show Mamba-3 outperforms Mamba-2 and other linear alternatives on language modeling and retrieval tasks.
- Mamba-3's kernels are optimized for speed, with lower prefill and decode latencies than competing implementations.
- Kernel components are split across Triton, TileLang, and CuTe DSL, trading performance against ease of use per component.
- Future directions include further exploration of Mamba-3's core improvements and their SSM foundations.
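The complex-valued state tracking mentioned above can be illustrated with a minimal scalar-input (SISO) decode step. This is a generic linear-SSM sketch, not Mamba-3's actual parameterization; all names (`siso_decode_step`, the shapes, the `0.95` decay) are illustrative assumptions:

```python
import numpy as np

def siso_decode_step(h, x_t, a_t, b_t, c_t):
    """One decode step of a scalar-input linear SSM with complex state:
        h_t = a_t * h_{t-1} + b_t * x_t,   y_t = Re(conj(c_t) . h_t)
    A complex a_t with |a_t| <= 1 lets the state rotate (oscillate)
    rather than only decay, enabling richer state tracking."""
    h = a_t * h + b_t * x_t            # elementwise complex recurrence
    y = np.real(np.conj(c_t) @ h)      # project state back to a real output
    return h, y

# Toy usage: a 4-dim complex state driven by a scalar input stream.
rng = np.random.default_rng(0)
h = np.zeros(4, dtype=np.complex128)
theta = rng.uniform(0, np.pi, 4)
a = 0.95 * np.exp(1j * theta)          # decaying rotations, |a| < 1 for stability
b = rng.standard_normal(4).astype(np.complex128)
c = rng.standard_normal(4).astype(np.complex128)
for x_t in [1.0, 0.5, -0.25]:
    h, y = siso_decode_step(h, x_t, a, b, c)
```

Because the state is a fixed-size vector, each decode step is O(state size) regardless of sequence length, which is the source of the SSM decoding advantage over attention.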
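The MIMO variant can be sketched similarly: the state becomes a matrix that receives a rank-r input update instead of a rank-1 one. This is a generic MIMO linear-SSM sketch under assumed shapes, not the paper's exact formulation:

```python
import numpy as np

def mimo_decode_step(H, X_t, a_t, B_t, C_t):
    """One decode step of a MIMO linear SSM.  The (n, d) state matrix is
    updated with a rank-r input instead of rank-1:
        H_t = diag(a_t) @ H_{t-1} + B_t @ X_t,   Y_t = C_t @ H_t
    with B_t: (n, r), X_t: (r, d), C_t: (m, n).  Each step does more
    FLOPs than SISO but reads/writes the same state memory, so decoding
    (which is memory-bound) stays roughly as fast."""
    H = a_t[:, None] * H + B_t @ X_t   # diagonal decay + rank-r injection
    Y = C_t @ H                        # read out m output channels
    return H, Y

# Toy shapes: state n=8, feature dim d=16, input rank r=2, outputs m=16.
rng = np.random.default_rng(1)
n, d, r, m = 8, 16, 2, 16
H = np.zeros((n, d))
H, Y = mimo_decode_step(H, rng.standard_normal((r, d)),
                        rng.uniform(0.8, 1.0, n),
                        rng.standard_normal((n, r)),
                        rng.standard_normal((m, n)))
```

This is one way to see how MIMO can add accuracy "without slowing down decoding": the extra work is compute, not extra state traffic.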
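QKNorm, borrowed from attention, RMS-normalizes the query- and key-like projections before they enter the mixing step. A minimal RMSNorm sketch (the function name and its application to the SSM's B/C projections are assumptions for illustration):

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm over the last axis: x / sqrt(mean(x^2) + eps) * gamma.
    QKNorm applies this to the query- and key-like tensors (for an SSM,
    the C and B projections) so their scale stays controlled regardless
    of what the input projections produce."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

x = np.array([[3.0, 4.0]])
out = rms_norm(x, gamma=np.ones(2))
```

After normalization the mean square of each row is ~1, which stabilizes training at scale.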