Hasty Briefs (beta)

Mamba-3

6 hours ago
  • #State Space Models
  • #Inference Efficiency
  • #Mamba-3
  • Mamba-3 is a new state space model (SSM) designed for inference efficiency, unlike Mamba-2, which focused on training speed.
  • Key upgrades include a more expressive recurrence formula, complex-valued state tracking, and a MIMO variant for better accuracy without slowing down decoding.
  • At the 1.5B scale, Mamba-3 SISO achieves lower prefill+decode latency than Mamba-2, Gated DeltaNet, and Llama-3.2-1B.
  • The team open-sourced kernels built with Triton, TileLang, and CuTe DSL for maximum hardware performance.
  • The three core design changes are a more expressive SSM recurrence, a complex-valued SSM for state tracking, and MIMO SSMs for higher hardware utilization.
  • Architectural changes include adding QKNorm, removing the short convolution, and introducing RoPE and MIMO projections.
  • Empirical results show Mamba-3 outperforms Mamba-2 and other linear alternatives on language modeling and retrieval tasks.
  • Mamba-3's kernels are optimized for speed, with lower prefill and decode latencies than competing implementations.
  • Kernel components are split across Triton, TileLang, and CuTe DSL, trading off performance against ease of use per component.
  • Future directions include further exploration of Mamba-3's core improvements and their SSM foundations.
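The "more expressive recurrence" bullet above refers to the selective SSM family Mamba builds on. As a minimal sketch (not the actual Mamba-3 formula, and `siso_ssm_scan` is a hypothetical name), a diagonal, input-dependent SISO recurrence can be written as a sequential scan:

```python
import numpy as np

def siso_ssm_scan(x, a, b, c):
    """Sequential scan of a diagonal, input-dependent (selective) SSM.

    x: (T,)    input sequence
    a: (T, N)  per-step state decays (|a| < 1 for stability)
    b: (T, N)  per-step input projections
    c: (T, N)  per-step output projections
    Computes h_t = a_t * h_{t-1} + b_t * x_t and y_t = <c_t, h_t>.
    Returns y: (T,).
    """
    T, N = a.shape
    h = np.zeros(N)
    y = np.empty(T)
    for t in range(T):
        h = a[t] * h + b[t] * x[t]   # decay old state, inject new input
        y[t] = c[t] @ h              # read out via the output projection
    return y
```

Because the per-step state is a fixed-size vector, decode cost is constant in sequence length, which is the property the inference-efficiency claims rest on.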
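The complex-valued state-tracking bullet can be illustrated with a toy scan (an assumption-laden sketch, not Mamba-3's actual mechanism): giving the state a complex eigenvalue makes it rotate each step, so it can track periodic structure that a purely real, monotonically decaying state cannot.

```python
import numpy as np

def complex_ssm_scan(x, decay, theta):
    """Toy 1-D scan with a complex state eigenvalue.

    The eigenvalue decay * exp(i*theta) decays the state while rotating it
    by theta per step, letting the state oscillate (track parity/period)
    rather than only decay. Returns the real part of the state per step.
    """
    lam = decay * np.exp(1j * theta)  # complex eigenvalue: decay * rotation
    h = 0.0 + 0.0j
    out = []
    for xt in x:
        h = lam * h + xt
        out.append(h.real)
    return np.array(out)
```

With `theta = pi` the state flips sign each step, so a constant input produces an alternating output, a simple instance of the state tracking a period-2 pattern.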
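The MIMO bullet can likewise be sketched in one step (again a hypothetical toy, not the paper's parameterization): instead of a rank-1 input update, the state receives a full matrix-vector product, which raises arithmetic intensity per decode step without growing the state that must be carried between steps.

```python
import numpy as np

def mimo_ssm_step(h, x, A, B, C):
    """One step of a toy MIMO SSM.

    h: (N,)        carried state
    x: (d_in,)     multi-channel input
    A: (N,)        diagonal state transition
    B: (N, d_in)   input matrix (rank > 1, vs. a rank-1 SISO update)
    C: (d_out, N)  output matrix
    Returns (h_new, y) with y: (d_out,).
    """
    h = A * h + B @ x  # matrix input update: more FLOPs, same state size
    y = C @ h
    return h, y
```

The design point the summary highlights is that the extra FLOPs come from matrix multiplies over the same fixed-size state, so accuracy can improve without slowing down decoding.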