Hasty Briefs (beta)


Bamba: An open-source LLM that crosses a transformer with an SSM

a year ago
  • #AI
  • #LLM
  • #Transformer
  • IBM Research, in collaboration with CMU, Princeton, and the University of Illinois, developed Bamba, an open-source LLM that combines transformer expressiveness with SSM speed.
  • Transformers face a 'quadratic bottleneck': as conversations grow longer, computational cost grows quadratically with sequence length, causing latency and redundant computation.
  • State-space models (SSMs) maintain a compressed hidden state, reducing memory overhead and enabling faster inference speeds compared to transformers.
  • Bamba-9B reduces KV cache memory requirements, running twice as fast as similar-sized transformers while maintaining accuracy.
  • SSMs, traditionally used in electrical engineering and time-series data analysis, were adapted for deep learning by IBM researchers.
  • Mamba2, a gated SSM variant, inspired hybrids such as Samba and MambaFormer, which led to Nvidia's Nemotron-H.
  • IBM trained Bamba on 3 trillion tokens, quantized it to 8-bit precision, and achieved performance comparable to Meta's Llama-3.1 8B.
  • Bamba can handle 32,000-token conversations and may scale to 1 million tokens with vLLM support, potentially running five times faster than transformers.
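The 'quadratic bottleneck' versus fixed-size SSM state contrast above can be sketched numerically. This is a toy cost model, not Bamba's actual implementation: `attention_cost` counts token-to-token comparisons in self-attention, and `ssm_cost` assumes a hypothetical fixed state size per token.

```python
def attention_cost(seq_len: int) -> int:
    # Self-attention compares each new token against all prior tokens,
    # so total work over a sequence is 1 + 2 + ... + n, which is O(n^2).
    return sum(t for t in range(1, seq_len + 1))

def ssm_cost(seq_len: int, state_size: int = 16) -> int:
    # An SSM updates a fixed-size compressed hidden state once per token,
    # so total work grows only linearly with sequence length.
    return seq_len * state_size

# Doubling the sequence roughly quadruples attention work,
# but only doubles SSM work.
for n in (1_000, 2_000):
    print(n, attention_cost(n), ssm_cost(n))
```

This is why, as the summary notes, longer conversations hit transformers disproportionately hard while SSM inference cost stays flat per token.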
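The KV-cache savings mentioned above can be made concrete with a back-of-the-envelope memory estimate. The model dimensions below are hypothetical, chosen to resemble a generic 8B-class transformer (they are not Bamba's or Llama's published numbers):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_heads: int,
                   head_dim: int, bytes_per_elt: int = 2) -> int:
    # A transformer caches one key vector and one value vector (hence the
    # factor of 2) per token, per attention head, per layer.
    return 2 * seq_len * n_layers * n_heads * head_dim * bytes_per_elt

# Hypothetical 8B-class dimensions at a 32,000-token context, fp16 cache:
cache = kv_cache_bytes(seq_len=32_768, n_layers=32, n_heads=32, head_dim=128)
print(f"{cache / 2**30:.1f} GiB")  # roughly 16 GiB just for the cache
```

A hybrid like Bamba replaces many attention layers with SSM layers that keep only a small fixed state, so the cache a transformer must carry across a long conversation largely disappears, which is where the inference speedup comes from.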
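The 8-bit quantization step mentioned above can be illustrated with a minimal absmax scheme; this is a generic sketch of int8 quantization, not IBM's actual recipe for Bamba:

```python
def quantize_int8(weights):
    # Scale so the largest-magnitude weight maps to +/-127, then round
    # each float to the nearest signed 8-bit integer.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights; error is bounded by the scale.
    return [v * scale for v in q]

q, scale = quantize_int8([0.1, -0.5, 0.25])
print(q, dequantize(q, scale))
```

Storing weights as int8 halves memory relative to fp16, which is part of how a 9B-parameter model stays practical to serve while, as the summary notes, keeping accuracy comparable to fp16 baselines.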