Hasty Briefs

LFM2-24B-A2B: Scaling Up the LFM2 Architecture

2 days ago
  • #MoE Architecture
  • #Edge AI
  • #AI Models
  • Release of LFM2-24B-A2B, a 24B total parameter sparse Mixture of Experts (MoE) model with 2B active parameters per token, marking the largest in the LFM2 family.
  • The LFM2 architecture scales from 350M to 24B parameters with consistent quality gains on benchmarks, and the model is designed to run in 32GB of RAM for both cloud and edge deployment.
  • Open-weight availability on Hugging Face with support for local execution, fine-tuning, and a playground for testing.
  • The scaling strategy uses a deeper stack (40 layers vs. 24), more experts (64 vs. 32), and a lean active path, keeping per-token compute low while expanding total parameters.
  • Benchmarks (e.g., GPQA Diamond, MMLU-Pro) show log-linear quality improvements across the LFM2 family, confirming predictable scaling across model sizes.
  • Inference support via llama.cpp, vLLM, and SGLang with multiple quantization options, outperforming similar MoE models in throughput tests on CPUs, GPUs, and NPUs.
  • Pre-training is ongoing beyond 17T tokens, with plans for an enhanced LFM2.5-24B-A2B version after post-training and reinforcement learning.
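
The "lean active path" idea above can be sketched with generic sparse-MoE gating: the router scores all experts but only the top-k fire per token, so per-token compute tracks the ~2B active parameters rather than the 24B total. The post does not disclose LFM2's router details (top-k value, gating function), so this is a minimal illustrative sketch with assumed values, not the actual implementation:

```python
import numpy as np

def topk_gate(hidden, router_w, k=4):
    """Route one token: score all experts, keep top-k, softmax their scores.

    hidden:   (d,) token hidden state
    router_w: (n_experts, d) router weight matrix
    Returns (indices of active experts, normalized gate weights).
    """
    logits = router_w @ hidden                 # one score per expert
    idx = np.argsort(logits)[-k:]              # indices of the k highest scores
    scores = np.exp(logits[idx] - logits[idx].max())  # stable softmax over top-k
    return idx, scores / scores.sum()

# Toy dimensions: 64 experts (as in LFM2-24B-A2B) over an assumed 16-dim state.
rng = np.random.default_rng(0)
router = rng.standard_normal((64, 16))
token = rng.standard_normal(16)
experts, gates = topk_gate(token, router, k=4)
# Only k of 64 experts run for this token, which is why total parameters can
# grow with the expert count while per-token FLOPs stay roughly constant.
```

Doubling the expert count (32 to 64) under a fixed k therefore raises capacity without raising the active-path cost, which is the trade the summary describes.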
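
For the inference stacks mentioned above, local runs typically look like the following. The GGUF filename and Hugging Face repo id are assumptions; check the model's Hugging Face page for the actual names and available quantizations:

```shell
# llama.cpp: run a quantized GGUF locally (filename is hypothetical)
llama-cli -m lfm2-24b-a2b-Q4_K_M.gguf -p "Explain sparse MoE in one sentence."

# vLLM: serve the full weights behind an OpenAI-compatible API
# (repo id assumed; verify on Hugging Face)
vllm serve LiquidAI/LFM2-24B-A2B
```

A 4-bit quantization is what makes the stated 32GB RAM target plausible for a 24B-total-parameter model; higher-precision variants need correspondingly more memory.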