LFM2-24B-A2B: Scaling Up the LFM2 Architecture
- #MoE Architecture
- #Edge AI
- #AI Models
- Release of LFM2-24B-A2B, a sparse Mixture of Experts (MoE) model with 24B total parameters and 2B active parameters per token, making it the largest model in the LFM2 family.
- The LFM2 architecture now scales from 350M to 24B parameters with consistent quality gains on benchmarks, and the 24B model is designed to run within 32 GB of RAM for both cloud and edge deployment.
- Open weights are available on Hugging Face, with support for local execution and fine-tuning, plus a playground for testing (a minimal loading sketch follows this list).
- The scaling strategy uses a deeper stack (40 layers vs. 24), more experts (64 vs. 32), and a lean active path, keeping per-token compute low while expanding the total parameter count (see the routing sketch below).
- Benchmarks such as GPQA Diamond and MMLU-Pro show log-linear quality improvements across the LFM2 family, indicating predictable scaling as model size grows.
- Inference is supported via llama.cpp, vLLM, and SGLang with multiple quantization options, and the model outperforms comparable MoE models in throughput tests on CPUs, GPUs, and NPUs (see the quantized-inference sketch at the end of this list).
- Pre-training is ongoing beyond 17T tokens, with an enhanced LFM2.5-24B-A2B version planned after further post-training and reinforcement learning.
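
For readers unfamiliar with sparse MoE routing, here is a minimal PyTorch sketch of a top-k routed expert layer. The sizes, expert count, and top-k value are illustrative placeholders, not LFM2-24B-A2B's actual configuration; the point is only that each token activates a small subset of experts, which is how a model with 24B total parameters can use roughly 2B parameters per token.

```python
# Illustrative sketch of a sparse MoE feed-forward block with top-k routing.
# All sizes here are placeholders, not the actual LFM2-24B-A2B configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (tokens, d_model). The router scores every expert per token,
        # but only the top_k experts are actually evaluated for each token.
        scores = self.router(x)                                 # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)   # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)                    # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                        # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Each of the 8 tokens below touches only top_k of the 64 experts,
# so active compute per token is a small fraction of the total parameters.
y = SparseMoE()(torch.randn(8, 1024))
```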
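A minimal sketch of loading the open weights locally with Hugging Face transformers, assuming the standard `AutoModelForCausalLM` path applies and that the repo id is `LiquidAI/LFM2-24B-A2B` (check the actual model card before use):

```python
# Hedged local-inference sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-24B-A2B"  # assumed repo id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # only ~2B params are active per token, but all 24B must fit in memory
    device_map="auto",
)

prompt = "Explain sparse Mixture of Experts in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```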
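For CPU or edge deployment, a quantized GGUF build can be run through llama.cpp's Python bindings. The file name below is hypothetical; substitute whichever quantization the release actually ships.

```python
# Hedged sketch: running a quantized GGUF build via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="lfm2-24b-a2b-Q4_K_M.gguf",  # hypothetical file name; use a published quant
    n_ctx=4096,                             # context window for this session
)
out = llm("Summarize sparse MoE routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```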