Hasty Briefsbeta

Bilingual

LFM2.5-8B-A1B: An Even Better On-Device Mixture-of-Experts

a day ago
  • #Edge Computing
  • #Tool Calling
  • #AI Model Release
  • LFM2.5-8B-A1B is an edge model released for fast, reliable tool calling on consumer hardware, building on LFM2-8B-A1B with a 128K context window, 38T token pretraining, and reinforcement learning.
  • Key features include on-device personal assistant capabilities, compressed performance competitive with larger models, and unmatched throughput, with day-one support for llama.cpp, MLX, vLLM, and SGLang.
  • Improvements over the predecessor include expanded vocabulary to 128K for better non-Latin language tokenization, reasoning-only design with explicit chain of thought, and reduced hallucinations via targeted RL stages.
  • Training highlights involve tokenizer expansion through BPE merge training, context extension to 128K via RoPE adjustments, and mitigation of doom loops and hallucinations with preference optimization and avg@k-based rewards.
  • The model benchmarks competitively in knowledge, instruction following, math, and agentic workflows, with low hallucination rates and high efficiency on both CPU and GPU inference across various platforms.
  • Supported inference ecosystems include LEAP, llama.cpp, MLX, vLLM, SGLang, and ONNX, enabling fast deployment on devices from laptops to phones, with examples like LocalCowork demo showcasing interactive tool-dispatch loops.
  • LFM2.5-8B-A1B is open-weight, fast from day one, and part of a complete model family, aiming to power on-device, private AI agents without data leaving the device.