SmolLM3: Smol, multilingual, long-context reasoner LLM
- #AI
- #Language Models
- #Machine Learning
- SmolLM3 is a fully open 3B model that outperforms Llama-3.2-3B and Qwen2.5-3B while remaining competitive with larger 4B models.
- The model supports six languages (English, French, Spanish, German, Italian, Portuguese) and long contexts of up to 128k tokens via NoPE and YaRN (see the loading sketch after this list).
- SmolLM3 features a dual-mode reasoning system with /think and /no_think modes, letting users toggle reasoning traces on or off (a chat-template sketch follows the list).
- Pretraining followed a three-stage approach with evolving data mixtures of web, code, and math, totaling 11.2T tokens and tuned for both efficiency and performance.
- The architecture combines Grouped Query Attention (GQA), NoPE for long-context performance, and intra-document masking for stable training (a minimal NoPE layer-pattern sketch appears below).
- Post-training adds supervised fine-tuning (SFT) for the reasoning and non-reasoning modes, alignment with Anchored Preference Optimization (APO), and model merging to recover long-context performance (sketched below).
- Evaluation shows SmolLM3 outperforms other 3B models in knowledge, reasoning, math, and coding benchmarks, with strong multilingual and long-context capabilities.
- The model supports tool calling and agentic usage, with detailed instructions for local deployment and mode switching (a tool-calling sketch closes this post).
- Full training recipes, datasets, and configs are released to enable community reproduction and improvement.
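A minimal loading sketch for the long-context setup, assuming the transformers convention of overriding `rope_scaling` at load time. The checkpoint name is the released model, but the scaling factor here is an illustrative assumption, not necessarily the shipped configuration:

```python
from transformers import AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM3-3B"

# Override rope_scaling at load time to enable YaRN-style context
# extrapolation. The factor of 2.0 (e.g. 64k -> 128k) is an assumed,
# illustrative value, not necessarily the released default.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    rope_scaling={"rope_type": "yarn", "factor": 2.0},
)
```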
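A hedged sketch of the dual-mode toggle: the post describes /think and /no_think flags controlling the reasoning trace, which can be placed in the system prompt. The prompt text and generation settings below are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# "/no_think" in the system prompt suppresses the reasoning trace;
# "/think" enables it (per the post, thinking is the default behavior).
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Name the capital of Portugal."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```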
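On the architecture bullet: NoPE here means dropping rotary position embeddings in a subset of decoder layers. A minimal sketch of the layer-selection pattern, assuming an every-4th-layer scheme; the helper name and interval parameter are illustrative:

```python
def applies_rope(layer_idx: int, nope_interval: int = 4) -> bool:
    """Return True if decoder layer `layer_idx` should apply rotary embeddings.

    Assumption: RoPE is skipped in every 4th layer (the NoPE pattern),
    so layers 3, 7, 11, ... use no positional encoding at all.
    """
    return (layer_idx + 1) % nope_interval != 0

# Example: the RoPE/NoPE pattern for a 12-layer stack.
print([applies_rope(i) for i in range(12)])
# [True, True, True, False, True, True, True, False, True, True, True, False]
```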
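The model-merging step can be pictured as a weighted average of parameter tensors. A minimal sketch, assuming a simple linear blend of two checkpoints; the 0.9 weight, function name, and file names are illustrative assumptions, not the released recipe:

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.9) -> dict:
    """Weighted linear merge: alpha * A + (1 - alpha) * B, key by key."""
    assert sd_a.keys() == sd_b.keys(), "checkpoints must share parameter names"
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Hypothetical usage: blend an aligned checkpoint with a long-context one
# to recover long-context performance (file names are placeholders).
merged = merge_state_dicts(
    torch.load("apo_model.pt"), torch.load("mid_training_ckpt.pt"), alpha=0.9
)
```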
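Finally, a hedged tool-calling sketch using the standard transformers `tools=` argument to `apply_chat_template`. The tool schema and function are hypothetical, and whether SmolLM3's chat template consumes tools exactly this way should be checked against the model card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Hypothetical tool definition in JSON-schema style.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Lisbon?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)  # rendered prompt with the tool schema injected
```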