SmolLM3: Smol, multilingual, long-context reasoner LLM
- #AI
- #Language Models
- #Machine Learning
- SmolLM3 is a fully open 3B model that outperforms Llama-3.2-3B and Qwen2.5-3B while remaining competitive with larger 4B models.
- The model supports six languages (English, French, Spanish, German, Italian, Portuguese) and long contexts of up to 128k tokens via NoPE and YaRN (see the loading sketch after this list).
- SmolLM3 features a dual-mode reasoning system with /think and /no_think modes, letting users toggle reasoning traces on or off (a chat-template sketch follows the list).
- Pretraining followed a three-stage approach with evolving data mixtures of web, code, and math, totaling 11.2T tokens and tuned for both efficiency and performance.
- The architecture combines Grouped Query Attention (GQA), NoPE for long-context performance, and intra-document masking for stable training (a minimal NoPE layer-pattern sketch appears below).
- Post-training adds supervised fine-tuning (SFT) for the reasoning and non-reasoning modes, alignment with Anchored Preference Optimization (APO), and model merging to recover long-context performance (sketched below).
- Evaluation shows SmolLM3 outperforms other 3B models in knowledge, reasoning, math, and coding benchmarks, with strong multilingual and long-context capabilities.
- The model supports tool calling and agentic usage, with detailed instructions for local deployment and mode switching (a tool-calling sketch closes this post).
- Full training recipes, datasets, and configs are released to enable community reproduction and improvement.
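A minimal loading sketch for the long-context setup, assuming the transformers convention of overriding `rope_scaling` at load time. The checkpoint name is the released model, but the scaling factor here is an illustrative assumption, not necessarily the shipped configuration:

```python
from transformers import AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM3-3B"

# Override rope_scaling at load time to enable YaRN-style context
# extrapolation. The factor of 2.0 (e.g. 64k -> 128k) is an assumed,
# illustrative value, not necessarily the released default.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    rope_scaling={"rope_type": "yarn", "factor": 2.0},
)
```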
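A hedged sketch of the dual-mode toggle: the post describes /think and /no_think flags controlling the reasoning trace, which can be placed in the system prompt. The prompt text and generation settings below are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# "/no_think" in the system prompt suppresses the reasoning trace;
# "/think" enables it (per the post, thinking is the default behavior).
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Name the capital of Portugal."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```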
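On the architecture bullet: NoPE here means dropping rotary position embeddings in a subset of decoder layers. A minimal sketch of the layer-selection pattern, assuming an every-4th-layer scheme; the helper name and interval parameter are illustrative:

```python
def applies_rope(layer_idx: int, nope_interval: int = 4) -> bool:
    """Return True if decoder layer `layer_idx` should apply rotary embeddings.

    Assumption: RoPE is skipped in every 4th layer (the NoPE pattern),
    so layers 3, 7, 11, ... use no positional encoding at all.
    """
    return (layer_idx + 1) % nope_interval != 0

# Example: the RoPE/NoPE pattern for a 12-layer stack.
print([applies_rope(i) for i in range(12)])
# [True, True, True, False, True, True, True, False, True, True, True, False]
```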
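The model-merging step can be pictured as a weighted average of parameter tensors. A minimal sketch, assuming a simple linear blend of two checkpoints; the 0.9 weight, function name, and file names are illustrative assumptions, not the released recipe:

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.9) -> dict:
    """Weighted linear merge: alpha * A + (1 - alpha) * B, key by key."""
    assert sd_a.keys() == sd_b.keys(), "checkpoints must share parameter names"
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Hypothetical usage: blend an aligned checkpoint with a long-context one
# to recover long-context performance (file names are placeholders).
merged = merge_state_dicts(
    torch.load("apo_model.pt"), torch.load("mid_training_ckpt.pt"), alpha=0.9
)
```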
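Finally, a hedged tool-calling sketch using the standard transformers `tools=` argument to `apply_chat_template`. The tool schema and function are hypothetical, and whether SmolLM3's chat template consumes tools exactly this way should be checked against the model card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Hypothetical tool definition in JSON-schema style.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Lisbon?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)  # rendered prompt with the tool schema injected
```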