Hasty Briefs (beta)


SmolLM3: Smol, multilingual, long-context reasoner LLM

10 months ago
  • #AI
  • #Language Models
  • #Machine Learning
  • SmolLM3 is a fully open 3B model that outperforms Llama-3.2-3B and Qwen2.5-3B while remaining competitive with larger 4B models.
  • The model supports six languages (English, French, Spanish, German, Italian, and Portuguese) and context lengths up to 128k tokens, achieved with NoPE and YaRN extrapolation.
  • SmolLM3 features a dual-mode reasoning system with /think and /no_think modes, allowing users to toggle between reasoning and non-reasoning outputs.
  • Training involved a three-stage pretraining approach with evolving data mixtures (web, code, math) totaling 11.2T tokens, optimized for efficiency and performance.
  • The model architecture includes Grouped Query Attention (GQA), NoPE for long-context performance, and intra-document masking for stable training.
  • Post-training enhancements include supervised fine-tuning (SFT) for reasoning and non-reasoning modes, alignment with Anchored Preference Optimization (APO), and model merging to recover long-context performance.
  • Evaluation shows SmolLM3 outperforms other 3B models in knowledge, reasoning, math, and coding benchmarks, with strong multilingual and long-context capabilities.
  • The model supports tool calling and agentic usage, with detailed instructions provided for local deployment and mode switching.
  • Full training recipes, datasets, and configs are released to enable community reproduction and improvement.
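The dual-mode toggle described above works by placing a flag in the system prompt. A minimal sketch of that convention, assuming the flag is simply prepended to the system message (the released chat template is the authoritative source for the exact format):

```python
def build_system_prompt(system_message: str, reasoning: bool) -> str:
    """Prepend a SmolLM3-style mode flag to a system message.

    Assumes the documented convention of toggling reasoning via a
    /think or /no_think flag in the system prompt; the exact placement
    comes from the model's released chat template.
    """
    flag = "/think" if reasoning else "/no_think"
    return f"{flag}\n{system_message}" if system_message else flag


# Example: disable extended reasoning for a concise answer.
prompt = build_system_prompt("You are a helpful assistant.", reasoning=False)
```

In practice the resulting system message is passed as the first entry of the chat, and the template handles the rest.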
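The NoPE component of the architecture amounts to a per-layer decision about whether rotary position embeddings are applied at all. An illustrative sketch, assuming a scheme where RoPE is dropped from every fourth layer (the interval here is an assumption for illustration, not SmolLM3's verified config):

```python
def uses_rope(layer_idx: int, nope_interval: int = 4) -> bool:
    """Return True if this transformer layer applies rotary embeddings.

    Illustrative NoPE-style scheme: rotary position embeddings are
    dropped from every `nope_interval`-th layer, leaving those layers
    with no explicit positional signal.
    """
    return (layer_idx + 1) % nope_interval != 0


# For a hypothetical 36-layer model, layers 3, 7, 11, ... (0-indexed) skip RoPE.
nope_layers = [i for i in range(36) if not uses_rope(i)]
```

The design intuition reported for NoPE is that periodically removing the positional bias helps attention generalize beyond the trained context length.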
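YaRN, the other half of the long-context recipe, rescales RoPE frequencies non-uniformly: high-frequency dimensions are left untouched (preserving local order), low-frequency dimensions are interpolated to cover the longer window, and a ramp blends the two. A sketch under assumed parameters (base, scale factor, original context, and ramp thresholds are all illustrative, not SmolLM3's exact values):

```python
import math


def yarn_scaled_inv_freq(dim: int, base: float = 10000.0, scale: float = 4.0,
                         orig_ctx: int = 16384) -> list:
    """YaRN-style per-dimension RoPE frequency scaling (illustrative).

    `ratio` counts how many full rotations a dimension completes over
    the original context: many rotations means a high-frequency dim we
    keep as-is; fewer than one rotation means a low-frequency dim we
    interpolate by `scale`; in between, blend linearly.
    """
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    out = []
    for f in inv_freq:
        wavelength = 2 * math.pi / f
        ratio = orig_ctx / wavelength
        if ratio > 32:            # high frequency: keep unchanged
            out.append(f)
        elif ratio < 1:           # low frequency: fully interpolate
            out.append(f / scale)
        else:                     # ramp between the two regimes
            t = (ratio - 1) / (32 - 1)
            out.append(f / scale * (1 - t) + f * t)
    return out


freqs = yarn_scaled_inv_freq(128)
```

The key property is that the fastest-rotating dimensions, which encode fine-grained relative position, pass through unscaled.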