VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
4 hours ago
- #Parametric Compression
- #Small Language Models
- #Verifiable Reasoning
- Introduces VibeThinker-3B, a 3B-parameter dense model for exploring verifiable reasoning in small language models.
- Uses an optimized training pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.
- Achieves frontier-level performance on verifiable tasks: 94.3 on AIME26 (97.1 with scaling), 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on unseen LeetCode contests.
- Matches or exceeds larger flagship models such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro while maintaining instruction controllability (93.4 on IFEval).
- Proposes the Parametric Compression-Coverage Hypothesis, which suggests that compact models can reach high reasoning performance through compressed cores, complementing parameter-dense regimes.
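
For readers unfamiliar with the Pass@1 figure quoted for LiveCodeBench v6, here is a minimal sketch of the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of sampled solutions for a problem and c is how many of them pass the tests. The summary does not say how many samples VibeThinker-3B used or how its score was aggregated, so the function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts below are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: solutions sampled, c: solutions that passed, k: attempt budget.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k failing samples means every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, reported on a 0-100 scale.

    results: one (n_samples, n_correct) pair per problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three problems, 8 samples each (not real data).
    toy_results = [(8, 8), (8, 4), (8, 0)]
    print(benchmark_pass_at_k(toy_results, k=1))  # 50.0
```

With k = 1 the estimator reduces to the fraction of passing samples per problem, so a Pass@1 of 80.2 can be read as roughly an 80% chance that a single sampled solution is correct, averaged over the benchmark.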