VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

6 hours ago

Introduces VibeThinker-3B, a 3B-parameter dense model for exploring verifiable reasoning in small language models.
Uses an optimized pipeline that combines curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.
Achieves frontier-level performance on verifiable tasks: 94.3 on AIME26 (97.1 with scaling), 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on unseen LeetCode contests.
Matches or exceeds larger flagship models such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro while maintaining instruction controllability (93.4 on IFEval).
Proposes the Parametric Compression-Coverage Hypothesis, which suggests that compact models can reach high reasoning performance through compressed cores, complementing parameter-dense regimes.
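
For readers unfamiliar with the Pass@1 figure cited for LiveCodeBench v6, here is a minimal sketch of how a pass@k score is commonly estimated. It assumes the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), computed per problem and averaged over the benchmark. The brief does not say how VibeThinker's numbers were actually computed, and the function names and sample counts below are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: sample budget being scored (k=1 gives Pass@1)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k draw contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: two problems with 10 samples each.
    hypothetical = [(10, 5), (10, 10)]
    print(benchmark_pass_at_k(hypothetical, k=1))
```

For k=1 the estimator reduces to c/n per problem, so a Pass@1 score is the average fraction of single attempts that pass.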
