Hasty Briefs (beta)

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

6 hours ago
  • #Parametric Compression
  • #Small Language Models
  • #Verifiable Reasoning
  • Introduces VibeThinker-3B, a 3B-parameter dense model built to explore verifiable reasoning in small language models.
  • Uses an optimized training pipeline that combines curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.
  • Reports frontier-level performance on verifiable tasks: 94.3 on AIME26 (97.1 with scaling), 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on unseen LeetCode contests.
  • Matches or exceeds larger flagship models such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro while retaining instruction controllability (93.4 on IFEval).
  • Proposes the Parametric Compression-Coverage Hypothesis, which suggests that compact models can reach high reasoning performance through compressed cores, complementing parameter-dense regimes.
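
For readers unfamiliar with the Pass@1 figure quoted above, here is a minimal sketch of how pass@k is commonly estimated on code benchmarks. The summary does not say how VibeThinker's score was computed, so this is an assumption: it uses the widely used unbiased estimator (n samples per problem, c of them correct), and the function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: budget of attempts being scored (k=1 gives Pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, any k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random k-subset is entirely incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, as a percentage.

    results: one (n, c) pair per problem (illustrative input format).
    """
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

With k=1 the estimator reduces to c/n per problem, so a benchmark Pass@1 is the mean fraction of correct samples across problems, expressed as a percentage.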