Hasty Briefs (beta)

Tiny Model, Big Logic: Large-Model Reasoning Ability in VibeThinker-1.5B

6 months ago
  • #Machine Learning
  • #Model Optimization
  • #Artificial Intelligence
  • Introduction of VibeThinker-1.5B, a 1.5B-parameter dense model challenging the notion that small models lack robust reasoning.
  • Development via the Spectrum-to-Signal Principle (SSP), involving Two-Stage Diversity-Exploring Distillation and MaxEnt-Guided Policy Optimization.
  • Achieves reasoning performance competitive with far larger models such as DeepSeek R1 and Magistral Medium, at a total training cost of only $7,800.
  • Outperforms the roughly 400x-larger DeepSeek R1 on the math benchmarks AIME24, AIME25, and HMMT25.
  • Scores 51.1 on LiveCodeBench V6, surpassing Magistral Medium's 50.3 (its base model scores 0.0).
  • Demonstrates that small models can match large models' reasoning, reducing costs and democratizing AI research.
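The summary names MaxEnt-Guided Policy Optimization as one of VibeThinker's training stages. The paper's exact algorithm is not described here; as a rough illustration of the general maximum-entropy idea behind such methods, the sketch below adds an entropy bonus to a plain policy-gradient loss over a toy categorical policy. All function names, the `beta` coefficient, and the loss form are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of a categorical distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def maxent_pg_loss(logits, action, advantage, beta=0.01):
    # Toy policy-gradient loss with an entropy bonus: the beta * H(pi)
    # term rewards keeping the output distribution diverse, which is
    # the core intuition behind entropy-guided ("MaxEnt") objectives.
    # (Illustrative sketch only, not the paper's algorithm.)
    probs = softmax(logits)
    log_prob = math.log(probs[action])
    return -(log_prob * advantage + beta * entropy(probs))
```

A larger `beta` lowers the loss for higher-entropy (more exploratory) policies, trading off exploitation of the sampled action's advantage against diversity of the policy's outputs.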