Absolute Zero Reasoner

a year ago

The Absolute Zero paradigm eliminates dependency on human-curated data by enabling models to propose tasks, solve them, and learn autonomously through self-play.
AZR (Absolute Zero Reasoner) is the first implementation of this paradigm, using a unified language model for both proposing and solving code-based reasoning challenges.
AZR operates across three reasoning modes: Deduction (predicting outputs), Abduction (inferring inputs), and Induction (synthesizing programs).
Results show AZR improves performance across different model sizes (3B to 14B), with larger models showing greater gains.
AZR demonstrates strong cross-domain transfer, with coding capabilities amplifying reasoning improvements in math tasks.
Distinct cognitive behaviors emerge during AZR training, such as step-by-step reasoning and trial-and-error, varying by task type.
The authors note safety concerns with certain base models, which highlights the need for safety-aware training in future work.
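
The three reasoning modes above can be read as three ways of hiding one element of a (program, input, output) triple and asking the model to recover it. The following is a minimal sketch of how such answers could be checked by executing code. It is not AZR's actual implementation: the helper names (`run_program`, `check_deduction`, `check_abduction`, `check_induction`), the convention that a program defines a function `f`, and the example program are all assumptions made for illustration.

```python
from typing import Any, List, Tuple


def run_program(source: str, arg: Any) -> Any:
    """Execute `source`, which is assumed to define f(x), and return f(arg).

    Illustration only: a real system would need a sandboxed executor,
    since exec() on untrusted code is unsafe.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"](arg)


# A hypothetical task triple: program, input, and the output it produces.
PROGRAM = "def f(x):\n    return sorted(x)[-2:]"
INPUT = [5, 1, 9, 3]
OUTPUT = run_program(PROGRAM, INPUT)


def check_deduction(program: str, inp: Any, predicted_output: Any) -> bool:
    """Deduction: given program and input, the solver predicts the output."""
    return run_program(program, inp) == predicted_output


def check_abduction(program: str, proposed_input: Any, target_output: Any) -> bool:
    """Abduction: given program and output, the solver proposes an input.

    Any input that reproduces the target output counts as correct,
    not only the input the task was originally built from.
    """
    return run_program(program, proposed_input) == target_output


def check_induction(candidate_program: str, examples: List[Tuple[Any, Any]]) -> bool:
    """Induction: given input/output pairs, the solver writes a program."""
    return all(run_program(candidate_program, i) == o for i, o in examples)


if __name__ == "__main__":
    print(check_deduction(PROGRAM, INPUT, [5, 9]))
    print(check_abduction(PROGRAM, [9, 5], OUTPUT))
    candidate = "def f(x):\n    return sorted(x)[len(x) - 2:]"
    print(check_induction(candidate, [([5, 1, 9, 3], [5, 9]), ([2, 8, 4], [4, 8])]))
```

In this sketch, every answer is verified by running code rather than by comparing against a human-written label, which is one way a self-play setup can avoid relying on curated data.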
