Absolute Zero Reasoner
a year ago
- #AI
- #Machine Learning
- #Autonomous Reasoning
- The Absolute Zero paradigm removes the dependence on human-curated data: a model proposes its own tasks, solves them, and learns from the outcomes through self-play.
- AZR (Absolute Zero Reasoner) is the first implementation of this paradigm. A single language model both proposes and solves code-based reasoning tasks, and a code executor validates the proposed tasks and verifies the answers.
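- AZR operates across three reasoning modes, each built from a (program, input, output) triplet: Deduction (predicting the output from a program and an input), Abduction (inferring an input that produces a given output), and Induction (synthesizing a program from input-output examples).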
- Results show AZR improves performance across different model sizes (3B to 14B), with larger models showing greater gains.
- AZR demonstrates strong cross-domain transfer: training only on self-proposed coding tasks also improves math reasoning, and base models with stronger coding ability see larger math gains.
- Distinct cognitive behaviors emerge during AZR training, such as step-by-step reasoning and trial-and-error, with the dominant behavior varying by task type.
- Safety concerns were noted with certain base models, highlighting the need for safety-aware training in future work.
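To make the three reasoning modes concrete, here is a minimal sketch of how each task type could be checked by executing code. It assumes each program is a Python source string defining a single function named `f`, and the helper names (`run_program`, `check_deduction`, `check_abduction`, `check_induction`) are hypothetical, not AZR's actual task format or code. `exec` runs without any sandboxing here, so treat this as a toy illustration only.

```python
from typing import Any, Callable


def run_program(program: str, x: Any) -> Any:
    """Execute `program` (assumed to define a function f) and return f(x).

    Hypothetical helper: no sandboxing, toy example only.
    """
    namespace: dict[str, Any] = {}
    exec(program, namespace)
    f: Callable[[Any], Any] = namespace["f"]
    return f(x)


def check_deduction(program: str, x: Any, predicted_output: Any) -> bool:
    """Deduction: predict the output of a known program on a known input."""
    return run_program(program, x) == predicted_output


def check_abduction(program: str, target_output: Any, proposed_input: Any) -> bool:
    """Abduction: infer an input; any input that yields the target output is accepted."""
    return run_program(program, proposed_input) == target_output


def check_induction(candidate_program: str, examples: list[tuple[Any, Any]]) -> bool:
    """Induction: synthesize a program that reproduces every input-output pair."""
    return all(run_program(candidate_program, x) == y for x, y in examples)


prog = "def f(x):\n    return sorted(x)[-1] * 2"
print(check_deduction(prog, [3, 1, 2], 6))  # True
print(check_abduction(prog, 10, [5, 0]))  # True
print(check_induction("def f(x):\n    return max(x) * 2", [([3, 1, 2], 6), ([5, 0], 10)]))  # True
```

Checking abduction by comparing outputs, rather than demanding one specific input, reflects that several inputs can map to the same output, so any input the executor confirms is a valid answer.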
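The propose-solve self-play loop can be sketched as one step in which a proposer emits a task, the solver attempts it several times, and both roles receive a reward. The reward shapes below are an assumption modeled on a "learnability" idea (tasks that are always or never solved carry no learning signal), not a verified reproduction of AZR's training objective. The functions `propose`, `solve`, and `verify` are hypothetical callables standing in for the shared model and the code executor, and the reinforcement-learning update of that model is omitted.

```python
from typing import Callable, TypeVar

T = TypeVar("T")  # task type (hypothetical)
A = TypeVar("A")  # answer type (hypothetical)


def solver_reward(is_correct: bool) -> float:
    """Binary reward for the solver role: 1.0 if the answer verifies, else 0.0."""
    return 1.0 if is_correct else 0.0


def proposer_reward(solve_outcomes: list[bool]) -> float:
    """Assumed learnability-style reward: favor tasks that are hard but solvable."""
    if not solve_outcomes:
        raise ValueError("need at least one solver rollout")
    success_rate = sum(solver_reward(ok) for ok in solve_outcomes) / len(solve_outcomes)
    if success_rate in (0.0, 1.0):
        return 0.0  # never solved or always solved: no learning signal
    return 1.0 - success_rate  # rarer successes earn the proposer more


def self_play_step(
    propose: Callable[[], T],
    solve: Callable[[T], A],
    verify: Callable[[T, A], bool],
    n_rollouts: int = 4,
) -> tuple[float, list[float]]:
    """One propose-then-solve round; returns (proposer reward, solver rewards)."""
    task = propose()
    outcomes = [verify(task, solve(task)) for _ in range(n_rollouts)]
    return proposer_reward(outcomes), [solver_reward(ok) for ok in outcomes]


print(proposer_reward([True, False, False, False]))  # 0.75
print(proposer_reward([True, True, True, True]))  # 0.0
```

In this sketch the same model would play both roles, so the returned rewards would feed a single policy update; that update is deliberately left out.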