R-Zero: Self-Evolving Reasoning LLM from Zero Data
14 hours ago
- #Machine Learning
- #Autonomous Learning
- #Large Language Models
- Introduces R-Zero, a fully autonomous framework for self-evolving Large Language Models (LLMs).
- R-Zero generates its own training data from scratch without relying on human-curated tasks or labels.
- Utilizes two independent models, a Challenger and a Solver, which co-evolve through interaction.
- The Challenger proposes tasks near the edge of the Solver's capability, while the Solver is rewarded for solving increasingly challenging tasks.
- In experiments, R-Zero improves reasoning capability, raising performance on both math-reasoning and general-domain reasoning benchmarks.
- Presents the approach as scalable and as a potential path toward AI systems that advance beyond human intelligence.
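
The Challenger and Solver incentives described above can be illustrated with a small sketch. This is not R-Zero's implementation: the function names, the majority-vote pseudo-label, and the linear reward shape are all assumptions chosen for illustration. The idea is that, with no human labels, the Solver's most common answer stands in for the label, and the Challenger is rewarded most when the Solver agrees with itself about half the time, which is one way to express "near the edge of the Solver's capability".

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of samples that match it.

    Assumption: the majority answer is used as a pseudo-label because no
    human-provided label is available.
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


def challenger_reward(agreement):
    """Hypothetical Challenger reward.

    Equals 1.0 when the Solver's agreement rate is 0.5 and falls linearly
    to 0.0 when the task is trivially easy (1.0) or hopeless (0.0).
    """
    return 1.0 - 2.0 * abs(agreement - 0.5)


def solver_rewards(answers, pseudo_label):
    """Hypothetical Solver reward: 1.0 for each sample matching the pseudo-label."""
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


# Four sampled Solver answers to one Challenger-proposed task (made-up data).
samples = ["12", "12", "15", "12"]
label, agreement = majority_answer(samples)
print(label, agreement, challenger_reward(agreement))
print(solver_rewards(samples, label))
```

In a full loop, these rewards would feed a reinforcement-learning update for each model in turn. That part is omitted here because the summary does not specify the training procedure.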