R-Zero: Self-Evolving Reasoning LLM from Zero Data

8 months ago

Introduces R-Zero, a fully autonomous framework for self-evolving Large Language Models (LLMs).
R-Zero generates its own training data from scratch without relying on human-curated tasks or labels.
Uses two independent models, a Challenger and a Solver, which co-evolve through interaction.
The Challenger is rewarded for proposing tasks near the edge of the Solver's capability, while the Solver is rewarded for solving the increasingly challenging tasks the Challenger poses.
Reports empirical gains in reasoning capability on math-reasoning and general-domain reasoning benchmarks.
Presents the approach as scalable and as a possible route toward AI systems that advance beyond human intelligence.
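
To make the Challenger/Solver interaction concrete, here is a minimal sketch of how the two reward signals might be computed without human labels. Everything in it is an assumption made for illustration, not a detail taken from this brief: the Solver is sampled several times per task, the majority answer serves as a pseudo-label, and the Challenger's score peaks when the Solver agrees with itself about half the time. The function names are hypothetical.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of samples agreeing with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def challenger_reward(answers):
    """Toy 'edge of capability' score (assumed form, not from the brief).

    Equals 1.0 when half of the Solver's samples agree with the majority answer
    and falls to 0.0 when all samples agree (the task is too easy).
    """
    _, agreement = majority_answer(answers)
    return 1.0 - 2.0 * abs(agreement - 0.5)


def solver_rewards(answers):
    """Give each sampled answer 1.0 if it matches the majority pseudo-label, else 0.0."""
    label, _ = majority_answer(answers)
    return [1.0 if a == label else 0.0 for a in answers]


if __name__ == "__main__":
    samples = ["4", "4", "5", "6"]      # hypothetical Solver outputs for one task
    print(challenger_reward(samples))   # task difficulty signal for the Challenger
    print(solver_rewards(samples))      # per-sample signal for the Solver
```

In this sketch a task the Solver always answers the same way earns the Challenger nothing, which pushes it toward harder questions, while the Solver is scored only against its own majority vote, so no external labels are needed.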
