Hasty Briefs (beta)

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

5 hours ago
  • #Mathematical Olympiad
  • #Large Language Models
  • #Artificial Intelligence
  • The paper presents a verification-and-refinement pipeline for solving IMO-level math problems using large language models.
  • The pipeline substantially improves performance, reaching 85.7% accuracy on the IMO 2025 problems, compared with baseline accuracies of 31.6% (Gemini 2.5 Pro), 21.4% (Grok-4), and 38.1% (GPT-5).
  • The approach is model-agnostic, demonstrating effectiveness with three leading models: Gemini 2.5 Pro, Grok-4, and GPT-5.
  • The study argues that, for complex reasoning tasks, methods that harness a base model's existing potential matter as well as improvements to the model's raw capabilities.
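
To make the idea of a verification-and-refinement pipeline concrete, here is a minimal sketch of one possible control loop. The summary above does not describe the paper's actual stages or prompts, so everything here is an assumption for illustration: the function name `verify_and_refine`, the three callables (`generate`, `verify`, `refine`), the `max_rounds` limit, and the toy stand-ins used in the demo. Passing the model calls in as plain callables is one way to keep such a loop model-agnostic, since any model can be wrapped behind the same three functions.

```python
from typing import Callable, Optional


def verify_and_refine(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a solution that passes verification, or None if none is found.

    Hypothetical interface (not taken from the paper):
      generate(problem)                   -> candidate solution
      verify(problem, solution)           -> None if accepted, else a critique
      refine(problem, solution, critique) -> revised solution
    """
    solution = generate(problem)
    for _ in range(max_rounds):
        critique = verify(problem, solution)
        if critique is None:
            # The verifier found no issues, so accept this candidate.
            return solution
        # Feed the critique back so the next candidate can address it.
        solution = refine(problem, solution, critique)
    # Check the last refinement once more before giving up.
    return solution if verify(problem, solution) is None else None


# Toy stand-ins for model calls, so the sketch runs without any API access.
def toy_generate(problem: str) -> str:
    return "5"  # deliberately wrong first attempt


def toy_verify(problem: str, solution: str) -> Optional[str]:
    return None if solution == "4" else "expected 4"


def toy_refine(problem: str, solution: str, critique: str) -> str:
    return "4"  # pretend the critique led to a corrected answer


result = verify_and_refine("What is 2 + 2?", toy_generate, toy_verify, toy_refine)
print(result)  # prints 4
```

In a real setting each callable would wrap a call to a language model, and the verifier would return a written critique of the proof rather than a fixed string. Returning `None` when no candidate passes lets the caller distinguish "no verified solution" from an accepted one.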