Hasty Briefs (beta)

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

3 days ago
  • #Machine Learning
  • #Reinforcement Learning
  • #Meta-Learning
  • Investigates whether a pretrained LLM can generate an automated curriculum for problems it cannot solve.
  • Introduces SOAR, a self-improvement framework using meta-RL where a teacher model proposes synthetic problems for a student model.
  • SOAR grounds the curriculum in measured student progress rather than intrinsic proxy rewards.
  • The study targets the hardest subsets of mathematical benchmarks: problems with a 0/128 initial success rate (0 correct out of 128 attempts).
  • Key finding: bi-level meta-RL is feasible even under sparse, binary rewards.
  • Grounded rewards outperform intrinsic reward schemes and avoid the instability and diversity collapse associated with them.
  • Structural quality and well-posedness of generated questions are more critical for learning progress than solution correctness.
  • Suggests that generating useful stepping stones does not require the ability to solve hard problems initially.
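
The teacher/student loop summarized above can be illustrated with a toy simulation. This is a minimal sketch under heavy assumptions, not SOAR's actual algorithm: the student is a single `skill` number, problems are scalar difficulties, the teacher is a simple epsilon-greedy bandit, and every name here (`solve_prob`, `train_student`, `run_curriculum`, the constants) is hypothetical. What it tries to show is only the grounding idea: the teacher's reward is the measured change in the student's binary success rate on the hard target problems, not an intrinsic proxy.

```python
import math
import random


def solve_prob(skill: float, difficulty: float) -> float:
    """Toy student (assumption): logistic chance of solving a problem."""
    return 1.0 / (1.0 + math.exp(-4.0 * (skill - difficulty)))


def success_rate(skill: float, difficulty: float, attempts: int,
                 rng: random.Random) -> float:
    """Fraction of binary successes over `attempts` sampled tries."""
    wins = sum(rng.random() < solve_prob(skill, difficulty)
               for _ in range(attempts))
    return wins / attempts


def train_student(skill: float, difficulty: float, rng: random.Random) -> float:
    """Toy inner loop: the student improves only when it solves a problem
    that is harder than its current skill (sparse, binary signal)."""
    solved = rng.random() < solve_prob(skill, difficulty)
    if solved and difficulty > skill:
        return skill + 0.5 * (difficulty - skill)
    return skill


def run_curriculum(target: float = 3.0, rounds: int = 200,
                   attempts: int = 128, seed: int = 0):
    """Toy outer loop: the teacher picks a difficulty and is rewarded by
    measured student progress on the hard target problems."""
    rng = random.Random(seed)
    levels = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # teacher's candidate problems
    value = [0.0] * len(levels)              # teacher's value estimates
    skill = 0.0
    history = []
    baseline = success_rate(skill, target, attempts, rng)
    for _ in range(rounds):
        if rng.random() < 0.3:               # explore
            action = rng.randrange(len(levels))
        else:                                # exploit, breaking ties randomly
            best = max(value)
            action = rng.choice([i for i, v in enumerate(value) if v == best])
        skill = train_student(skill, levels[action], rng)
        after = success_rate(skill, target, attempts, rng)
        reward = after - baseline            # grounded in measured progress
        value[action] += 0.2 * (reward - value[action])
        baseline = after
        history.append(skill)
    return skill, value, history


if __name__ == "__main__":
    final_skill, teacher_values, _ = run_curriculum()
    print("final skill:", round(final_skill, 3))
    print("teacher values:", [round(v, 3) for v in teacher_values])
```

In this toy setup the student starts with an essentially zero chance on the target difficulty, so the grounded reward is zero for most early rounds. Any progress has to come from easier problems that the teacher happens to propose, which is the stepping-stone intuition in the last bullet. The numbers it prints say nothing about the paper's results.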