Hasty Briefs (beta)


Solving Physics Olympiad via reinforcement learning on physics simulators

2 days ago
  • #synthetic-data
  • #physics-simulators
  • #LLM-training
  • Progress in LLM reasoning is limited by the scarcity of question-answer (QA) pairs on the internet, especially in sciences such as physics.
  • Physics simulators serve as a scalable source of supervision, generating synthetic QA data to train LLMs in physical reasoning.
  • Randomizing scene graphs with domain-specific languages (DSLs) keeps the generated physical variations controlled, valid, and diverse.
  • Synthetic QA pairs are generated automatically from simulation results via templates, covering numeric, reverse, and symbolic question types.
  • Reinforcement learning on synthetic data enables zero-shot sim-to-real transfer, boosting performance on real-world physics benchmarks.
  • Training on the synthetic data improves performance on International Physics Olympiad (IPhO) problems by up to 7 percentage points across model sizes.
  • Performance gains generalize to other physics and math benchmarks, showing meaningful skill transfer beyond simulator scope.
  • Simulator-based benchmarks are fast, cheap, and scalable, and their scores correlate well with evaluations on real-world reasoning benchmarks.
  • Fine-tuning reduces arithmetic errors and leads the model to choose equations based on the physical context rather than applying them by rote.
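To make the data-generation bullets concrete, here is a minimal sketch of the "randomize a scene, simulate it, fill QA templates" loop. It is not the authors' pipeline: a plain Python dataclass (the hypothetical `ProjectileScene`) stands in for a DSL scene graph, a hand-written semi-implicit Euler integrator stands in for a real physics simulator, and the parameter ranges and template wording are assumptions. Only numeric (forward) and reverse questions are shown; symbolic questions are omitted.

```python
import math
import random
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant, no air resistance)


@dataclass
class ProjectileScene:
    """Hypothetical stand-in for a DSL scene graph: one ball launched from level ground."""
    speed: float      # launch speed in m/s
    angle_deg: float  # launch angle above the horizontal in degrees


def sample_scene(rng: random.Random) -> ProjectileScene:
    # Sample parameters only from ranges that keep the scene physically valid.
    return ProjectileScene(
        speed=round(rng.uniform(5.0, 30.0), 1),
        angle_deg=round(rng.uniform(15.0, 75.0), 1),
    )


def simulate_range(scene: ProjectileScene, dt: float = 1e-4) -> float:
    """Step the ball forward with semi-implicit Euler until it returns to the ground."""
    theta = math.radians(scene.angle_deg)
    x, y = 0.0, 0.0
    vx, vy = scene.speed * math.cos(theta), scene.speed * math.sin(theta)
    while True:
        vy -= G * dt  # update velocity first (semi-implicit Euler)
        x += vx * dt
        y += vy * dt
        if y <= 0.0:
            return x  # horizontal distance travelled at landing


def make_qa(scene: ProjectileScene, distance: float) -> list:
    """Fill fixed (hypothetical) question templates with scene parameters and simulator output."""
    numeric = {
        "type": "numeric",  # forward question: given the setup, ask for the outcome
        "question": (
            f"A ball is launched from level ground at {scene.speed} m/s, "
            f"{scene.angle_deg} degrees above the horizontal. Ignoring air "
            "resistance, how far away does it land (in m)?"
        ),
        "answer": distance,
    }
    reverse = {
        "type": "reverse",  # inverse question: given the outcome, ask for a parameter
        "question": (
            f"A ball launched from level ground at {scene.angle_deg} degrees above "
            f"the horizontal lands {distance:.2f} m away. Ignoring air resistance, "
            "what was its launch speed (in m/s)?"
        ),
        "answer": scene.speed,
    }
    return [numeric, reverse]


def generate_dataset(n_scenes: int, seed: int = 0) -> list:
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    data = []
    for _ in range(n_scenes):
        scene = sample_scene(rng)
        data.extend(make_qa(scene, simulate_range(scene)))
    return data
```

For example, `for qa in generate_dataset(2): print(qa["type"], qa["question"], round(qa["answer"], 2))` prints two question pairs. Because ground truth comes from the simulator rather than from a human, the same loop scales to as many scenes as compute allows.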
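The summary does not say how rewards are computed during reinforcement learning. One common option for questions with numeric answers, assumed here and not confirmed by the source, is a binary "verifiable" reward: extract the model's final number and compare it to the simulator's answer within a relative tolerance. The last-number extraction rule and the 1% tolerance below are hypothetical choices.

```python
import math
import re
from typing import Optional

# Matches integers, decimals, and scientific notation such as 40.8, -3, .5, 2.1e3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Take the last number in the response as the model's final answer (an assumed convention)."""
    matches = _NUMBER.findall(response)
    return float(matches[-1]) if matches else None


def numeric_reward(response: str, reference: float, rel_tol: float = 0.01) -> float:
    """Binary reward: 1.0 if the final number is within rel_tol of the reference, else 0.0."""
    value = extract_final_number(response)
    if value is None or not math.isfinite(value):
        return 0.0  # no usable answer earns no reward
    return 1.0 if math.isclose(value, reference, rel_tol=rel_tol) else 0.0
```

A relative tolerance rather than exact equality avoids penalizing harmless rounding, while still giving zero reward for the arithmetic slips that the last bullet says fine-tuning reduces.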