Solving Physics Olympiad via reinforcement learning on physics simulators

2 days ago

Advances in LLM reasoning are limited by the scarcity of question-answer (QA) pairs on the internet, especially in sciences like physics.
Physics simulators serve as scalable supervision sources, generating synthetic QA data to train LLMs for physical reasoning.
A domain-specific language (DSL) randomizes scene graphs, keeping the generated physical variations controlled, valid, and diverse.
Synthetic question-answer pairs are auto-generated from simulations via templates, covering numeric, reverse, and symbolic types.
Reinforcement learning on synthetic data enables zero-shot sim-to-real transfer, boosting performance on real-world physics benchmarks.
Training on synthetic data improves performance on International Physics Olympiad (IPhO) problems by up to 7 percentage points across model sizes.
Performance gains generalize to other physics and math benchmarks, showing meaningful skill transfer beyond simulator scope.
Simulator-based benchmarks are fast, cheap, and scalable, correlating well with real-world reasoning evaluation.
Fine-tuning reduces arithmetic errors and improves equation selection: the model chooses equations based on the physical context rather than applying them by rote.
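
The pipeline summarized above (randomize a scene, simulate it, fill question templates, score answers for reinforcement learning) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a plain Python dict stands in for the scene-graph DSL, a hand-written drag-free projectile integrator stands in for the physics simulator, the function names `sample_scene`, `simulate_range`, `make_qa`, and `reward` are hypothetical, and the binary tolerance-based reward is only one plausible way to score a numeric answer. Symbolic questions are omitted.

```python
import math
import random

G = 9.81  # m/s^2, assumed gravitational acceleration


def sample_scene(rng):
    """Draw a hypothetical one-body scene: a ball launched from level ground."""
    return {
        "speed": round(rng.uniform(5.0, 30.0), 1),       # launch speed in m/s
        "angle_deg": round(rng.uniform(15.0, 75.0), 1),  # angle above horizontal
    }


def simulate_range(scene, dt=1e-3):
    """Step the drag-free trajectory forward in time; return landing distance in m."""
    theta = math.radians(scene["angle_deg"])
    vx = scene["speed"] * math.cos(theta)
    vy = scene["speed"] * math.sin(theta)
    x = y = 0.0
    while True:
        x_new = x + vx * dt
        y_new = y + vy * dt - 0.5 * G * dt * dt
        if y_new < 0.0:
            # Linearly interpolate within the last step to the ground crossing.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y, vy = x_new, y_new, vy - G * dt


def make_qa(scene, kind):
    """Fill a question template from the simulated outcome; return (question, answer)."""
    landing = round(simulate_range(scene), 2)
    if kind == "numeric":
        # Forward question: scene parameters given, simulated outcome asked.
        question = (
            f"A ball is launched at {scene['speed']} m/s, {scene['angle_deg']} degrees "
            "above level ground. Ignoring air resistance, how far away does it land (in m)?"
        )
        return question, landing
    if kind == "reverse":
        # Reverse question: simulated outcome given, a scene parameter asked.
        question = (
            f"A ball launched {scene['angle_deg']} degrees above level ground lands "
            f"{landing} m away. Ignoring air resistance, what was its launch speed (in m/s)?"
        )
        return question, scene["speed"]
    raise ValueError(f"unsupported question kind: {kind}")


def reward(predicted, reference, rel_tol=0.01):
    """Binary reward: 1.0 if the prediction is within the relative tolerance, else 0.0."""
    return 1.0 if math.isclose(predicted, reference, rel_tol=rel_tol) else 0.0
```

Because the reference answer comes from the simulation and not from a human-written solution, the same loop can produce as many scored examples as needed by calling `sample_scene` with a seeded `random.Random` and passing each scene to `make_qa`.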
