Solving Physics Olympiad via reinforcement learning on physics simulators
2 days ago
- #synthetic-data
- #physics-simulators
- #LLM-training
- Advances in LLM reasoning are limited by the scarcity of question-answer (QA) pairs on the internet, especially in sciences like physics.
- Physics simulators serve as scalable supervision sources, generating synthetic QA data to train LLMs for physical reasoning.
- Randomizing scene graphs through a domain-specific language (DSL) keeps the physical variations controlled, valid, and diverse.
- Synthetic question-answer pairs are auto-generated from simulations via templates, covering numeric, reverse, and symbolic types.
- Reinforcement learning on synthetic data enables zero-shot sim-to-real transfer, boosting performance on real-world physics benchmarks.
- Training on synthetic data improves performance on International Physics Olympiad (IPhO) problems by up to 7 percentage points across model sizes.
- Performance gains generalize to other physics and math benchmarks, showing meaningful skill transfer beyond simulator scope.
- Simulator-based benchmarks are fast, cheap, and scalable, correlating well with real-world reasoning evaluation.
- Fine-tuning reduces arithmetic errors and helps the model choose equations based on physical context rather than applying them by rote.
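
The pipeline summarized above (randomize a scene, simulate it, fill a question template, score an answer) can be illustrated with a toy sketch. Everything here is an assumption for illustration and not the authors' implementation: the `ProjectileScene` class, the parameter ranges, the hand-rolled integrator standing in for a real physics simulator, the question wording, and the binary tolerance-based reward are all hypothetical. Only the numeric and reverse question types are shown; symbolic questions are omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ProjectileScene:
    """Hypothetical stand-in for one randomized scene: a ball launched from level ground."""
    speed: float      # launch speed in m/s
    angle_deg: float  # launch angle above the horizontal, in degrees
    g: float = 9.81   # gravitational acceleration in m/s^2


def sample_scene(rng: random.Random) -> ProjectileScene:
    # Assumed ranges: bounded sampling keeps every scene physically valid.
    return ProjectileScene(
        speed=round(rng.uniform(5.0, 30.0), 1),
        angle_deg=round(rng.uniform(15.0, 75.0), 1),
    )


def simulate_range(scene: ProjectileScene, dt: float = 1e-4) -> float:
    """Toy simulator: step the trajectory with semi-implicit Euler, return range in metres."""
    theta = math.radians(scene.angle_deg)
    vx = scene.speed * math.cos(theta)
    vy = scene.speed * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        vy_next = vy - scene.g * dt
        x_next = x + vx * dt
        y_next = y + vy_next * dt
        if y_next < 0.0:
            # Linearly interpolate to the point where the ball crosses y = 0.
            frac = y / (y - y_next)
            return x + frac * (x_next - x)
        x, y, vy = x_next, y_next, vy_next


def make_qa(scene: ProjectileScene, kind: str) -> tuple[str, float]:
    """Fill a question template; the ground-truth answer comes from the simulation."""
    distance = simulate_range(scene)
    if kind == "numeric":
        question = (
            f"A ball is launched from level ground at {scene.speed} m/s, "
            f"{scene.angle_deg} degrees above the horizontal (g = {scene.g} m/s^2). "
            "What is its horizontal range in metres?"
        )
        return question, distance
    if kind == "reverse":
        # Reverse question: give the simulated outcome, ask for an input parameter.
        question = (
            f"A ball launched from level ground at {scene.angle_deg} degrees above "
            f"the horizontal lands {distance:.2f} m away (g = {scene.g} m/s^2). "
            "What was its launch speed in m/s?"
        )
        return question, scene.speed
    raise ValueError(f"unknown question kind: {kind}")


def reward(predicted: float, target: float, rel_tol: float = 0.01) -> float:
    # Assumed reward shape: 1.0 if within a relative tolerance of the target, else 0.0.
    return 1.0 if math.isclose(predicted, target, rel_tol=rel_tol) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    scene = sample_scene(rng)
    for kind in ("numeric", "reverse"):
        question, answer = make_qa(scene, kind)
        print(question)
        print("answer:", round(answer, 2))
```

In a reinforcement learning loop, a model's parsed numeric answer would be passed to `reward` together with the simulated target. Because the target comes from the simulation rather than from a human-written solution, new labeled questions can be produced simply by sampling more scenes.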