Large-Scale Agentic RL for CUDA Kernel Generation
- #CUDA optimization
- #KernelBench
- #Reinforcement Learning
- CUDA Agent is a large-scale agentic reinforcement learning system for CUDA kernel optimization.
- It achieves state-of-the-art results on KernelBench, outperforming torch.compile with a higher Faster Rate on every KernelBench level.
- The system includes scalable data synthesis, a skill-augmented CUDA development environment, and stable long-context training techniques.
- CUDA-Agent-Ops-6K, a high-quality synthesized training dataset, has been released to support reproducible research.
- For stability, the training pipeline is staged: a single-turn PPO warm-up, actor and critic initialization, and then multi-turn agentic RL.
- CUDA Agent's workflow combines iterative coding, compile-debug cycles, and profiler-guided optimization, with training driven by robust reward schedules.
- The system's performance is measured by Pass Rate, Faster Rate, and Geomean Speed-up on KernelBench splits.
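The warm-up stage uses PPO. In PPO, the actor (policy) is updated with a clipped surrogate objective, and the advantages come from the critic (value model). Below is a minimal pure-Python sketch of that standard clipped loss. It assumes per-sample log-probabilities and precomputed advantages. The name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative choices, not details taken from CUDA Agent.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate loss (negated so that lower is better),
    averaged over samples. Illustrative sketch, not CUDA Agent's code."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Unchanged policy (ratio = 1): the loss is just minus the mean advantage.
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # prints 0.5
```

Clipping the probability ratio to [1 - ε, 1 + ε] limits how far a single update can move the policy. This is one reason PPO is a common choice for a stabilizing warm-up stage.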
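The post does not spell out the reward schedule. One common way to make a kernel-optimization reward robust to noisy timing measurements is a discrete milestone reward. The sketch below assumes four hypothetical tiers: failure, correct, faster than eager PyTorch, and faster than torch.compile. It also assumes a hypothetical noise `margin`. It should not be read as CUDA Agent's actual reward.

```python
from dataclasses import dataclass


@dataclass
class EvalOutcome:
    """Hypothetical result of compiling, testing, and profiling one kernel."""
    compiled: bool
    correct: bool
    speedup_vs_eager: float    # eager_time / kernel_time (assumed field)
    speedup_vs_compile: float  # torch_compile_time / kernel_time (assumed field)


def milestone_reward(o: EvalOutcome, margin: float = 0.05) -> float:
    """Discrete reward with hypothetical tiers. Speed differences within
    `margin` of 1.0 do not change the reward, which damps timing noise."""
    if not o.compiled or not o.correct:
        return -1.0  # compile error or wrong output
    if o.speedup_vs_compile > 1.0 + margin:
        return 3.0   # clearly beats torch.compile
    if o.speedup_vs_eager > 1.0 + margin:
        return 2.0   # clearly beats eager PyTorch
    return 1.0       # correct but not clearly faster


print(milestone_reward(EvalOutcome(True, True, 1.5, 0.9)))  # prints 2.0
```

Capping the reward at a few levels means a lucky timing run cannot produce an outsized reward. The trade-off is that the signal no longer distinguishes between, say, a 1.3x and a 3x speed-up within the same tier.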
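The three metrics can be computed from per-task evaluation records. This sketch makes three assumptions:

- It uses a hypothetical `KernelResult` record.
- It treats Faster Rate as the fraction of all tasks whose kernel is correct and strictly faster than the baseline.
- It takes the geometric mean over correct kernels only.

Actual KernelBench evaluations may use different conventions, for example which baseline is timed or how failed kernels enter the geomean.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KernelResult:
    """Hypothetical per-task evaluation record."""
    correct: bool             # compiled and matched the reference output
    speedup: Optional[float]  # baseline_time / kernel_time; None when not correct


def pass_rate(results: List[KernelResult]) -> float:
    # Fraction of tasks whose generated kernel is correct.
    return sum(r.correct for r in results) / len(results)


def faster_rate(results: List[KernelResult]) -> float:
    # Assumption: the denominator is all tasks; a task counts only if its
    # kernel is correct and strictly faster than the baseline.
    wins = sum(1 for r in results
               if r.correct and r.speedup is not None and r.speedup > 1.0)
    return wins / len(results)


def geomean_speedup(results: List[KernelResult]) -> float:
    # Assumption: geometric mean over correct kernels only (0.0 if none).
    speedups = [r.speedup for r in results if r.correct and r.speedup is not None]
    return statistics.geometric_mean(speedups) if speedups else 0.0


results = [
    KernelResult(True, 2.0),
    KernelResult(True, 0.5),
    KernelResult(False, None),
    KernelResult(True, 8.0),
]
print(pass_rate(results), faster_rate(results), round(geomean_speedup(results), 3))
# prints 0.75 0.5 2.0
```

The geometric mean is the usual choice for averaging speed-ups. It treats a 2x gain and a 2x slowdown symmetrically, so one very large speed-up cannot dominate the average the way it would in an arithmetic mean.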