Large-Scale Agentic RL for CUDA Kernel Generation

6 hours ago

CUDA Agent is a large-scale agentic reinforcement learning system for CUDA kernel optimization.
It achieves state-of-the-art results on KernelBench, with a higher Faster Rate than torch.compile across all levels.
The system includes scalable data synthesis, a skill-augmented CUDA development environment, and stable long-context training techniques.
CUDA-Agent-Ops-6K, a high-quality synthesized training dataset, has been released to support reproducible research.
The training pipeline is staged for stability: a single-turn PPO warm-up, then actor and critic initialization, then multi-turn agentic RL.
CUDA Agent's workflow includes iterative coding, compile-debug cycles, and profiler-guided optimization with robust reward schedules.
The system's performance is measured by Pass Rate, Faster Rate, and Geomean Speed-up on KernelBench splits.
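
To make the three metrics concrete, here is a minimal Python sketch of how they might be computed from per-task results. The definitions are assumptions, not taken from the source: Pass Rate is taken as the fraction of tasks whose kernel is correct, Faster Rate as the fraction of tasks whose kernel is correct and faster than the baseline, and Geomean Speed-up as the geometric mean of baseline time over kernel time across correct kernels only. The `KernelResult` type and `summarize` function are hypothetical names used for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelResult:
    """One benchmark task (hypothetical record layout)."""
    correct: bool        # generated kernel matched the reference output
    baseline_ms: float   # runtime of the baseline (e.g. a compiled reference)
    kernel_ms: float     # runtime of the generated kernel

    @property
    def speedup(self) -> float:
        # Values above 1.0 mean the generated kernel beat the baseline.
        return self.baseline_ms / self.kernel_ms


def summarize(results: list[KernelResult]) -> dict[str, float]:
    """Compute the three metrics under the assumed definitions above."""
    if not results:
        raise ValueError("results must not be empty")

    total = len(results)
    passed = [r for r in results if r.correct]
    faster = [r for r in passed if r.speedup > 1.0]

    if passed:
        # Geometric mean via the mean of logs, over correct kernels only.
        mean_log = sum(math.log(r.speedup) for r in passed) / len(passed)
        geomean = math.exp(mean_log)
    else:
        geomean = float("nan")

    return {
        "pass_rate": len(passed) / total,
        "faster_rate": len(faster) / total,
        "geomean_speedup": geomean,
    }


if __name__ == "__main__":
    demo = [
        KernelResult(correct=True, baseline_ms=2.0, kernel_ms=1.0),
        KernelResult(correct=True, baseline_ms=1.0, kernel_ms=2.0),
        KernelResult(correct=False, baseline_ms=1.0, kernel_ms=1.0),
        KernelResult(correct=True, baseline_ms=4.0, kernel_ms=1.0),
    ]
    print(summarize(demo))
```

The geometric mean is used rather than the arithmetic mean because speed-ups are ratios: a 2x gain and a 0.5x loss cancel to 1.0 instead of averaging to 1.25. Whether incorrect kernels are excluded from the mean or counted with a penalty value is a benchmark-specific choice that this sketch does not settle.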
