Large-Scale Agentic RL for CUDA Kernel Generation

6 hours ago
  • #CUDA optimization
  • #KernelBench
  • #Reinforcement Learning
  • CUDA Agent is a large-scale agentic reinforcement learning system for CUDA kernel optimization.
  • It achieves state-of-the-art results on KernelBench, outperforming torch.compile on the Faster Rate metric across all levels.
  • The system includes scalable data synthesis, a skill-augmented CUDA development environment, and stable long-context training techniques.
  • CUDA-Agent-Ops-6K, a high-quality synthesized training dataset, has been released to support reproducible research.
  • The training pipeline is staged for stability: a single-turn PPO warm-up, then actor and critic initialization, then multi-turn agentic RL.
  • CUDA Agent's workflow includes iterative coding, compile-debug cycles, and profiler-guided optimization with robust reward schedules.
  • The system's performance is measured by Pass Rate, Faster Rate, and Geomean Speed-up on KernelBench splits.
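
To make the three metrics concrete, here is a minimal Python sketch of how they could be computed from per-task timings. The definitions are assumptions, not taken from the source: Pass Rate is taken as the fraction of tasks whose generated kernel is correct, Faster Rate as the fraction of all tasks whose kernel is both correct and faster than the baseline (for example torch.compile), and Geomean Speed-up as the geometric mean of baseline time over kernel time across correct kernels only. The names `KernelResult` and `summarize` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KernelResult:
    """Outcome for one benchmark task (hypothetical record layout)."""
    correct: bool                # kernel compiled and matched the reference output
    kernel_ms: Optional[float]   # runtime of the generated kernel; None if it failed
    baseline_ms: float           # runtime of the baseline, e.g. torch.compile


def summarize(results: List[KernelResult]) -> Dict[str, float]:
    """Compute the three metrics under the assumed definitions above."""
    if not results:
        raise ValueError("need at least one result")

    total = len(results)
    # Only correct kernels with a measured runtime contribute a speed-up.
    speedups = [
        r.baseline_ms / r.kernel_ms
        for r in results
        if r.correct and r.kernel_ms is not None
    ]

    pass_rate = len(speedups) / total
    # Assumption: failed kernels count as "not faster", so divide by all tasks.
    faster_rate = sum(1 for s in speedups if s > 1.0) / total
    # Geometric mean via the mean of logs; undefined when nothing passed.
    geomean = (
        math.exp(sum(math.log(s) for s in speedups) / len(speedups))
        if speedups
        else float("nan")
    )

    return {
        "pass_rate": pass_rate,
        "faster_rate": faster_rate,
        "geomean_speedup": geomean,
    }
```

The geometric mean is used rather than the arithmetic mean because speed-ups are ratios: a 2x gain and a 0.5x slowdown then cancel to 1x instead of averaging to 1.25x. Whether failed kernels are excluded from the geomean or counted with some penalty value is a benchmark-specific choice that this sketch does not settle.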