AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
- #Machine Learning Systems
- #Autonomous Agents
- #GPU Optimization
- AutoKernel is an autonomous, agent-based framework that optimizes the GPU kernels of PyTorch models without manual tuning.
- It profiles the model to find bottlenecks, ranks candidate optimizations by their expected end-to-end speedup using Amdahl's law, and iteratively refines kernel implementations through automated experiments.
- Ensures correctness with a five-stage harness: smoke tests, shape sweeps, numerical stability checks, determinism verification, and edge-case handling.
- Supports Triton and CUDA C++ backends and ships 18 starter kernels spanning nine kernel types that are central to modern transformer architectures.
- Outperforms PyTorch eager and torch.compile (max-autotune) on an NVIDIA H100, with speedups such as 5.29x on RMSNorm, and posts leading results on community benchmarks.
- Released as open source, with the code and integration tooling publicly available.
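To make the Amdahl's-law ranking concrete, here is a minimal sketch. It assumes each kernel is described by the fraction p of total runtime it accounts for and an estimated local speedup s, so the whole-model speedup is 1 / ((1 - p) + p / s). The names `KernelProfile`, `amdahl_speedup`, and `rank_targets`, and all numbers below, are hypothetical illustrations rather than AutoKernel's actual code or measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KernelProfile:
    name: str             # hypothetical kernel label
    time_fraction: float  # p: share of end-to-end runtime spent in this kernel
    est_speedup: float    # s: assumed local speedup if the kernel is optimized


def amdahl_speedup(p: float, s: float) -> float:
    """Whole-model speedup when a fraction p of runtime becomes s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("time fraction must lie in [0, 1]")
    if s <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def rank_targets(profiles: List[KernelProfile]) -> List[KernelProfile]:
    """Order kernels by the end-to-end gain their optimization would yield."""
    return sorted(
        profiles,
        key=lambda k: amdahl_speedup(k.time_fraction, k.est_speedup),
        reverse=True,
    )


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    profiles = [
        KernelProfile("rmsnorm", 0.10, 5.0),
        KernelProfile("softmax", 0.15, 2.0),
        KernelProfile("matmul", 0.60, 1.2),
    ]
    for k in rank_targets(profiles):
        print(f"{k.name}: {amdahl_speedup(k.time_fraction, k.est_speedup):.3f}x")
```

With these made-up inputs the modest 1.2x matmul gain ranks first (about 1.111x overall) ahead of the 5x RMSNorm gain (about 1.087x), because matmul dominates runtime. This is the kind of trade-off an Amdahl-based ranking is meant to expose.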
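The five correctness stages can be sketched in plain Python for an RMSNorm kernel. This is an assumption-laden illustration: the real harness presumably checks GPU tensors, whereas here kernels are functions over lists of floats. The stage names follow the summary, but `check_candidate`, `rmsnorm_ref`, the chosen inputs, and the tolerances are hypothetical choices.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Kernel = Callable[[Vector], Vector]


def rmsnorm_ref(x: Vector, eps: float = 1e-6) -> Vector:
    """Reference RMSNorm without a learned weight: x / sqrt(mean(x**2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def allclose(a: Vector, b: Vector, rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Element-wise tolerance check using the usual rtol/atol convention."""
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=rtol, abs_tol=atol) for u, v in zip(a, b)
    )


def check_candidate(candidate: Kernel, reference: Kernel, seed: int = 0) -> List[str]:
    """Run five correctness stages; return the names of the stages that failed."""
    rng = random.Random(seed)

    def rand_vec(n: int, scale: float = 1.0) -> Vector:
        return [rng.uniform(-scale, scale) for _ in range(n)]

    def matches(x: Vector) -> bool:
        out = candidate(x)
        return all(math.isfinite(v) for v in out) and allclose(out, reference(x))

    stages = {
        # 1. Smoke test: one small random input.
        "smoke": lambda: matches(rand_vec(8)),
        # 2. Shape sweep: several lengths, including a single element.
        "shape_sweep": lambda: all(matches(rand_vec(n)) for n in (1, 3, 64, 1000)),
        # 3. Numerical stability: very large and very small magnitudes.
        "numerical_stability": lambda: matches(rand_vec(32, 1e100))
        and matches(rand_vec(32, 1e-100)),
        # 4. Determinism: the same input must give bit-identical output.
        "determinism": lambda: (lambda x: candidate(x) == candidate(x))(rand_vec(256)),
        # 5. Edge cases: an all-zero input and a constant vector.
        "edge_cases": lambda: matches([0.0] * 16) and matches([2.5] * 16),
    }

    failures = []
    for name, stage in stages.items():
        try:
            ok = stage()
        except ArithmeticError:  # e.g. ZeroDivisionError, OverflowError
            ok = False
        if not ok:
            failures.append(name)
    return failures
```

For example, a candidate that drops the `eps` term fails the edge-case stage (division by zero on the all-zero input) and the stability stage (tiny inputs are inflated to unit scale), while matching the reference on ordinary random data.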