AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

17 hours ago
  • #Machine Learning Systems
  • #Autonomous Agents
  • #GPU Optimization
  • AutoKernel is an autonomous, agent-based framework that optimizes GPU kernels in PyTorch models.
  • It identifies bottlenecks via profiling, ranks improvements using Amdahl's law, and iteratively refines kernel implementations through automated experiments.
  • It ensures correctness with a five-stage harness: smoke tests, shape sweeps, numerical stability checks, determinism verification, and edge-case handling.
  • Supports Triton and CUDA C++ backends, with 18 starter kernels and nine kernel types crucial for modern transformer architectures.
  • It outperforms PyTorch eager and torch.compile (max-autotune) on an NVIDIA H100, with speedups such as 5.29x on RMSNorm, and leads in community benchmarks.
  • The code and integrations are open source and publicly available online.
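
The Amdahl's law ranking mentioned above can be illustrated with a minimal sketch. This is not AutoKernel's actual code: the `KernelProfile` type, the `rank_by_amdahl` helper, the timing numbers, and the uniform 2x assumed per-kernel speedup are all hypothetical, chosen only to show why a kernel's share of total runtime bounds the end-to-end gain from optimizing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical profiler record: one kernel and the time attributed to it."""
    name: str
    time_ms: float


def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """End-to-end speedup when `fraction` of runtime becomes `kernel_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Amdahl's law: the untouched share (1 - fraction) is unchanged,
    # the optimized share shrinks by the kernel speedup.
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def rank_by_amdahl(profiles, assumed_speedup=2.0):
    """Sort kernels by the end-to-end speedup expected if each were optimized alone.

    `assumed_speedup` is an illustrative guess applied uniformly to every kernel.
    """
    total_ms = sum(p.time_ms for p in profiles)
    if total_ms <= 0.0:
        raise ValueError("total profiled time must be positive")
    ranked = [
        (p.name, amdahl_speedup(p.time_ms / total_ms, assumed_speedup))
        for p in profiles
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Invented timings, for illustration only.
    example = [
        KernelProfile("matmul", 60.0),
        KernelProfile("rmsnorm", 10.0),
        KernelProfile("softmax", 30.0),
    ]
    for name, speedup in rank_by_amdahl(example):
        print(f"{name}: {speedup:.3f}x end to end")
```

Under these invented numbers the matmul kernel ranks first, because it accounts for the largest share of runtime. The same formula shows why per-kernel and whole-model figures differ: if a kernel took 10% of runtime (again an assumed share), a 5.29x speedup on it alone would give roughly 1.09x end to end.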