Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks

23 days ago
  • #CUDA
  • #GNN
  • #Machine Learning
  • Batmobile introduces custom CUDA kernels to speed up spherical harmonics and tensor product operations in equivariant GNNs like MACE, NequIP, and Allegro.
  • Equivariant GNNs respect physical symmetries (rotation, translation, reflection) but are computationally expensive, which can make real-world applications impractical.
  • Spherical harmonics encode 3D directions, and tensor products combine features while preserving equivariance.
  • The standard library, e3nn, is slow because of Python/PyTorch overhead, wasted memory bandwidth, a lack of kernel fusion, and dynamic shapes.
  • Batmobile improves performance with compile-time constants, register-only intermediates, and fused operations.
  • Benchmarks show Batmobile is 10-20x faster than e3nn for spherical harmonics and tensor products.
  • Batmobile is specialized for L_max=3, with all 34 Clebsch-Gordan paths unrolled and coefficients as compile-time constants.
  • The project is named Batmobile to reflect its specialized, high-performance nature for molecular simulations.
  • Available on GitHub with benchmarks and examples for quick integration.
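
To make the spherical-harmonics and tensor-product bullets concrete, here is a minimal pure-Python sketch of the lowest-order case. It is not Batmobile's or e3nn's code. The function names, the (m = -1, 0, +1) ordering, and the normalization are assumptions chosen for illustration, and real libraries may use different sign and scaling conventions. It evaluates the l = 0 and l = 1 real spherical harmonics of a direction, then checks numerically that two l = 1 tensor-product paths behave as expected under a rotation: the scalar output is unchanged, and the vector output rotates with the inputs.

```python
import math

C0 = 0.5 / math.sqrt(math.pi)          # Y_0^0 = 1 / (2 sqrt(pi))
C1 = math.sqrt(3.0 / (4.0 * math.pi))  # prefactor shared by the three l=1 harmonics

def sph_harm_l01(v):
    """Real spherical harmonics for l=0 and l=1 of the direction of v.

    Assumed ordering: [Y_0^0, Y_1^-1, Y_1^0, Y_1^1], i.e. constant, y, z, x.
    """
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / r, y / r, z / r  # only the direction matters
    return [C0, C1 * y, C1 * z, C1 * x]

def tp_1x1_to_0(a, b):
    """Path 1 x 1 -> 0: a scaled dot product (sign and scale are convention-dependent)."""
    return (a[0] * b[0] + a[1] * b[1] + a[2] * b[2]) / math.sqrt(3.0)

def tp_1x1_to_1(a, b):
    """Path 1 x 1 -> 1: a scaled cross product (scale is convention-dependent)."""
    s = 1.0 / math.sqrt(2.0)
    return (
        s * (a[1] * b[2] - a[2] * b[1]),
        s * (a[2] * b[0] - a[0] * b[2]),
        s * (a[0] * b[1] - a[1] * b[0]),
    )

def rotate(v, alpha, beta):
    """Rotate v by alpha about the z axis, then by beta about the x axis."""
    x, y, z = v
    ca, sa = math.cos(alpha), math.sin(alpha)
    x, y = ca * x - sa * y, sa * x + ca * y
    cb, sb = math.cos(beta), math.sin(beta)
    y, z = cb * y - sb * z, sb * y + cb * z
    return (x, y, z)

# Two arbitrary l=1 features, written as Cartesian (x, y, z) vectors.
a = (0.3, -1.2, 0.5)
b = (2.0, 0.1, -0.7)
ra, rb = rotate(a, 0.4, 1.1), rotate(b, 0.4, 1.1)

# Invariance: the scalar path gives the same value before and after rotation.
scalar_err = abs(tp_1x1_to_0(ra, rb) - tp_1x1_to_0(a, b))

# Equivariance: rotating the inputs equals rotating the output.
vector_err = max(
    abs(p - q)
    for p, q in zip(tp_1x1_to_1(ra, rb), rotate(tp_1x1_to_1(a, b), 0.4, 1.1))
)

print("scalar path error:", scalar_err)
print("vector path error:", vector_err)
```

Both errors should sit at floating-point round-off. Higher degrees follow the same pattern with larger coefficient tables, which is the part a specialized kernel can bake in as constants.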
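
The figure of 34 Clebsch-Gordan paths at L_max=3 can be reproduced with a short enumeration. The sketch below assumes a path is any triple (l1, l2, l_out) in which every degree is at most L_max and l_out satisfies the triangle rule |l1 - l2| <= l_out <= l1 + l2. The summary does not spell out how the paths are counted, so this convention is an assumption, and the helper name `allowed_paths` is hypothetical.

```python
L_MAX = 3  # maximum degree, as stated in the summary

def allowed_paths(l_max):
    """List every (l1, l2, l_out) with all degrees <= l_max that obeys the triangle rule."""
    return [
        (l1, l2, l_out)
        for l1 in range(l_max + 1)
        for l2 in range(l_max + 1)
        for l_out in range(abs(l1 - l2), min(l1 + l2, l_max) + 1)
    ]

paths = allowed_paths(L_MAX)

# Each degree l contributes 2l + 1 spherical-harmonic components.
num_components = sum(2 * l + 1 for l in range(L_MAX + 1))

print(len(paths))      # 34
print(num_components)  # 16
```

Because the set of paths is fixed once L_max is fixed, the loop over paths and the coefficients of each path can be written out ahead of time instead of being discovered at runtime, which is the idea behind unrolling them as compile-time constants.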