Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks
23 days ago
- #CUDA
- #GNN
- #Machine Learning
- Batmobile introduces custom CUDA kernels to speed up spherical harmonics and tensor product operations in equivariant GNNs like MACE, NequIP, and Allegro.
- Equivariant GNNs respect physical symmetries (rotation, translation, reflection) but are computationally expensive, which limits their use in production-scale molecular simulations.
- Spherical harmonics encode 3D directions, while tensor products combine features while preserving equivariance.
- The standard library, e3nn, is slow due to Python/PyTorch dispatch overhead, wasted memory bandwidth, lack of kernel fusion, and dynamic tensor shapes that prevent compile-time specialization.
- Batmobile optimizes performance by using compile-time constants, register-only intermediates, and fused operations.
- Benchmarks show Batmobile is 10-20x faster than e3nn for spherical harmonics and tensor products.
- Batmobile is specialized for L_max=3, with all 34 Clebsch-Gordan paths unrolled and coefficients as compile-time constants.
- The project is named Batmobile to reflect its specialized, high-performance nature for molecular simulations.
- Available on GitHub with benchmarks and examples for quick integration.
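The "compile-time constants, fully unrolled" idea can be sketched in plain Python (the real kernels are CUDA, and the function name here is illustrative, not Batmobile's API). Real spherical harmonics up to l=2 for a unit vector, with every normalization coefficient hardcoded rather than looked up:

```python
import math

# Hardcoded normalization constants for real spherical harmonics
# evaluated on a unit vector -- the analogue of baking coefficients
# into a CUDA kernel as compile-time constants.
C00 = 0.5 * math.sqrt(1.0 / math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))
C2A = 0.5 * math.sqrt(15.0 / math.pi)
C20 = 0.25 * math.sqrt(5.0 / math.pi)
C22 = 0.25 * math.sqrt(15.0 / math.pi)

def sph_harm_l2(x: float, y: float, z: float) -> list[float]:
    """Fully unrolled real spherical harmonics up to l=2.

    No loops, no coefficient tables: each of the 9 outputs is a short
    polynomial in (x, y, z), mirroring register-only intermediates.
    Assumes (x, y, z) is a unit vector.
    """
    return [
        C00,                           # Y_0^0
        C1 * y,                        # Y_1^{-1}
        C1 * z,                        # Y_1^{0}
        C1 * x,                        # Y_1^{1}
        C2A * x * y,                   # Y_2^{-2}
        C2A * y * z,                   # Y_2^{-1}
        C20 * (3.0 * z * z - 1.0),     # Y_2^{0}
        C2A * x * z,                   # Y_2^{1}
        C22 * (x * x - y * y),         # Y_2^{2}
    ]
```

For a direction along +z, every m != 0 component vanishes, which is a quick sanity check on the unrolled polynomials.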
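The unrolled Clebsch-Gordan paths can be illustrated with the simplest case, 1 ⊗ 1: two l=1 vectors combine into an l=0 scalar (the dot product), an l=1 vector (the cross product), and an l=2 symmetric traceless part. This is a hand-unrolled sketch of the math, not Batmobile's interface, and it uses an un-normalized convention; real libraries rescale each path by a Clebsch-Gordan constant:

```python
def tp_1x1(a, b):
    """Unrolled tensor product of two l=1 features (3-vectors).

    Returns the l=0, l=1, and l=2 irreducible components as straight-line
    arithmetic -- no loops or coefficient lookups, as in a fused kernel.
    """
    ax, ay, az = a
    bx, by, bz = b
    # l=0 path: rotation-invariant scalar (dot product)
    l0 = ax * bx + ay * by + az * bz
    # l=1 path: antisymmetric part (cross product)
    l1 = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    # l=2 path: symmetric traceless part, 5 independent components
    l2 = (
        ax * by + ay * bx,                   # ~ xy
        ay * bz + az * by,                   # ~ yz
        2.0 * az * bz - ax * bx - ay * by,   # ~ 3z^2 - r^2
        ax * bz + az * bx,                   # ~ xz
        ax * bx - ay * by,                   # ~ x^2 - y^2
    )
    return l0, l1, l2
```

Higher-order products work the same way, just with more paths: specializing to L_max=3 is what lets every path be unrolled and every coefficient become a constant.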