Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks
23 days ago
- #CUDA
- #GNN
- #Machine Learning
- Batmobile introduces custom CUDA kernels to speed up spherical harmonics and tensor product operations in equivariant GNNs like MACE, NequIP, and Allegro.
- Equivariant GNNs respect physical symmetries (rotation, translation, reflection) but are computationally expensive, which limits their use in production-scale molecular simulations.
- Spherical harmonics encode 3D directions, while tensor products combine features while preserving equivariance.
- The standard library, e3nn, is slow due to Python/PyTorch dispatch overhead, wasted memory bandwidth, lack of kernel fusion, and dynamic tensor shapes that prevent compile-time specialization.
- Batmobile optimizes performance by using compile-time constants, register-only intermediates, and fused operations.
- Benchmarks show Batmobile is 10-20x faster than e3nn for spherical harmonics and tensor products.
- Batmobile is specialized for L_max=3, with all 34 Clebsch-Gordan paths unrolled and coefficients as compile-time constants.
- The project is named Batmobile to reflect its specialized, high-performance nature for molecular simulations.
- Available on GitHub with benchmarks and examples for quick integration.
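The "compile-time constants, fully unrolled" idea can be sketched in plain Python (the real kernels are CUDA, and the function name here is illustrative, not Batmobile's API). Real spherical harmonics up to l=2 for a unit vector, with every normalization coefficient hardcoded rather than looked up:

```python
import math

# Hardcoded normalization constants for real spherical harmonics
# evaluated on a unit vector -- the analogue of baking coefficients
# into a CUDA kernel as compile-time constants.
C00 = 0.5 * math.sqrt(1.0 / math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))
C2A = 0.5 * math.sqrt(15.0 / math.pi)
C20 = 0.25 * math.sqrt(5.0 / math.pi)
C22 = 0.25 * math.sqrt(15.0 / math.pi)

def sph_harm_l2(x: float, y: float, z: float) -> list[float]:
    """Fully unrolled real spherical harmonics up to l=2.

    No loops, no coefficient tables: each of the 9 outputs is a short
    polynomial in (x, y, z), mirroring register-only intermediates.
    Assumes (x, y, z) is a unit vector.
    """
    return [
        C00,                           # Y_0^0
        C1 * y,                        # Y_1^{-1}
        C1 * z,                        # Y_1^{0}
        C1 * x,                        # Y_1^{1}
        C2A * x * y,                   # Y_2^{-2}
        C2A * y * z,                   # Y_2^{-1}
        C20 * (3.0 * z * z - 1.0),     # Y_2^{0}
        C2A * x * z,                   # Y_2^{1}
        C22 * (x * x - y * y),         # Y_2^{2}
    ]
```

For a direction along +z, every m != 0 component vanishes, which is a quick sanity check on the unrolled polynomials.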
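The unrolled Clebsch-Gordan paths can be illustrated with the simplest case, 1 ⊗ 1: two l=1 vectors combine into an l=0 scalar (the dot product), an l=1 vector (the cross product), and an l=2 symmetric traceless part. This is a hand-unrolled sketch of the math, not Batmobile's interface, and it uses an un-normalized convention; real libraries rescale each path by a Clebsch-Gordan constant:

```python
def tp_1x1(a, b):
    """Unrolled tensor product of two l=1 features (3-vectors).

    Returns the l=0, l=1, and l=2 irreducible components as straight-line
    arithmetic -- no loops or coefficient lookups, as in a fused kernel.
    """
    ax, ay, az = a
    bx, by, bz = b
    # l=0 path: rotation-invariant scalar (dot product)
    l0 = ax * bx + ay * by + az * bz
    # l=1 path: antisymmetric part (cross product)
    l1 = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    # l=2 path: symmetric traceless part, 5 independent components
    l2 = (
        ax * by + ay * bx,                   # ~ xy
        ay * bz + az * by,                   # ~ yz
        2.0 * az * bz - ax * bx - ay * by,   # ~ 3z^2 - r^2
        ax * bz + az * bx,                   # ~ xz
        ax * bx - ay * by,                   # ~ x^2 - y^2
    )
    return l0, l1, l2
```

Higher-order products work the same way, just with more paths: specializing to L_max=3 is what lets every path be unrolled and every coefficient become a constant.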