Fast GPU Linear Algebra via Compile Time Expression Fusion
5 hours ago
- #Linear Algebra
- #GPU Computing
- #Compile-Time Optimization
- Bandicoot is a GPU linear algebra toolkit that prioritizes ease of use without compromising efficiency.
- Its API is compatible with Armadillo, a CPU linear algebra library, which eases the transition from CPU-based code.
- Bandicoot uses template metaprogramming to generate fused GPU kernels at compile time, which avoids runtime overhead and removes the need for JIT infrastructure.
- The generated kernels are efficient and often able to saturate memory bandwidth.
- In the reported empirical results, Bandicoot outperforms other common toolkits used for linear algebra, such as PyTorch, TensorFlow, and JAX.
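
The compile-time fusion idea in the summary above can be illustrated with a small expression-template sketch. This is a CPU-only toy, not Bandicoot's implementation or API: every name in it (`Expr`, `BinaryExpr`, `Vec`, `AddOp`, `MulOp`, `fused_example`) is hypothetical, and a plain loop stands in for the generated GPU kernel. It only shows the general technique: operators build a type that encodes the whole expression, and assignment evaluates that expression in one pass without intermediate temporaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is NOT Bandicoot's API.

// CRTP base: lets operators accept any expression type and recover it statically.
template <typename Derived>
struct Expr {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Lazy node for an elementwise binary operation.
// It stores references to its operands and computes nothing until indexed.
template <typename L, typename R, typename Op>
struct BinaryExpr : Expr<BinaryExpr<L, R, Op>> {
    const L& lhs;
    const R& rhs;
    BinaryExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    float operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct AddOp { static float apply(float a, float b) { return a + b; } };
struct MulOp { static float apply(float a, float b) { return a * b; } };

// Minimal vector type that owns its storage.
struct Vec : Expr<Vec> {
    std::vector<float> data;
    explicit Vec(std::size_t n, float fill = 0.0f) : data(n, fill) {}
    float operator[](std::size_t i) const { return data[i]; }
    float& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression runs ONE loop over the whole expression tree.
    // In a GPU library, this is the point where a single fused kernel would run.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) {
            data[i] = expr[i];
        }
        return *this;
    }
};

// Operators return lightweight expression nodes instead of computed vectors.
template <typename L, typename R>
BinaryExpr<L, R, AddOp> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, AddOp>(l.self(), r.self());
}

template <typename L, typename R>
BinaryExpr<L, R, MulOp> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, MulOp>(l.self(), r.self());
}

// The right-hand side has type BinaryExpr<Vec, BinaryExpr<Vec, Vec, MulOp>, AddOp>,
// known at compile time, so the elementwise multiply and add share one loop.
Vec fused_example(const Vec& a, const Vec& b, const Vec& c) {
    Vec out(a.size());
    out = a + b * c;  // expression nodes live until the end of this statement
    return out;
}
```

Calling `fused_example` on three equally sized `Vec` objects computes `a[i] + b[i] * c[i]` for each element in a single pass. Because the expression nodes hold references, they should be consumed in the same statement that builds them, as done here; storing them in a variable for later use would risk dangling references in this simplified design.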