Fast GPU Linear Algebra via Compile Time Expression Fusion

5 hours ago
  • #Linear Algebra
  • #GPU Computing
  • #Compile-Time Optimization
  • Bandicoot is a GPU linear algebra toolkit that prioritizes ease of use without compromising efficiency.
  • It has an API compatible with the Armadillo CPU linear algebra library for easy transition from CPU-based code.
  • Bandicoot uses template metaprogramming to generate fused GPU kernels at compile time, which avoids both runtime overhead and the need for JIT infrastructure.
  • The generated kernels are efficient and can often saturate GPU memory bandwidth.
  • The reported empirical results show Bandicoot outperforming other commonly used toolkits such as PyTorch, TensorFlow, and JAX.
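
To make the compile-time fusion idea concrete, here is a minimal CPU-only sketch of expression templates, a common template-metaprogramming technique for this kind of fusion. It is not Bandicoot's code: the names `Expr`, `Vec`, `Add`, `Mul`, and `fused_demo` are hypothetical, and the sketch assumes (without confirmation from the summary) that the library builds a similar compile-time expression tree before emitting a GPU kernel. Here the "kernel" is just a single C++ loop.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// CRTP base: lets operators accept any expression node by its static type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node that owns its data (hypothetical stand-in for a matrix type).
struct Vec : Expr<Vec> {
    std::vector<float> data;
    explicit Vec(std::size_t n, float v = 0.0f) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    float operator[](std::size_t i) const { return data[i]; }

    // Assigning from an expression runs ONE loop over the whole tree,
    // so no intermediate vectors are allocated.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& src = e.self();
        data.resize(src.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = src[i];
        return *this;
    }
};

// Lazy element-wise sum: stores references, computes on demand.
// The node must be consumed within the same full expression that built it.
template <typename L, typename R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) { assert(l.size() == r.size()); }
    std::size_t size() const { return l.size(); }
    float operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Lazy element-wise product.
template <typename L, typename R>
struct Mul : Expr<Mul<L, R>> {
    const L& l;
    const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) { assert(l.size() == r.size()); }
    std::size_t size() const { return l.size(); }
    float operator[](std::size_t i) const { return l[i] * r[i]; }
};

template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Mul<L, R>(l.self(), r.self());
}

// Usage: the whole right-hand side is evaluated in a single pass.
inline float fused_demo() {
    Vec a(4, 1.0f), b(4, 2.0f), c(4, 3.0f), out(4);
    // The expression's structure is fully known to the compiler as a type.
    static_assert(
        std::is_same<decltype(a + b * c), Add<Vec, Mul<Vec, Vec>>>::value,
        "expression tree is encoded in the type");
    out = a + b * c;
    return out[0];  // element 0 of a + b * c
}
```

Because the structure of `a + b * c` is encoded in the type `Add<Vec, Mul<Vec, Vec>>`, the compiler can inline the whole tree into one loop body with no runtime dispatch. A GPU library could, in principle, walk the same kind of type at compile time to assemble a single fused kernel instead of launching one kernel per operation.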