CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU
- #High Performance Computing
- #Rust
- #GPU Programming
- CubeCL allows GPU programming in Rust with zero-cost abstractions for efficient compute kernels.
- Supports functions, generics, structs, and partial support for traits and methods.
- Kernels are annotated with the `#[cube]` attribute for GPU execution.
- Example provided for GELU (Gaussian Error Linear Unit) computation on GPU.
- Supports multiple GPU runtimes: WGPU (cross-platform), CUDA (NVIDIA), and ROCm/HIP (AMD, work in progress).
- Plans for a JIT CPU runtime with SIMD using Cranelift.
- Automatic vectorization, comptime optimizations, and autotuning for performance.
- Memory management optimized for throughput with buffer reuse.
- Includes linear algebra components like optimized matrix multiplication.
- Future plans include convolutions, random number generation, and FFTs.
- Two-step compilation process: parsing with the `syn` crate, followed by IR generation.
- Launch topology is based on cuboids ("cubes"), mapping 3D coordinates onto the hardware's execution units.
- Comparison of CubeCL variables with CUDA, WebGPU, and Metal equivalents.
- The comptime feature lets kernels modify their IR during runtime compilation, enabling specialization optimizations.
- Autotuning benchmarks kernels at runtime for optimal performance.
- CubeCL is currently in alpha, used in the Burn project, with ongoing refinements.
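The GELU bullet can be illustrated with a plain-Rust scalar reference. This is only the math, not a CubeCL kernel (the real example would use the `#[cube]` attribute and GPU array types); it uses the common tanh approximation since the Rust standard library has no `erf`:

```rust
// Scalar reference for GELU using the tanh approximation:
// gelu(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715·x³))).
fn gelu(x: f32) -> f32 {
    let c = (2.0f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    // GELU is ~0 for large negative inputs and ~x for large positive ones.
    for x in [-3.0f32, 0.0, 3.0] {
        println!("gelu({x}) = {}", gelu(x));
    }
}
```

A GPU kernel would apply this same scalar function element-wise across an input buffer, one invocation per element.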
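The comptime idea, specializing a kernel on values known when it is compiled so the generated code contains no branch, can be sketched in plain Rust with a const generic standing in for a comptime flag (an analogy, not CubeCL's actual mechanism):

```rust
// Sketch of comptime specialization: a flag known at compile time
// selects a code path, so the compiler can eliminate the branch
// entirely. In CubeCL the flag would be a comptime value rewriting
// the kernel IR; here a const generic plays that role.
fn scale<const FUSE_RELU: bool>(x: f32, factor: f32) -> f32 {
    let y = x * factor;
    if FUSE_RELU { y.max(0.0) } else { y }
}

fn main() {
    println!("{}", scale::<true>(-2.0, 3.0));  // ReLU fused in
    println!("{}", scale::<false>(-2.0, 3.0)); // plain scaling
}
```

The payoff is the same in both settings: one source function, multiple specialized binaries, no runtime branching cost.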
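The autotuning bullet can also be sketched: time each candidate implementation on a representative input and keep the fastest. This is a minimal illustration with hypothetical helpers (`autotune`, `sum_naive`, `sum_chunked`); a real autotuner like CubeCL's would run warmups, take multiple samples, and cache the winning choice per problem shape:

```rust
use std::time::Instant;

// Time each candidate once and return the index of the fastest.
fn autotune<T>(input: &T, candidates: &[fn(&T) -> u64]) -> usize {
    let mut best = (0, std::time::Duration::MAX);
    for (i, f) in candidates.iter().enumerate() {
        let start = Instant::now();
        std::hint::black_box(f(input)); // keep the call from being optimized away
        let elapsed = start.elapsed();
        if elapsed < best.1 {
            best = (i, elapsed);
        }
    }
    best.0
}

// Two interchangeable "kernels" computing the same reduction.
fn sum_naive(v: &Vec<u64>) -> u64 {
    v.iter().copied().sum()
}

fn sum_chunked(v: &Vec<u64>) -> u64 {
    v.chunks(4).map(|c| c.iter().copied().sum::<u64>()).sum()
}

fn main() {
    let data: Vec<u64> = (0..100_000).collect();
    let winner = autotune(&data, &[sum_naive, sum_chunked]);
    println!("fastest candidate index: {winner}");
}
```

The key property is that all candidates are functionally equivalent, so whichever benchmarks fastest on the current hardware can be used for the rest of the run.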