CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU
- #High Performance Computing
- #Rust
- #GPU Programming
- CubeCL allows GPU programming in Rust with zero-cost abstractions for efficient compute kernels.
- Supports functions, generics, structs, and partial support for traits and methods.
- Kernels are annotated with the `#[cube]` attribute for GPU execution.
- Example provided for GELU (Gaussian Error Linear Unit) computation on GPU.
- Supports multiple GPU runtimes: WGPU (cross-platform), CUDA (NVIDIA), and ROCm/HIP (AMD, work in progress).
- Plans for a JIT CPU runtime with SIMD using Cranelift.
- Automatic vectorization, comptime optimizations, and autotuning for performance.
- Memory management optimized for throughput with buffer reuse.
- Includes linear algebra components like optimized matrix multiplication.
- Future plans include convolutions, random number generation, and FFTs.
- Two-step compilation process: parsing with the `syn` crate, followed by IR generation.
- Launch topology is based on cuboids ("cubes"), mapping 3D coordinates onto the hardware's execution units.
- Comparison of CubeCL variables with CUDA, WebGPU, and Metal equivalents.
- The comptime feature lets kernels modify their IR during runtime compilation, enabling specialization optimizations.
- Autotuning benchmarks kernels at runtime for optimal performance.
- CubeCL is currently in alpha, used in the Burn project, with ongoing refinements.
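The GELU bullet can be illustrated with a plain-Rust scalar reference. This is only the math, not a CubeCL kernel (the real example would use the `#[cube]` attribute and GPU array types); it uses the common tanh approximation since the Rust standard library has no `erf`:

```rust
// Scalar reference for GELU using the tanh approximation:
// gelu(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715·x³))).
fn gelu(x: f32) -> f32 {
    let c = (2.0f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    // GELU is ~0 for large negative inputs and ~x for large positive ones.
    for x in [-3.0f32, 0.0, 3.0] {
        println!("gelu({x}) = {}", gelu(x));
    }
}
```

A GPU kernel would apply this same scalar function element-wise across an input buffer, one invocation per element.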
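The comptime idea, specializing a kernel on values known when it is compiled so the generated code contains no branch, can be sketched in plain Rust with a const generic standing in for a comptime flag (an analogy, not CubeCL's actual mechanism):

```rust
// Sketch of comptime specialization: a flag known at compile time
// selects a code path, so the compiler can eliminate the branch
// entirely. In CubeCL the flag would be a comptime value rewriting
// the kernel IR; here a const generic plays that role.
fn scale<const FUSE_RELU: bool>(x: f32, factor: f32) -> f32 {
    let y = x * factor;
    if FUSE_RELU { y.max(0.0) } else { y }
}

fn main() {
    println!("{}", scale::<true>(-2.0, 3.0));  // ReLU fused in
    println!("{}", scale::<false>(-2.0, 3.0)); // plain scaling
}
```

The payoff is the same in both settings: one source function, multiple specialized binaries, no runtime branching cost.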
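The autotuning bullet can also be sketched: time each candidate implementation on a representative input and keep the fastest. This is a minimal illustration with hypothetical helpers (`autotune`, `sum_naive`, `sum_chunked`); a real autotuner like CubeCL's would run warmups, take multiple samples, and cache the winning choice per problem shape:

```rust
use std::time::Instant;

// Time each candidate once and return the index of the fastest.
fn autotune<T>(input: &T, candidates: &[fn(&T) -> u64]) -> usize {
    let mut best = (0, std::time::Duration::MAX);
    for (i, f) in candidates.iter().enumerate() {
        let start = Instant::now();
        std::hint::black_box(f(input)); // keep the call from being optimized away
        let elapsed = start.elapsed();
        if elapsed < best.1 {
            best = (i, elapsed);
        }
    }
    best.0
}

// Two interchangeable "kernels" computing the same reduction.
fn sum_naive(v: &Vec<u64>) -> u64 {
    v.iter().copied().sum()
}

fn sum_chunked(v: &Vec<u64>) -> u64 {
    v.chunks(4).map(|c| c.iter().copied().sum::<u64>()).sum()
}

fn main() {
    let data: Vec<u64> = (0..100_000).collect();
    let winner = autotune(&data, &[sum_naive, sum_chunked]);
    println!("fastest candidate index: {winner}");
}
```

The key property is that all candidates are functionally equivalent, so whichever benchmarks fastest on the current hardware can be used for the rest of the run.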