CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU

a year ago
  • #High Performance Computing
  • #Rust
  • #GPU Programming
  • CubeCL allows GPU programming in Rust with zero-cost abstractions for efficient compute kernels.
  • Supports functions, generics, and structs, with partial support for traits and methods.
  • Kernels are annotated with the `#[cube]` attribute for GPU execution.
  • Example provided for GELU (Gaussian Error Linear Unit) computation on GPU.
  • Supports multiple GPU runtimes: WGPU (cross-platform), CUDA (NVIDIA), ROCm/HIP (AMD - WIP).
  • Plans for a JIT CPU runtime with SIMD using Cranelift.
  • Automatic vectorization, comptime optimizations, and autotuning for performance.
  • Memory management optimized for throughput with buffer reuse.
  • Includes linear algebra components like optimized matrix multiplication.
  • Future plans include convolutions, random number generation, and FFTs.
  • Two-step compilation process: the proc macro parses the kernel with the `syn` crate, then expands it into a Rust function that generates the IR when called.
  • Topology based on cuboids, mapping to hardware with 3D representations.
  • Comparison of CubeCL variables with CUDA, WebGPU, and Metal equivalents.
  • Comptime lets a kernel modify the compiler IR at runtime, when the kernel is first compiled, to enable optimizations.
  • Autotuning benchmarks kernels at runtime for optimal performance.
  • CubeCL is currently in alpha, used in the Burn project, with ongoing refinements.
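
For reference, the GELU computation mentioned above can be sketched on the CPU in plain Rust. This is not CubeCL code and uses none of its API: the function names `erf_approx`, `gelu_scalar`, and `gelu_array` are illustrative, and because the Rust standard library has no `erf`, the sketch assumes the Abramowitz-Stegun polynomial approximation (formula 7.1.26) in its place. A GPU kernel would apply the same per-element formula, with each unit handling one position instead of a loop.

```rust
/// Approximates erf(x) with Abramowitz-Stegun formula 7.1.26
/// (absolute error on the order of 1e-7). Assumption of this sketch:
/// the standard library provides no erf, so a polynomial stands in.
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// GELU(x) = x * (1 + erf(x / sqrt(2))) / 2
fn gelu_scalar(x: f64) -> f64 {
    x * (1.0 + erf_approx(x / std::f64::consts::SQRT_2)) / 2.0
}

/// Applies GELU element-wise; the loop plays the role that one GPU unit
/// per element would play in a kernel.
fn gelu_array(input: &[f64], output: &mut [f64]) {
    for (out, &x) in output.iter_mut().zip(input) {
        *out = gelu_scalar(x);
    }
}

fn main() {
    let input = [-1.0, 0.0, 1.0, 2.0];
    let mut output = [0.0; 4];
    gelu_array(&input, &mut output);
    println!("{:?}", output);
}
```

A CPU reference like this is mainly useful for checking a GPU kernel's output within a tolerance.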
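
The cuboid topology can be illustrated with a little index arithmetic. The plain-Rust sketch below is an assumption-laden model, not CubeCL's implementation: the `Dim3` type and both functions are hypothetical, and the x-fastest flattening order is chosen only for illustration. It shows how a unit's position inside a cube combines with the cube's position in the grid to give a global 3D position, much as `blockIdx * blockDim + threadIdx` does in CUDA.

```rust
/// Hypothetical 3D extent or position; not a CubeCL type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim3 {
    x: u64,
    y: u64,
    z: u64,
}

/// Global 3D position of a unit: cube position scaled by the cube
/// dimensions, plus the unit's position inside its cube.
fn absolute_pos_3d(cube_pos: Dim3, cube_dim: Dim3, unit_pos: Dim3) -> Dim3 {
    Dim3 {
        x: cube_pos.x * cube_dim.x + unit_pos.x,
        y: cube_pos.y * cube_dim.y + unit_pos.y,
        z: cube_pos.z * cube_dim.z + unit_pos.z,
    }
}

/// Flattens a global 3D position into one linear index.
/// Assumption: x varies fastest, then y, then z.
fn absolute_pos_linear(abs: Dim3, cube_count: Dim3, cube_dim: Dim3) -> u64 {
    let width = cube_count.x * cube_dim.x; // total units along x
    let height = cube_count.y * cube_dim.y; // total units along y
    abs.x + abs.y * width + abs.z * width * height
}

fn main() {
    let cube_count = Dim3 { x: 2, y: 2, z: 1 }; // cubes in the grid
    let cube_dim = Dim3 { x: 4, y: 4, z: 1 }; // units per cube
    let cube_pos = Dim3 { x: 1, y: 1, z: 0 };
    let unit_pos = Dim3 { x: 2, y: 3, z: 0 };

    let abs = absolute_pos_3d(cube_pos, cube_dim, unit_pos);
    let linear = absolute_pos_linear(abs, cube_count, cube_dim);
    println!("{:?} -> {}", abs, linear);
}
```

Because the axes need not have equal sizes, the grid is a cuboid rather than a cube, and a 1D workload is just the case where the y and z extents are 1.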