Async/Await on the GPU

7 days ago

VectorWare announces the successful use of Rust's Future trait and async/await on the GPU, marking a significant step towards high-performance GPU-native applications.
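Traditional GPU programming focuses on data parallelism, where every thread runs the same code on different data. Warp specialization adds task-based parallelism by having different warps run different parts of a program, but the developer must manage the resulting concurrency and synchronization by hand.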
Projects like JAX, Triton, and CUDA Tile aim to simplify GPU programming with higher-level abstractions but come with adoption barriers and limited code reuse.
Rust's Future trait and async/await provide structured concurrency in an existing language, allowing composability and fine-grained control without a new ecosystem.
VectorWare demonstrates async/await on the GPU using a simple block_on executor and adapts the Embassy executor for GPU use, showing concurrent task execution.
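Challenges include cooperative multitasking (a task that never yields can starve the others), the lack of interrupts on GPUs (so an executor has to poll rather than sleep until woken), increased register pressure, and the function coloring problem, in which async and non-async functions do not mix freely.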
Future work includes GPU-native executors, leveraging CUDA Graphs, and exploring alternative concurrency models in Rust.
VectorWare supports multiple programming languages but sees Rust as uniquely suited for high-performance, reliable GPU-native applications.
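
To make the block_on executor and the cooperative multitasking point above concrete, here is a minimal host-side sketch in standard Rust. It is not VectorWare's code and it does not target a GPU: `block_on`, `YieldNow`, `Join2`, `join2`, `worker`, and `demo` are hypothetical names invented for this illustration, and the use of `Box` assumes a heap allocator is available. The executor spins in a polling loop with a no-op waker, which mirrors the constraint that nothing like an interrupt will wake it.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: this executor never sleeps, it only spins.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Drive a single future to completion by polling it in a loop.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::hint::spin_loop();
    }
}

/// Returns Pending exactly once, handing control back to the executor.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Polls two tasks in turn until both have finished.
struct Join2<'a> {
    tasks: [Option<Pin<Box<dyn Future<Output = ()> + 'a>>>; 2],
}

fn join2<'a>(
    a: impl Future<Output = ()> + 'a,
    b: impl Future<Output = ()> + 'a,
) -> Join2<'a> {
    let a: Pin<Box<dyn Future<Output = ()> + 'a>> = Box::pin(a);
    let b: Pin<Box<dyn Future<Output = ()> + 'a>> = Box::pin(b);
    Join2 { tasks: [Some(a), Some(b)] }
}

impl Future for Join2<'_> {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        for slot in self.tasks.iter_mut() {
            let done = match slot {
                Some(task) => task.as_mut().poll(cx).is_ready(),
                None => false,
            };
            if done {
                *slot = None; // finished tasks are dropped and never polled again
            }
        }
        if self.tasks.iter().all(|t| t.is_none()) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

/// Records each step, then yields so the other task can make progress.
async fn worker(id: u32, steps: u32, log: &RefCell<Vec<(u32, u32)>>) {
    for step in 0..steps {
        log.borrow_mut().push((id, step));
        YieldNow(false).await;
    }
}

/// Runs two workers concurrently and returns the order in which steps ran.
pub fn demo() -> Vec<(u32, u32)> {
    let log = RefCell::new(Vec::new());
    block_on(join2(worker(1, 2, &log), worker(2, 2, &log)));
    log.into_inner()
}
```

Calling `demo()` returns `[(1, 0), (2, 0), (1, 1), (2, 1)]`: the two workers interleave only because each one yields after every step. If the `YieldNow(false).await` line were removed, worker 1 would run to completion before worker 2 started, which is the starvation risk that cooperative multitasking carries.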
