APEX GPU: Run CUDA Apps on AMD GPUs Without Recompilation

7 days ago


APEX GPU enables running unmodified CUDA applications on AMD GPUs using LD_PRELOAD without recompilation.
It translates CUDA calls to AMD equivalents at runtime, covering core operations like memory management, streams, events, and kernels.
Supports 38 CUDA functions, 15+ cuBLAS operations, and 8+ cuDNN operations for neural networks.
Requires an AMD GPU (RDNA2/RDNA3 or CDNA series) with ROCm 5.0+ on Linux.
The project reports minimal overhead (<1% for typical workloads) and describes itself as production-ready, with a 100% test pass rate.
Includes bridges for HIP, cuBLAS, and cuDNN, each with a small footprint (40 KB, 22 KB, and 31 KB, respectively).
Works with popular frameworks like PyTorch and TensorFlow without code changes.
Licensed under CC BY-NC-SA 4.0 for non-commercial use; commercial licenses available upon request.
The roadmap includes support for the CUDA Driver API, unified memory, and performance profiling tools.
Encourages community contributions for testing, adding missing functions, and improving documentation.
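
The LD_PRELOAD approach and the call translation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bridge file names, the `preload_env` and `translate` helpers, and the choice of mapped calls are all assumptions made for the example. The HIP function names themselves are real HIP runtime calls, and `LD_PRELOAD` accepts a colon-separated list of shared libraries.

```python
import os

# Hypothetical file names for the three bridges (HIP, cuBLAS, cuDNN).
# The summary names the bridges but not their files or install location.
BRIDGE_LIBS = (
    "libapex_hip_bridge.so",
    "libapex_cublas_bridge.so",
    "libapex_cudnn_bridge.so",
)

# Illustrative subset of CUDA runtime calls and their HIP counterparts.
# Which calls APEX GPU actually maps is an assumption here.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaStreamCreate": "hipStreamCreate",
    "cudaEventRecord": "hipEventRecord",
    "cudaLaunchKernel": "hipLaunchKernel",
}


def translate(cuda_symbol):
    """Return the HIP name for a CUDA call, or None if it is not covered."""
    return CUDA_TO_HIP.get(cuda_symbol)


def preload_env(bridge_dir, base_env=None):
    """Build an environment that preloads the bridge libraries.

    Any LD_PRELOAD value already present is kept, placed after the bridges.
    """
    env = dict(os.environ if base_env is None else base_env)
    paths = [os.path.join(bridge_dir, name) for name in BRIDGE_LIBS]
    existing = env.get("LD_PRELOAD", "")
    if existing:
        paths.append(existing)
    env["LD_PRELOAD"] = ":".join(paths)
    return env
```

An unmodified application could then be started with something like `subprocess.run(["python", "train.py"], env=preload_env("/opt/apex/lib"))`, where both the script name and the directory are placeholders. The dynamic loader resolves the application's CUDA symbols against the preloaded libraries first, which is what lets the bridges intercept calls without recompiling the application.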
