APEX GPU: Run CUDA Apps on AMD GPUs Without Recompilation
7 days ago
- #CUDA
- #AMD
- #GPU-Computing
- APEX GPU runs unmodified CUDA applications on AMD GPUs by loading a translation layer through LD_PRELOAD, so no recompilation is needed.
- It intercepts CUDA calls at runtime and translates them to their AMD equivalents, covering core operations such as memory management, streams, events, and kernels.
- Supports 38 CUDA functions, 15+ cuBLAS operations, and 8+ cuDNN operations for neural networks.
- Requires an AMD GPU (RDNA2/RDNA3 or CDNA series) with ROCm 5.0+ on Linux.
- Reported overhead is minimal (<1% for typical workloads), and the project is described as production-ready with a 100% test pass rate.
- Includes bridges for HIP, cuBLAS, and cuDNN, each with a small footprint (40 KB, 22 KB, and 31 KB respectively).
- Works with popular frameworks like PyTorch and TensorFlow without code changes.
- Licensed under CC BY-NC-SA 4.0 for non-commercial use; commercial licenses available upon request.
- The roadmap includes support for the CUDA Driver API, unified memory, and performance profiling tools.
- The project welcomes community contributions: testing, adding missing functions, and improving documentation.
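
The LD_PRELOAD mechanism mentioned above can be sketched as a small launcher. This is a minimal illustration, not the project's own tooling: the bridge file names in `BRIDGES` and the helper `preload_env` are hypothetical, chosen only to mirror the three bridges (HIP, cuBLAS, cuDNN) listed in the summary. The only behavior relied on is that the Linux dynamic loader reads a colon-separated list of shared objects from the `LD_PRELOAD` environment variable and loads them before the program's own libraries.

```python
import os
import subprocess
import sys

# Hypothetical file names: the summary does not say what the bridge
# libraries are called or where they are installed.
BRIDGES = [
    "./libapex_hip_bridge.so",
    "./libapex_cublas_bridge.so",
    "./libapex_cudnn_bridge.so",
]


def preload_env(bridge_libs, base_env=None):
    """Return a copy of the environment with bridge_libs prepended to LD_PRELOAD.

    Any libraries already listed in LD_PRELOAD are kept, after the bridges.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    parts = list(bridge_libs) + ([existing] if existing else [])
    env["LD_PRELOAD"] = ":".join(parts)
    return env


if __name__ == "__main__":
    env = preload_env(BRIDGES)
    if len(sys.argv) > 1:
        # Launch the unmodified CUDA program with the bridges preloaded.
        sys.exit(subprocess.call(sys.argv[1:], env=env))
    # With no command given, just show the value that would be used.
    print(env["LD_PRELOAD"])
```

Saved under a name of your choosing (for example `run_with_bridges.py`), it would be invoked as `python run_with_bridges.py ./my_cuda_app`. Setting the variable per process rather than exporting it globally keeps the interposition limited to the one application being launched.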