CUDA Ontology
- #GPU Computing
- #CUDA
- #Version Compatibility
- CUDA terminology is overloaded, referring to multiple distinct concepts like architecture, instruction set, source language, toolkit, and runtime.
- The term 'kernel' in CUDA can mean either the operating system kernel (OS kernel) or a function that runs on the GPU (CUDA kernel).
- The term 'driver' in CUDA refers to either the NVIDIA GPU driver (kernel-space) or the CUDA Driver API (user-space).
- CUDA's ecosystem is layered: libcudart (Runtime API) and libcuda (Driver API) live in user space, nvidia.ko (GPU driver) lives in kernel space, and calls flow from the Runtime API through the Driver API down to the driver.
- Versioning in CUDA involves multiple independent schemes: compute capability (hardware), GPU driver version, CUDA Toolkit version, Runtime API version, and Driver API version.
- CUDA maintains forward compatibility: an application built against an older toolkit runs on a newer driver (e.g. a CUDA 11 binary on a CUDA 12 driver), but not the reverse; a binary that requires a newer Runtime API than the installed driver provides will fail.
- For successful execution, CUDA requires: (1) Driver API version ≥ Runtime API version, and (2) GPU code availability, meaning SASS for the exact architecture or PTX the driver can JIT-compile for it.
- Common failure modes include cudaErrorInsufficientDriver (installed driver older than the Runtime API requires) and cudaErrorNoKernelImageForDevice (no SASS and no PTX usable for the target GPU).
- Tools like nvidia-smi, nvcc, and torch.version.cuda report different version numbers: nvidia-smi shows the highest CUDA version the installed driver supports, nvcc --version shows the toolkit version, and torch.version.cuda shows the toolkit PyTorch was built against.
- Practical guidelines include specifying minimum driver versions, bundling runtime libraries, and compiling for multiple compute capabilities.
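The ordering requirement in rule (1) above can be sketched using the integer encoding the real API calls (cudaDriverGetVersion, cudaRuntimeGetVersion) return: 1000 × major + 10 × minor, so 12.2 is 12020. A minimal Python sketch with hypothetical example values rather than a live query:

```python
def decode(v: int) -> tuple[int, int]:
    """Decode CUDA's integer version encoding (1000*major + 10*minor)."""
    return v // 1000, (v % 1000) // 10

def driver_supports_runtime(driver_api: int, runtime_api: int) -> bool:
    """Rule (1): the Driver API version must be >= the Runtime API version."""
    return driver_api >= runtime_api

# Hypothetical values: driver supports up to 12.2, app linked against runtime 11.8.
driver_api, runtime_api = 12020, 11080
print(decode(driver_api))                                # (12, 2)
print(driver_supports_runtime(driver_api, runtime_api))  # True: forward compatible
print(driver_supports_runtime(11040, 12000))             # False: insufficient driver
```

In a real program the two integers would come from cudaDriverGetVersion and cudaRuntimeGetVersion; the comparison itself is exactly this simple.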
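The two failure modes map directly onto the two execution requirements, which suggests a simple diagnostic checklist. A hedged sketch (the error names mirror the real CUDA error codes, but the inputs here are plain data, not live device queries):

```python
def diagnose(driver_api: int, runtime_api: int,
             embedded_sass: set[str], gpu_arch: str,
             has_ptx: bool) -> str:
    """Decide which common CUDA failure mode, if any, applies."""
    if driver_api < runtime_api:
        # Rule (1) violated: driver too old for the linked runtime.
        return "cudaErrorInsufficientDriver"
    if gpu_arch not in embedded_sass and not has_ptx:
        # Rule (2) violated: no SASS for this GPU and no PTX to JIT-compile.
        return "cudaErrorNoKernelImageForDevice"
    return "ok"

# Binary built only for sm_70/sm_80 SASS, no PTX, running on an sm_90 GPU:
print(diagnose(12020, 11080, {"sm_70", "sm_80"}, "sm_90", has_ptx=False))
```

Embedding PTX alongside SASS is what rescues the second case: the driver can JIT-compile PTX for architectures newer than any SASS in the binary.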
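The last guideline, compiling for multiple compute capabilities, is typically done with nvcc -gencode flags. A build-line sketch (the architectures chosen here are illustrative, not a recommendation):

```shell
# Embed SASS for sm_70 and sm_80, plus PTX for compute_80 so newer GPUs
# (e.g. sm_90) can be served by JIT compilation at load time instead of
# failing with cudaErrorNoKernelImageForDevice.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 \
     -o app app.cu
```

The third -gencode entry (code=compute_80) is the one that embeds PTX; the first two embed architecture-specific SASS.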