Hasty Briefs

CUDA Ontology

6 days ago
  • #GPU Computing
  • #CUDA
  • #Version Compatibility
  • CUDA terminology is overloaded, referring to multiple distinct concepts like architecture, instruction set, source language, toolkit, and runtime.
  • The term 'kernel' in CUDA can mean either the operating-system kernel (OS kernel) or a function that runs on the GPU (CUDA kernel).
  • The term 'driver' in CUDA refers to either the NVIDIA GPU driver (kernel-space) or the CUDA Driver API (user-space).
  • CUDA's ecosystem is layered, with components like libcudart (Runtime API), libcuda (Driver API), and nvidia.ko (GPU driver) interacting across kernel space and user space.
  • Versioning in CUDA involves multiple independent schemes: compute capability (hardware), GPU driver version, CUDA Toolkit version, Runtime API version, and Driver API version.
  • CUDA maintains forward compatibility: older user-space components (e.g., an application built against an older Runtime API) can run on a newer driver and Driver API, but not the reverse; newer frontends are not guaranteed to work on older backends.
  • For successful execution, CUDA requires: (1) the Driver API version to be ≥ the Runtime API version, and (2) GPU code for the target device to be available (matching SASS, or PTX that can be JIT-compiled for it).
  • Common failure modes include cudaErrorInsufficientDriver (version mismatch) and cudaErrorNoKernelImageForDevice (missing GPU code).
  • Tools like nvidia-smi, nvcc, and torch.version.cuda report different version numbers, each measuring distinct aspects of the CUDA system.
  • Practical guidelines include specifying minimum driver versions, bundling runtime libraries, and compiling for multiple compute capabilities.
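The "compile for multiple compute capabilities" guideline maps to nvcc's `-gencode` flags: embed SASS for each architecture you care about, plus PTX for the oldest one so that newer, unanticipated GPUs can still JIT it. A build-command sketch (file names are placeholders):

```shell
nvcc kernel.cu -o app \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_70,code=compute_70
```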