What happens when you run a CUDA kernel?

3 days ago
  • #CUDA
  • #Kernel-Launch
  • #GPU-Execution
  • nvcc compiles a CUDA program into host code and device code; the device code is lowered to PTX and then translated into SASS, the GPU's native machine code.
  • On the host, a stub packs the kernel arguments, and the launch passes through the CUDA runtime and driver, which reach the GPU through ioctls and a doorbell register.
  • On the GPU, a work distributor assigns blocks to SMs, and each SM schedules warps using compiler-encoded control bits that manage dependencies and stalls.
  • Memory accesses are coalesced and served through the caches and DRAM; kernels with low arithmetic intensity are often limited by memory bandwidth.
  • Completion is signaled via semaphores, allowing asynchronous execution and data transfer back to the host for output.
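
The steps summarized above can be seen in a very small program. The following is a minimal sketch, not code from the original article: the `saxpy` kernel, the `check` helper, and the `saxpy_on_gpu` wrapper are hypothetical names chosen for illustration, and the 256-thread block size is an arbitrary assumption. It shows a launch whose arguments are packed by the host stub, a coalesced access pattern, and an explicit wait for completion before results are copied back.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Device code: nvcc lowers this to PTX and then to SASS for the target GPU.
// Consecutive threads touch consecutive elements, so the loads and stores
// of a warp can be coalesced into a few wide memory transactions.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 2 FLOPs for 12 bytes of traffic: low arithmetic intensity,
        // so this kernel is expected to be memory-bandwidth bound.
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: returns a * x + y computed on the GPU.
std::vector<float> saxpy_on_gpu(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    assert(x.size() == y.size());
    int n = static_cast<int>(x.size());
    if (n == 0) return y;  // a grid of zero blocks is not a valid launch
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes), "cudaMalloc x");
    check(cudaMalloc(&d_y, bytes), "cudaMalloc y");
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n

    // The <<<...>>> syntax compiles to a host stub that packs the four
    // arguments and hands the launch to the runtime and driver. The call
    // returns before the kernel has finished.
    saxpy<<<blocks, threads>>>(n, a, d_x, d_y);
    check(cudaGetLastError(), "kernel launch");

    // Block the host until the GPU reports that the work is complete.
    check(cudaDeviceSynchronize(), "synchronize");

    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost), "copy back");
    check(cudaFree(d_x), "cudaFree x");
    check(cudaFree(d_y), "cudaFree y");
    return y;
}
```

Calling `saxpy_on_gpu(3.0f, x, y)` from a `main` function and building the file with `nvcc` exercises the whole path described in the summary. The explicit `cudaDeviceSynchronize` is there only to make the asynchronous launch visible; a blocking `cudaMemcpy` on the default stream would also wait for the kernel.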