Hasty Briefs (beta)

A Gentle Introduction to CUDA PTX

a day ago
  • #CUDA
  • #GPU
  • #PTX
  • PTX (Parallel Thread Execution) is the intermediate layer between CUDA code and NVIDIA GPU hardware; understanding it is essential for deep performance analysis and for accessing the latest hardware features.
  • PTX is the ISA of a virtual machine rather than of a physical chip. The ptxas assembler translates it into SASS (Streaming Assembler), the native machine code of a specific GPU architecture, and this indirection is what provides forward compatibility.
  • The post introduces a PTX playground with a simple kernel example, demonstrating how to write and run PTX code using the CUDA Driver API.
  • Key PTX concepts include register declarations (.reg), data movement instructions (ld, st, mov), computation and control flow (mad, setp, bra), and special registers such as %tid, %ntid, and %ctaid.
  • PTX's two-stage compilation (CUDA C++ → PTX → SASS) enables forward compatibility: when a binary carries no SASS for the installed GPU, the driver JIT-compiles the embedded PTX for that architecture at load time.
  • The post walks through a complete PTX kernel for vector addition, explaining each instruction and its role in the computation.
  • Appendix A covers controlling what goes into the fatbin (fat binary) with nvcc flags (-arch, -gencode) and inspecting the embedded PTX/SASS using cuobjdump.
  • Appendix B explains the full compilation pipeline, including NVVM IR as an intermediate representation based on LLVM.
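
To make the instruction categories above concrete, here is a minimal sketch of what a vector-addition kernel in PTX can look like. It is not the kernel from the post: the entry name vector_add, the parameter names a_ptr, b_ptr, c_ptr, and n, the register counts, and the .version/.target values are all assumptions chosen for illustration. It assumes float32 arrays and a one-dimensional launch.

```ptx
// Illustrative sketch only; names and version/target values are assumptions.
.version 7.0
.target sm_70
.address_size 64

.visible .entry vector_add(
    .param .u64 a_ptr,   // input array a (float32)
    .param .u64 b_ptr,   // input array b (float32)
    .param .u64 c_ptr,   // output array c (float32)
    .param .u32 n        // number of elements
)
{
    // Register declarations: %p0..%p1, %r0..%r5, %f0..%f3, %rd0..%rd10
    .reg .pred %p<2>;
    .reg .b32  %r<6>;
    .reg .f32  %f<4>;
    .reg .b64  %rd<11>;

    // Load kernel parameters from the .param state space
    ld.param.u64 %rd1, [a_ptr];
    ld.param.u64 %rd2, [b_ptr];
    ld.param.u64 %rd3, [c_ptr];
    ld.param.u32 %r1,  [n];

    // Global thread index: idx = ctaid.x * ntid.x + tid.x
    mov.u32 %r2, %ctaid.x;
    mov.u32 %r3, %ntid.x;
    mov.u32 %r4, %tid.x;
    mad.lo.s32 %r5, %r2, %r3, %r4;

    // Bounds check: skip the body when idx >= n
    setp.ge.u32 %p1, %r5, %r1;
    @%p1 bra DONE;

    // Convert generic addresses to global-space addresses
    cvta.to.global.u64 %rd4, %rd1;
    cvta.to.global.u64 %rd5, %rd2;
    cvta.to.global.u64 %rd6, %rd3;

    // Byte offset = idx * 4 (sizeof(float)), widened to 64 bits
    mul.wide.u32 %rd7, %r5, 4;
    add.s64 %rd8,  %rd4, %rd7;
    add.s64 %rd9,  %rd5, %rd7;
    add.s64 %rd10, %rd6, %rd7;

    // c[idx] = a[idx] + b[idx]
    ld.global.f32 %f1, [%rd8];
    ld.global.f32 %f2, [%rd9];
    add.f32 %f3, %f1, %f2;
    st.global.f32 [%rd10], %f3;

DONE:
    ret;
}
```

A listing like this would typically be loaded at run time through the CUDA Driver API (for example with cuModuleLoadData and cuModuleGetFunction, then launched with cuLaunchKernel), at which point the driver JIT-compiles it to SASS for the installed GPU.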