Hasty Briefs (beta)

Show HN: Continuous Nvidia CUDA PC Sampling Profiler

4 days ago
  • #Open Source
  • #CUDA
  • #GPU Profiling
  • Added low-overhead PC sampling support to Polar Signals' continuous profiler using the open-source Parca Agent v0.48.0.
  • With NVIDIA CUDA PC sampling, the hardware records each warp's program counter and stall reason at a configurable interval (sampling factor 5-31, default 20).
  • Runs in kernel-serialized mode and samples in short bursts (~50 ms on, then off) to keep overhead low, targeting 100 samples/sec.
  • Harvests data via USDT probes in a shim library, handling metadata replay for late-attaching agents and caching for context.
  • Symbolizes PC offsets by uploading CUBINs to a backend service; mapping back to source lines requires compiling with nvcc's -lineinfo flag.
  • Enables production use by minimizing overhead and integrating with existing profiling features like kernel timing and call stacks.
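
To make the sampling factor in the summary concrete, here is a minimal sketch of the arithmetic. It assumes the factor is an exponent, as the CUPTI PC sampling documentation describes (a period of 2^factor GPU cycles), and that the 5-31 range and default of 20 quoted above apply. The function names and the 1.5 GHz clock are hypothetical, chosen only for illustration, and are not taken from Parca Agent.

```python
# Sketch only: converts a PC sampling factor into a sampling period.
# Assumption: the period is 2**factor GPU cycles (per the CUPTI docs);
# the range 5-31 and default 20 come from the summary above.
MIN_FACTOR = 5
MAX_FACTOR = 31
DEFAULT_FACTOR = 20


def sampling_period_cycles(factor: int = DEFAULT_FACTOR) -> int:
    """Return the number of GPU cycles between samples for a given factor."""
    if not MIN_FACTOR <= factor <= MAX_FACTOR:
        raise ValueError(
            f"sampling factor must be in [{MIN_FACTOR}, {MAX_FACTOR}], got {factor}"
        )
    return 1 << factor  # 2**factor


def sampling_period_seconds(factor: int, clock_hz: float) -> float:
    """Return the approximate time between samples at a given SM clock rate."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return sampling_period_cycles(factor) / clock_hz


if __name__ == "__main__":
    hypothetical_clock_hz = 1.5e9  # illustrative clock, not a measured value
    for factor in (MIN_FACTOR, DEFAULT_FACTOR, MAX_FACTOR):
        cycles = sampling_period_cycles(factor)
        seconds = sampling_period_seconds(factor, hypothetical_clock_hz)
        print(f"factor={factor:2d}  cycles={cycles:>10d}  period={seconds:.6f}s")
```

Under these assumptions, the default factor of 20 corresponds to 1,048,576 cycles between samples. Each step up in the factor doubles the period and therefore halves the sampling rate, so the factor acts as a coarse overhead knob.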