Show HN: Continuous Nvidia CUDA PC Sampling Profiler
4 days ago
- #Open Source
- #CUDA
- #GPU Profiling
- Added low-overhead PC sampling support to Polar Signals' continuous profiler using the open-source Parca Agent v0.48.0.
- PC sampling in NVIDIA CUDA has the hardware periodically record a warp's program counter and stall reason on each SM. The interval is set by a configurable sampling factor N (5-31, default 20), which corresponds to one sample every 2^N cycles.
- Uses kernel-serialized collection mode and duty-cycled sampling (enabled for ~50 ms, then switched off) to keep overhead low, targeting 100 samples/sec.
- Harvests data via USDT probes in a shim library, handling metadata replay for late-attaching agents and caching for context.
- Symbolizes PC offsets by uploading CUBINs to a backend service; mapping them back to source lines requires compiling with nvcc's -lineinfo flag.
- Enables production use by minimizing overhead and integrating with existing profiling features like kernel timing and call stacks.
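To make the sampling factor concrete, here is a small sketch that converts a factor N into a sampling interval. It assumes the interval is 2^N SM cycles, as CUPTI documents for its PC sampling period. It also assumes a hypothetical fixed 1.5 GHz SM clock, although real clocks vary with boost and throttling. The function names are illustrative only and are not part of Parca Agent or CUPTI.

```python
def sampling_interval_cycles(factor: int) -> int:
    """Number of SM cycles between samples for sampling factor N (assumed 2**N)."""
    if not 5 <= factor <= 31:
        raise ValueError("sampling factor must be in the range 5-31")
    return 1 << factor


def samples_per_second_per_sm(factor: int, sm_clock_hz: float) -> float:
    """Approximate samples taken per second on one SM at a fixed clock."""
    return sm_clock_hz / sampling_interval_cycles(factor)


if __name__ == "__main__":
    clock_hz = 1.5e9  # hypothetical SM clock; not taken from the post
    for n in (5, 20, 31):
        print(n, sampling_interval_cycles(n), samples_per_second_per_sm(n, clock_hz))
```

At the default factor of 20, this gives one sample roughly every 1.05 million cycles per SM. At the assumed clock, that works out to about 1,430 samples per second per SM.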
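The "~50 ms on, then off" approach is a duty cycle. The sketch below models one, assuming a fixed on-window and a fixed pause. The post gives only the ~50 ms window, so the 950 ms pause used in the example is purely hypothetical. A real shim would presumably call into CUPTI (for example its PC sampling start/stop functions) at each transition. Here the transitions are only computed, not acted on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DutyCycle:
    """Hypothetical on/off schedule for PC sampling (times in milliseconds)."""

    on_ms: float   # sampling window; the post describes roughly 50 ms
    off_ms: float  # pause between windows; assumed, not given in the post

    @property
    def period_ms(self) -> float:
        return self.on_ms + self.off_ms

    def is_sampling(self, t_ms: float) -> bool:
        """True if sampling should be enabled at time t_ms (from the first window)."""
        return (t_ms % self.period_ms) < self.on_ms

    def active_fraction(self) -> float:
        """Fraction of wall-clock time during which sampling is enabled."""
        return self.on_ms / self.period_ms


def transitions(cycle: DutyCycle, duration_ms: int, step_ms: int = 1) -> list[tuple[int, bool]]:
    """Times at which a driver loop would enable (True) or disable (False) sampling."""
    events: list[tuple[int, bool]] = []
    previous: bool | None = None
    for t in range(0, duration_ms, step_ms):
        enabled = cycle.is_sampling(t)
        if enabled != previous:
            events.append((t, enabled))
            previous = enabled
    return events


if __name__ == "__main__":
    cycle = DutyCycle(on_ms=50, off_ms=950)
    print(cycle.active_fraction())
    print(transitions(cycle, 2000))
```

With these assumed numbers, sampling is active only 5% of the time. This trade-off is what keeps the overhead low: the GPU runs unperturbed for most of each period, and the profile still accumulates samples over time.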
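Metadata replay matters because an agent may attach after the process has already loaded its modules. The sketch below models that idea in plain Python rather than with real USDT probes. The class name, method names, and event kinds are all hypothetical. Metadata events are cached and replayed in order to a late-attaching listener. Sample events are delivered only while a listener is attached.

```python
from __future__ import annotations

from typing import Callable, Optional


class ProbeShim:
    """Hypothetical model of a shim library that fires probe events to an agent.

    Metadata events (for example, a module load) are cached so that an agent
    attaching after they fired can have them replayed. Sample events are only
    delivered while an agent is attached and are otherwise dropped.
    """

    def __init__(self) -> None:
        self._metadata: list[tuple[str, dict]] = []
        self._listener: Optional[Callable[[tuple[str, dict]], None]] = None

    def emit_metadata(self, kind: str, payload: dict) -> None:
        event = (kind, payload)
        self._metadata.append(event)  # keep it for agents that attach later
        if self._listener is not None:
            self._listener(event)

    def emit_sample(self, payload: dict) -> None:
        if self._listener is not None:
            self._listener(("sample", payload))

    def attach(self, listener: Callable[[tuple[str, dict]], None]) -> None:
        """Attach an agent and replay all cached metadata to it, in order."""
        self._listener = listener
        for event in self._metadata:
            listener(event)


if __name__ == "__main__":
    shim = ProbeShim()
    shim.emit_metadata("module_load", {"module_id": 1})
    shim.emit_sample({"pc": 0x40})  # dropped: no agent attached yet
    received: list = []
    shim.attach(received.append)
    shim.emit_sample({"pc": 0x80})
    print(received)
```

The late agent still learns about the module loaded before it attached. The one sample emitted before attachment is lost, which is acceptable for a sampling profiler.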
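Symbolization ultimately maps each sampled PC offset to a source line. The sketch below assumes a line table sorted by starting offset. In practice this would be derived from the CUBIN's line information that `-lineinfo` makes nvcc emit. Parsing that is out of scope here, so the table and the `LineEntry`/`LineTable` names are hand-built and hypothetical. Each offset is attributed to the last row starting at or before it, and samples are then counted per source line.

```python
from __future__ import annotations

import bisect
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class LineEntry(NamedTuple):
    offset: int  # first PC offset covered by this row of the line table
    file: str
    line: int


class LineTable:
    """Hypothetical PC-offset-to-source-line table for one kernel."""

    def __init__(self, entries: Iterable[LineEntry]) -> None:
        self._entries = sorted(entries)
        self._offsets = [e.offset for e in self._entries]

    def lookup(self, pc_offset: int) -> Optional[LineEntry]:
        """Return the row whose range contains pc_offset, or None if it precedes all rows."""
        i = bisect.bisect_right(self._offsets, pc_offset) - 1
        return self._entries[i] if i >= 0 else None


def attribute_samples(pc_offsets: Iterable[int], table: LineTable) -> Counter:
    """Count PC samples per (file, line); unmapped offsets go to ("<unknown>", 0)."""
    counts: Counter = Counter()
    for pc in pc_offsets:
        entry = table.lookup(pc)
        key = (entry.file, entry.line) if entry is not None else ("<unknown>", 0)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    table = LineTable([
        LineEntry(0x00, "kernel.cu", 10),
        LineEntry(0x40, "kernel.cu", 12),
        LineEntry(0x80, "kernel.cu", 15),
    ])
    print(attribute_samples([0x10, 0x44, 0x48, 0x84], table))
```

Without line information, only the offsets are available, so samples can at best be attributed to functions rather than source lines.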