AMD GPU Debugger
3 days ago
- #RDNA3
- #GPU Debugging
- #AMD
- The article describes building a debugger for AMD GPUs that, like a CPU debugger, can pause execution and inspect the current state.
- The author explores using rocgdb, the debugger for AMD's ROCm environment, and draws inspiration from Marcell Kiss's blog posts on GPU debugging.
- The debugger communicates with the GPU directly: it opens the DRM device file and uses libdrm to interact with the kernel-mode driver (KMD).
- Key steps include creating a context, allocating buffers for code and commands, and compiling shaders with clang's assembler for AMD's GCN-family ISA (LLVM's amdgcn target, which also covers RDNA3).
- The article details the use of PM4 packets for encoding commands, specifically Type 3 packets, whose header carries an opcode and the length of the packet body in DWORDs.
- The author explains how trap handlers are set up through the TBA (trap base address) and TMA (trap memory address) registers, which are privileged and must be reached from user space through the debugfs interface.
- A custom trap handler is developed to save the GPU state when a trap occurs, allowing for debugging by halting the GPU and examining register values.
- The article covers the integration with SPIR-V for shader compilation, using RADV's ACO compiler to generate machine code from SPIR-V binaries.
- Features of the debugger include breakpoints, stepping through code, source code line mapping, and watchpoints for monitoring memory access.
- The author discusses potential future enhancements, such as deeper Vulkan integration for better debugging capabilities and support for more shader types.
- A bonus section includes incomplete code for user-mode page walking on AMD's RDNA3 architecture, highlighting the complexity of GPU memory management.
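
The PM4 Type 3 packet encoding mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the article's code: the function names are hypothetical, and the bit layout (packet type in bits 31:30, count in bits 29:16, opcode in bits 15:8, with the count stored as body DWORDs minus one) is assumed from the PACKET3 macro in the Linux amdgpu driver headers. The opcode values 0x10 (NOP) and 0x37 (WRITE_DATA) are taken from the same headers.

```python
import struct

PACKET_TYPE3 = 3          # value of the 2-bit packet type field
PACKET3_NOP = 0x10        # opcode values as defined in the amdgpu headers
PACKET3_WRITE_DATA = 0x37


def pm4_type3_header(opcode, body_dwords):
    """Build a Type 3 header (hypothetical helper).

    Assumed layout: [31:30] type, [29:16] count, [15:8] opcode.
    The count field stores the number of body DWORDs minus one.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 1 <= body_dwords <= 0x4000:
        raise ValueError("body must be 1..16384 DWORDs")
    count = body_dwords - 1
    return (PACKET_TYPE3 << 30) | (count << 16) | (opcode << 8)


def pm4_type3_packet(opcode, body):
    """Serialize header + body as little-endian 32-bit words."""
    header = pm4_type3_header(opcode, len(body))
    return struct.pack("<%dI" % (len(body) + 1), header, *body)


def decode_type3_header(header):
    """Split a header back into (type, opcode, body_dwords)."""
    pkt_type = (header >> 30) & 0x3
    count = (header >> 16) & 0x3FFF
    opcode = (header >> 8) & 0xFF
    return pkt_type, opcode, count + 1


if __name__ == "__main__":
    # A WRITE_DATA packet with a 4-DWORD body of placeholder values.
    pkt = pm4_type3_packet(PACKET3_WRITE_DATA, [0, 0x1000, 0, 0xDEADBEEF])
    print(pkt.hex())
```

The body values in the usage example are placeholders; the real meaning of each body DWORD depends on the opcode and is not covered here.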