Show HN: Deterministic PCIe Diagnostics for GPUs on Linux
3 days ago
- #GPU
- #PCIe
- #diagnostics
- A deterministic command-line tool for validating GPU PCIe link health, bandwidth, and real-world PCIe utilization using only observable hardware data.
- Measures the current and maximum PCIe link generation and width, peak copy bandwidth, sustained PCIe utilization under load, and efficiency relative to theoretical PCIe payload bandwidth.
- Reports one of three verdicts based on observable conditions: OK, DEGRADED, or UNDERPERFORMING.
- Diagnoses common PCIe issues like link negotiation problems, generation downgrades, and reduced bandwidth.
- Requires an NVIDIA GPU with a supported driver, the CUDA Toolkit, the NVML development library, and Linux.
- Supports logging in CSV and JSON formats for time-series analysis and automated monitoring.
- Includes a multi-GPU mode that evaluates each GPU independently.
- Does not modify BIOS, firmware, registry, or PCIe configuration; reports observable facts only.
- Open source under the MIT License, authored by Joe McLaren.
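
To make the efficiency and verdict bullets concrete, here is a minimal Python sketch of how such a check could work. It is not the tool's code. The names `theoretical_gbps`, `LinkState`, and `verdict`, and the 70% efficiency threshold, are assumptions made for illustration. The theoretical figure here accounts for line encoding only; the tool's "payload bandwidth" may also subtract packet overhead, which would give a lower ceiling.

```python
from dataclasses import dataclass

# Per-lane raw rate (GT/s) and line-encoding efficiency for each PCIe generation.
# Gen1/Gen2 use 8b/10b encoding; Gen3 through Gen5 use 128b/130b.
_GEN_TABLE = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def theoretical_gbps(gen: int, width: int) -> float:
    """Per-direction bandwidth in GB/s after line encoding.

    Ignores TLP/DLLP packet overhead, so real copies will always land below this.
    """
    rate_gt, encoding = _GEN_TABLE[gen]
    return rate_gt * encoding / 8 * width  # bits -> bytes, then scale by lane count


@dataclass
class LinkState:
    """Hypothetical container for the four link values the post lists."""
    cur_gen: int
    max_gen: int
    cur_width: int
    max_width: int


def verdict(link: LinkState, measured_gbps: float, min_efficiency: float = 0.7) -> str:
    """Classify a link. The 0.7 threshold is an assumed value, not the tool's."""
    # A link running below its maximum generation or width is a degraded link.
    # Assumption: the state was sampled under load, since GPUs may lower the
    # link generation at idle to save power.
    if link.cur_gen < link.max_gen or link.cur_width < link.max_width:
        return "DEGRADED"
    efficiency = measured_gbps / theoretical_gbps(link.cur_gen, link.cur_width)
    if efficiency < min_efficiency:
        return "UNDERPERFORMING"
    return "OK"


if __name__ == "__main__":
    link = LinkState(cur_gen=4, max_gen=4, cur_width=16, max_width=16)
    print(f"ceiling: {theoretical_gbps(4, 16):.2f} GB/s, verdict: {verdict(link, 25.0)}")
```

Because the verdict depends only on the four link values and one measured bandwidth, the same inputs always give the same result, which is the sense in which a check like this is deterministic.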