Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop
7 hours ago
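- Standard GPU utilization metrics (e.g., from nvidia-smi and cloud dashboards) are misleading. Because they typically track whether any kernel is running rather than how much of the hardware it uses, they often show 100% even when real compute throughput is as low as 1%. This leads to wasted spend and energy and to unnecessary hardware purchases.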
- Systalyze open-sourced Utilyze, a free monitoring tool that measures true GPU compute and memory-bandwidth utilization from hardware performance counters, reporting Compute SOL % and Memory SOL % with negligible overhead.
- Utilyze uses the Speed-of-Light (SOL) model to show how close workloads come to theoretical hardware limits. It also reports Attainable SOL %, the realistic ceiling for a specific workload, which helps identify how much optimization potential remains.
- Unlike metrics such as DCGM's SM Active, which can report high utilization during memory-bound tasks, Utilyze correctly distinguishes active from idle GPU resources; its measurements were validated against ground-truth calculations.
- Case studies demonstrate Utilyze's value. In LLM inference, it revealed underutilization (e.g., 45% Compute SOL while nvtop reported 100%) and guided optimizations that boosted throughput. In fine-tuning, it showed low utilization (1-7% for LoRA) and highlighted optimization opportunities.
- Systalyze's platform builds on Utilyze's measurements to automate optimizations (e.g., kernel fusion, parallelism tuning), delivering 2-10x performance gains in deployments by moving workloads toward their Attainable SOL %.
- The community is encouraged to use Utilyze, share findings on GitHub, and help expand hardware support; AMD support is planned.
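To see how a time-based metric can read 100% while throughput sits near 1%, the sketch below models a sampling window as a list of kernels. nvidia-smi documents its `utilization.gpu` field as the percent of time over the sample period during which one or more kernels was executing; `busy_percent` mimics that definition, while `throughput_percent` weights each kernel by a hypothetical `sol_fraction` (the share of peak FLOP/s it sustains). The `Kernel` class, both functions, and all numbers are illustrative assumptions, not Utilyze's implementation.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    start_ms: float
    end_ms: float
    # Hypothetical: fraction of peak FLOP/s the kernel sustains while it runs.
    sol_fraction: float


def _clipped_ms(k: Kernel, window_ms: float) -> float:
    """Portion of kernel k's runtime that falls inside the window [0, window_ms]."""
    return max(0.0, min(k.end_ms, window_ms) - max(k.start_ms, 0.0))


def busy_percent(kernels: list[Kernel], window_ms: float) -> float:
    """nvidia-smi-style metric: % of the window in which a kernel is executing.

    Assumes kernels never overlap (a single stream), so durations can be summed.
    """
    return 100.0 * sum(_clipped_ms(k, window_ms) for k in kernels) / window_ms


def throughput_percent(kernels: list[Kernel], window_ms: float) -> float:
    """% of the peak work the GPU could have done in the window."""
    work = sum(_clipped_ms(k, window_ms) * k.sol_fraction for k in kernels)
    return 100.0 * work / window_ms


# 100 back-to-back 1 ms kernels, each sustaining only 1% of peak.
kernels = [Kernel(float(i), float(i + 1), 0.01) for i in range(100)]
print(busy_percent(kernels, 100.0))                  # 100.0
print(round(throughput_percent(kernels, 100.0), 2))  # 1.0
```

The GPU is never idle, so the time-based metric saturates, yet it delivers only about 1% of the work it could.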
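The Compute SOL % and Memory SOL % figures can be read as an achieved rate divided by the hardware's peak rate. A minimal sketch, assuming the FLOP and byte counts for an interval are already known (Utilyze derives its measurements from hardware performance counters; here they are plain hypothetical inputs) along with the device's peak rates. The function names and peak values are illustrative only.

```python
def compute_sol_percent(flops: float, elapsed_s: float, peak_flops_per_s: float) -> float:
    """Achieved FLOP/s as a percentage of the hardware's peak FLOP/s."""
    return 100.0 * (flops / elapsed_s) / peak_flops_per_s


def memory_sol_percent(bytes_moved: float, elapsed_s: float, peak_bytes_per_s: float) -> float:
    """Achieved memory bandwidth as a percentage of peak bandwidth."""
    return 100.0 * (bytes_moved / elapsed_s) / peak_bytes_per_s


# Hypothetical GPU: 1 PFLOP/s peak compute, 3 TB/s peak memory bandwidth.
PEAK_FLOPS = 1e15
PEAK_BW = 3e12

# Hypothetical counts gathered over a 0.5 s interval.
print(compute_sol_percent(1e14, 0.5, PEAK_FLOPS))  # 20.0
print(memory_sol_percent(1.2e12, 0.5, PEAK_BW))    # 80.0
```

A workload at 20% Compute SOL and 80% Memory SOL is memory-bound: reducing the bytes it moves matters more than speeding up its arithmetic.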
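The summary does not say how Attainable SOL % is computed. One standard way to derive a workload-specific ceiling is the roofline model: attainable throughput is the lesser of peak compute and arithmetic intensity (FLOPs per byte) times peak bandwidth. The sketch below uses that model as a stand-in; Utilyze's actual definition may differ, and the function names, peaks, and measured value are hypothetical.

```python
def attainable_flops_per_s(intensity: float, peak_flops_per_s: float,
                           peak_bytes_per_s: float) -> float:
    """Roofline ceiling: bandwidth-bound below the ridge point, compute-bound above it."""
    return min(peak_flops_per_s, intensity * peak_bytes_per_s)


def attainable_compute_sol_percent(intensity: float, peak_flops_per_s: float,
                                   peak_bytes_per_s: float) -> float:
    """The roofline ceiling expressed as a percentage of peak compute."""
    ceiling = attainable_flops_per_s(intensity, peak_flops_per_s, peak_bytes_per_s)
    return 100.0 * ceiling / peak_flops_per_s


def headroom(measured_sol_percent: float, attainable_sol_percent: float) -> float:
    """How many times faster the workload could run before hitting its own ceiling."""
    return attainable_sol_percent / measured_sol_percent


PEAK_FLOPS = 1e15  # hypothetical peak FLOP/s
PEAK_BW = 3e12     # hypothetical peak bytes/s (ridge point ~333 FLOP/byte)

# A kernel doing 100 FLOPs per byte sits below the ridge point.
ceiling = attainable_compute_sol_percent(100.0, PEAK_FLOPS, PEAK_BW)
print(ceiling)                  # 30.0
print(headroom(15.0, ceiling))  # 2.0
```

Under this model, a kernel already at 30% Compute SOL would be at its ceiling, which is why a low raw SOL % alone does not prove there is room to optimize.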
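DCGM documents SM Active (`DCGM_FI_PROF_SM_ACTIVE`) as the ratio of cycles in which an SM has at least one warp assigned. A memory-bound kernel keeps warps resident while they wait on loads, so that ratio can stay near 100% even though the math units are mostly idle. The sketch below contrasts that definition with a math-issue ratio using hypothetical per-SM counter values; `SMCounters`, its fields, and the peak issue rate are illustrative assumptions, not real DCGM or Utilyze fields.

```python
from dataclasses import dataclass


@dataclass
class SMCounters:
    total_cycles: int
    cycles_with_warp: int  # cycles with at least one warp resident on the SM
    math_ops: int          # arithmetic operations actually issued


def sm_active_percent(sms: list[SMCounters]) -> float:
    """SM Active-style metric: share of cycles with any warp resident."""
    return 100.0 * sum(s.cycles_with_warp for s in sms) / sum(s.total_cycles for s in sms)


def math_issue_percent(sms: list[SMCounters], peak_ops_per_cycle: int) -> float:
    """Issued math ops as a share of what the SMs could have issued."""
    capacity = sum(s.total_cycles for s in sms) * peak_ops_per_cycle
    return 100.0 * sum(s.math_ops for s in sms) / capacity


# Memory-bound kernel on 4 SMs: warps are always resident (stalled on loads),
# but only 5 of a hypothetical 100 ops per cycle are issued.
sms = [SMCounters(total_cycles=1_000_000, cycles_with_warp=1_000_000, math_ops=5_000_000)
       for _ in range(4)]
print(sm_active_percent(sms))        # 100.0
print(math_issue_percent(sms, 100))  # 5.0
```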
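For LLM inference, a rough sanity check of Compute SOL % can be made from serving throughput alone, using the common approximation that a dense transformer's forward pass costs about 2 FLOPs per parameter per token. This ignores attention FLOPs and is an assumption for illustration, not the method used in the case studies; the function name, model size, token rate, and peak are hypothetical.

```python
def estimated_compute_sol_percent(params: float, tokens_per_s: float,
                                  peak_flops_per_s: float) -> float:
    """Approximate Compute SOL % for dense-transformer inference.

    Uses ~2 FLOPs per parameter per processed token (attention cost ignored).
    """
    achieved_flops_per_s = 2.0 * params * tokens_per_s
    return 100.0 * achieved_flops_per_s / peak_flops_per_s


# Hypothetical: 8B-parameter model serving 1,000 tokens/s on a 1 PFLOP/s GPU.
print(round(estimated_compute_sol_percent(8e9, 1_000, 1e15), 2))  # 1.6
```

An estimate this far below 100% while nvidia-smi or nvtop shows a fully busy GPU is the kind of gap the case studies describe.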