Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop

6 hours ago
  • #Utilization
  • #GPU
  • #AI
  • Standard GPU utilization metrics (e.g., from nvidia-smi, cloud dashboards) are misleading, often showing 100% even when real compute throughput is as low as 1%, leading to wasted spend, energy, and unnecessary hardware purchases.
  • Systalyze open-sourced Utilyze, a free monitoring tool that accurately measures true GPU compute and memory bandwidth utilization via hardware performance counters, reporting Compute SOL % and Memory SOL % with negligible overhead.
  • Utilyze uses the Speed-of-Light (SOL) model to show how close a workload runs to the hardware's theoretical limits. It also reports Attainable SOL %, the realistic ceiling for a specific workload, which helps identify how much optimization headroom remains.
  • Unlike metrics such as DCGM's SM Active, which can report high utilization during memory-bound tasks, Utilyze distinguishes active from idle GPU resources; its readings were validated against ground-truth calculations.
  • Case studies illustrate the difference: in LLM inference, Utilyze revealed underutilization (e.g., 45% Compute SOL where nvtop showed 100%) and guided optimizations that boosted throughput; in fine-tuning, it showed low utilization (1-7% for LoRA) and pointed to optimization opportunities.
  • Systalyze's platform builds on Utilyze's measurements to automate optimizations (e.g., kernel fusion, parallelism tuning), recovering 2-10x performance in deployments and moving workloads toward their Attainable SOL %.
  • The community is encouraged to use Utilyze, share findings on GitHub, and collaborate on expanding support, with AMD hardware support planned for the future.
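
To make the SOL idea above concrete, here is a minimal Python sketch of the arithmetic, not Utilyze's actual implementation. It assumes SOL % is achieved throughput divided by theoretical peak, and it uses a simple roofline bound as a stand-in for a workload-specific ceiling; the summary does not say how Utilyze derives Attainable SOL %. The names `GpuPeaks`, `sol_percent`, and `roofline_compute_ceiling`, and all hardware numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuPeaks:
    """Theoretical hardware limits (illustrative values, not a real GPU spec)."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def sol_percent(achieved: float, peak: float) -> float:
    """Achieved throughput as a percentage of the theoretical peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * achieved / peak


def roofline_compute_ceiling(flops_per_byte: float, peaks: GpuPeaks) -> float:
    """Roofline upper bound on compute SOL % for a given arithmetic intensity.

    Assumption: this is a textbook roofline model, used here only as a
    stand-in for a workload-specific ceiling.
    """
    attainable_flops = min(peaks.peak_flops, flops_per_byte * peaks.peak_bandwidth)
    return 100.0 * attainable_flops / peaks.peak_flops


# Hypothetical GPU and a hypothetical memory-bound workload.
peaks = GpuPeaks(peak_flops=1.0e15, peak_bandwidth=2.0e12)
achieved_flops = 5.0e13   # FLOP/s actually sustained
achieved_bytes = 1.8e12   # bytes/s actually moved

compute_sol = sol_percent(achieved_flops, peaks.peak_flops)
memory_sol = sol_percent(achieved_bytes, peaks.peak_bandwidth)
ceiling = roofline_compute_ceiling(achieved_flops / achieved_bytes, peaks)

print(f"Compute SOL: {compute_sol:.1f}%")
print(f"Memory SOL:  {memory_sol:.1f}%")
print(f"Roofline compute ceiling: {ceiling:.1f}%")
```

In this made-up case the workload sits near the memory bandwidth limit while using only a small fraction of peak compute, which is the kind of situation where a busy-or-idle utilization metric could still read close to 100%.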