Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data

2 days ago

Matrix multiplication performance on GPUs varies with the values stored in the input matrices, not with their data type, because of dynamic power: predictable data such as all zeros causes less transistor switching and therefore lower power consumption.
When a GPU is running at its power limit, the extra dynamic power drawn by unpredictable data (e.g., random values) can force it to throttle its clock speed, which lowers achievable FLOPS.
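
As a rough illustration of why predictable data switches fewer transistors, the sketch below counts how many bits change between consecutive float32 values in a stream. This Hamming-distance count is only an assumed proxy for switching activity (the activity factor in the standard CMOS relation P ≈ αCV²f); it does not model a real GPU datapath, and the helper names are hypothetical.

```python
import random
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def mean_toggles(values) -> float:
    """Average number of bits that flip between consecutive float32 values.

    Assumption: bit flips between successive operands stand in for
    transistor switching; real hardware is far more complicated.
    """
    bits = [float_bits(v) for v in values]
    flips = [bin(a ^ b).count("1") for a, b in zip(bits, bits[1:])]
    return sum(flips) / len(flips)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
zeros = [0.0] * n
constant = [1.0] * n
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

print("zeros   :", mean_toggles(zeros))
print("constant:", mean_toggles(constant))
print("normal  :", mean_toggles(normal))
```

Under this toy proxy, the all-zeros and constant streams never toggle a bit, while normally distributed values flip many bits on every step. At a fixed power cap, higher switching activity leaves less headroom for clock frequency.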
Benchmarks show that constant or uniformly distributed inputs achieve higher TFLOPS than normally distributed inputs, exposing a gap between theoretical peak FLOPS and real-world performance that stems from power constraints.
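
For reference, achieved TFLOPS for a matrix multiplication is conventionally derived from the problem size and the measured run time, counting one multiply and one add per inner-product term. The sketch below shows that arithmetic only; the function name and the example timing are made up for illustration and are not measurements from the benchmarks described here.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for an (m x k) @ (k x n) matmul that took `seconds`.

    Uses the conventional count of 2 * m * n * k floating-point operations
    (one multiply plus one add per inner-product term).
    """
    flops = 2 * m * n * k
    return flops / seconds / 1e12

# Hypothetical timing, chosen only to demonstrate the arithmetic:
# an 8192 x 8192 x 8192 matmul that finishes in 2 ms.
print(f"{matmul_tflops(8192, 8192, 8192, 0.002):.1f} TFLOPS")
```

Because the operation count is fixed by the shapes, any difference in TFLOPS between input distributions comes entirely from the measured time, which is where clock throttling shows up.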
Experiments that vary the GPU's power and clock limits show that predictable inputs are throttled less, which points to power as a key bottleneck in high-performance computing, especially on newer GPUs such as the H100 and B100.
