Why Your CPU Is Fast but Your Program Is Slow: Understanding the Memory Wall

4 days ago

A CPU can execute billions of operations per second, but real program speed often depends on memory access patterns.
The 'Memory Wall' refers to the performance gap between fast CPUs and slower DRAM, with a DRAM access taking up to 100x longer than an on-chip operation.
DRAM stores data in capacitors arranged in rows; an access must activate (open) a row and later precharge (close) it, so random access patterns that keep switching rows are slow.
The cache hierarchy (L1, L2, L3) mitigates memory latency by keeping frequently used data closer to the CPU, with L1 being the fastest and smallest level.
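Stride-scan experiments show performance dropping sharply once the stride reaches 64 bytes, the typical cache line size: from that point on, every access lands on a different cache line and becomes a cache miss, and most of each fetched line goes unused.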
Programs can be memory-bound (limited by data access speed) or compute-bound (limited by CPU computation); many are actually memory-bound despite high CPU utilization, because time spent stalled waiting on memory still shows up as busy CPU time.
Optimizing memory access patterns (e.g., sequential access) is crucial for performance, as changing data movement often matters more than making the CPU faster.
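
The stride-scan point above can be illustrated with a toy model rather than a real benchmark. The sketch below is not a measurement: it only counts how often a forward scan steps onto a new cache line, assuming a 64-byte line and a cold cache, and it ignores hardware prefetching, associativity, and DRAM row effects. The names `LINE_SIZE` and `miss_rate` are hypothetical and chosen for this illustration.

```python
LINE_SIZE = 64  # bytes per cache line; an assumption, typical of current x86-64 parts


def miss_rate(stride_bytes, buffer_bytes=1 << 20, line_size=LINE_SIZE):
    """Fraction of accesses in a forward scan that touch a new cache line.

    Toy model: the cache starts cold and the scan never revisits a line,
    so the first touch of each line counts as a miss and any further
    touches of that same line count as hits.
    """
    accesses = 0
    misses = 0
    last_line = -1
    for offset in range(0, buffer_bytes, stride_bytes):
        line = offset // line_size  # index of the cache line holding this byte
        if line != last_line:
            misses += 1
            last_line = line
        accesses += 1
    return misses / accesses


if __name__ == "__main__":
    for stride in (4, 8, 16, 32, 64, 128):
        print(f"stride {stride:>3} B: modeled miss rate {miss_rate(stride):.4f}")
```

In this model the miss rate doubles with each doubling of the stride (1/16 at 4 bytes, 1/2 at 32 bytes) and saturates at 1.0 from 64 bytes onward, which matches the shape of the drop described above. Real hardware will differ in the details, since prefetchers can hide part of the cost of regular strides.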
