Why Your CPU Is Fast but Your Program Is Slow: Understanding the Memory Wall

4 days ago
  • #cache-hierarchy
  • #memory-wall
  • #cpu-performance
  • A CPU can execute billions of operations per second, but actual program speed often depends on memory access patterns.
  • The 'Memory Wall' refers to the performance gap between fast CPUs and slower DRAM, with latency differences of up to 100x.
  • DRAM stores data in capacitors arranged in rows; an access must first activate the target row, and switching to a different row requires a precharge before the next activation, so random access patterns that jump between rows are slow.
  • Cache hierarchy (L1, L2, L3) mitigates memory latency by storing frequently used data closer to the CPU, with L1 being fastest and smallest.
  • Stride scan experiments show performance dropping sharply once the stride reaches 64 bytes, the typical cache line size: from that point each access lands on a different cache line, so every access becomes a cache miss.
  • Programs can be memory-bound (limited by data access speed) or compute-bound (limited by CPU computation); many are actually memory-bound despite high CPU utilization, because cycles spent stalled on memory still count as busy.
  • Optimizing memory access patterns (e.g., favoring sequential access) is crucial for performance, as changing how data moves often matters more than making the CPU faster.
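
The stride scan result can be illustrated with a small model rather than a real benchmark. The sketch below is not the article's experiment: it assumes a 64-byte cache line and a cold cache, ignores hardware prefetching, and simply counts how many accesses in one pass over a buffer touch a cache line that has not been seen before. The names `miss_ratio` and `CACHE_LINE_BYTES` are illustrative.

```python
CACHE_LINE_BYTES = 64  # assumed line size; common on x86-64, not universal


def miss_ratio(stride_bytes, buffer_bytes=1 << 20, line_bytes=CACHE_LINE_BYTES):
    """Fraction of accesses that touch a not-yet-seen cache line.

    Models one pass over a buffer, reading one byte every `stride_bytes`,
    starting from a cold cache. Prefetching is deliberately not modeled.
    """
    seen_lines = set()
    accesses = 0
    for offset in range(0, buffer_bytes, stride_bytes):
        accesses += 1
        # Each cache line covers `line_bytes` consecutive byte addresses.
        seen_lines.add(offset // line_bytes)
    return len(seen_lines) / accesses


for stride in (1, 4, 16, 32, 64, 128, 256):
    print(f"stride {stride:4d} B -> {miss_ratio(stride):.4f} new lines per access")
```

In this model the ratio grows in proportion to the stride (1/64 at a stride of 1 byte, 1/4 at 16 bytes) and reaches 1.0 at 64 bytes, where it stays for larger strides. That is the knee the stride scan bullet describes: below 64 bytes several accesses share each fetched line, and from 64 bytes on every access pays for a full line but uses only one element of it.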