How Much Linear Memory Access Is Enough?

2 days ago
  • #Memory Performance
  • #Block Size Optimization
  • #CPU Cache Effects
  • Linear memory access performance depends on block size, with diminishing returns beyond certain thresholds.
  • Block sizes needed for peak performance: 1 MB is generally sufficient for any workload, 128 kB is enough when the workload spends at least ~1 cycle per byte, and 4 kB is adequate at ~10+ cycles per byte.
  • Experiments used kernels like scalar_stats (light processing), simd_sum (fast SIMD), and heavy_sin (heavy computation) to test block sizes.
  • Cache clobbering and randomized layouts were used to simulate cold cache scenarios, while repeated runs modeled pre-warmed caches.
  • Results show that the required block size varies with workload intensity: the heavier the computation per byte, the smaller the block at which peak performance is already reached.
  • The findings apply to linear per-block processing; striding, allocations, or other per-block costs may shift the curves.
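
The rule of thumb and the randomized block layout described above can be sketched in a few lines. This is a minimal illustration, not the code used in the experiments: the function names (`suggested_block_size`, `block_ranges`, `sum_in_blocks`) are hypothetical, the thresholds are the approximate figures from the summary, and kB/MB are assumed to mean 1024-based units. Python's interpreter overhead hides real cache effects, so the sketch shows only the structure of per-block linear processing, not its performance.

```python
import random

KIB = 1024          # assumption: "kB" in the summary means 1024 bytes
MIB = 1024 * KIB    # assumption: "MB" means 1024 * 1024 bytes


def suggested_block_size(cycles_per_byte: float) -> int:
    """Approximate minimum block size (bytes) for near-peak throughput.

    Thresholds follow the summary's rule of thumb; they are rough guides,
    not exact cut-offs.
    """
    if cycles_per_byte >= 10:
        return 4 * KIB      # heavy work per byte: small blocks are enough
    if cycles_per_byte >= 1:
        return 128 * KIB    # moderate work per byte
    return 1 * MIB          # very light work: use large blocks


def block_ranges(total_bytes: int, block_size: int,
                 shuffle: bool = False, seed: int = 0):
    """Split [0, total_bytes) into blocks; optionally visit them in random order."""
    starts = list(range(0, total_bytes, block_size))
    if shuffle:
        # Randomized visiting order stands in for a randomized block layout.
        random.Random(seed).shuffle(starts)
    return [(s, min(s + block_size, total_bytes)) for s in starts]


def sum_in_blocks(data: bytes, block_size: int, shuffle: bool = False) -> int:
    """Toy stand-in kernel: sum the bytes, reading linearly within each block."""
    total = 0
    for start, end in block_ranges(len(data), block_size, shuffle):
        total += sum(data[start:end])
    return total
```

For example, `suggested_block_size(0.5)` returns the 1 MB figure and `suggested_block_size(20)` the 4 kB figure. Because the access inside each block stays linear, `sum_in_blocks` gives the same result whether or not the blocks are visited in shuffled order; only the jumps between blocks change.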