Hasty Briefs (beta)

Evaluating Uniform Memory Access Mode on AMD's Turin

14 days ago
  • #NUMA
  • #Memory Performance
  • #AMD EPYC
  • NUMA (Non-Uniform Memory Access) exposes affinity between cores and memory controllers, with modern servers subdividing sockets into multiple NUMA nodes.
  • NPS0 mode on AMD's EPYC 9575F provides uniform memory access by presenting a dual-socket system as a single NUMA node and interleaving memory accesses evenly across all memory controllers.
  • NPS0 simplifies programming by removing the need for NUMA-aware optimization, but it raises DRAM latency to roughly 220 ns, a heavy penalty compared to NUMA-aware modes.
  • NPS0 offers a bandwidth advantage for NUMA-unaware code, but it only pays off once bandwidth demand reaches roughly 400 GB/s.
  • SPEC CPU2017 performance varies under NPS0: cache-friendly workloads that favor high clock speeds hold up well, while memory-intensive tasks struggle.
  • Despite high latency, the EPYC 9575F performs well in SPEC CPU2017 due to its 5 GHz clock speed and efficient caching.
  • NPS0 is not recommended for modern systems due to significant latency penalties and minor bandwidth gains for NUMA-unaware code.
  • Verda (formerly DataCrunch) provided the test system with EPYC 9575Fs; coverage of its B200 GPUs is upcoming.
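
As a rough illustration of the NUMA node points above, the following minimal sketch reads how many NUMA nodes a Linux system exposes. It assumes the standard sysfs file /sys/devices/system/node/online is present; the function names parse_node_list and online_numa_nodes are hypothetical helpers, not part of any tool used in the article. A dual-socket system in NPS0 would be expected to report a single node, while NUMA-aware modes report two or more.

```python
from pathlib import Path

# Linux sysfs file listing the online NUMA nodes, e.g. "0" or "0-3".
NODE_ONLINE = Path("/sys/devices/system/node/online")


def parse_node_list(text):
    """Expand a kernel list string such as "0-1,4" into [0, 1, 4]."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def online_numa_nodes(path=NODE_ONLINE):
    """Return the online NUMA node IDs, or None if the file is unreadable."""
    try:
        return parse_node_list(path.read_text())
    except OSError:
        # Not Linux, or sysfs is not mounted: nothing to report.
        return None


if __name__ == "__main__":
    nodes = online_numa_nodes()
    if nodes is None:
        print("NUMA topology not available on this system")
    else:
        print(f"{len(nodes)} NUMA node(s) online: {nodes}")
```

A single reported node does not prove NPS0 on its own, since a single-socket system in NPS1 also reports one node; the count has to be read against the known socket count.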