Evaluating Uniform Memory Access Mode on AMD's Turin
14 days ago
- #NUMA
- #Memory Performance
- #AMD EPYC
- NUMA (Non-Uniform Memory Access) exposes affinity between cores and memory controllers, with modern servers subdividing sockets into multiple NUMA nodes.
- NPS0 mode on AMD's EPYC 9575F provides uniform memory access by treating a dual-socket system as a single entity, distributing memory accesses evenly across all controllers.
- NPS0 simplifies programming by avoiding NUMA optimization complexities, but DRAM latency rises to roughly 220 ns, a steep penalty compared to NUMA-aware modes.
- NPS0 offers a bandwidth advantage, but it only pays off once bandwidth demand is very high (~400 GB/s).
- SPEC CPU2017 results vary by workload: under NPS0, cache-friendly workloads that benefit from high clock speed do well, while memory-intensive ones struggle.
- Despite high latency, the EPYC 9575F performs well in SPEC CPU2017 due to its 5 GHz clock speed and efficient caching.
- NPS0 is not recommended for modern systems due to significant latency penalties and minor bandwidth gains for NUMA-unaware code.
- Verda (formerly DataCrunch) provided the test system with EPYC 9575Fs; coverage of its B200 GPUs is upcoming.
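
To see how many NUMA nodes a given NPS setting actually exposes, one option is to read the topology the kernel publishes. The following is a minimal sketch, assuming a Linux system that lists nodes under `/sys/devices/system/node`; the function names `parse_cpulist` and `read_numa_topology` are illustrative and not from the original article. On a dual-socket system in NPS0 one would expect it to report a single node, while NUMA-aware modes should report at least one node per socket.

```python
import re
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # Memory-only nodes have an empty cpulist.
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} for every nodeN directory under root.

    Returns an empty dict if root does not exist (e.g. on non-Linux systems).
    """
    topology = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return topology
    for entry in root_path.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        cpulist = entry / "cpulist"
        if match and cpulist.is_file():
            topology[int(match.group(1))] = parse_cpulist(cpulist.read_text())
    return topology


if __name__ == "__main__":
    topo = read_numa_topology()
    print(f"NUMA nodes visible to the OS: {len(topo)}")
    for node_id in sorted(topo):
        print(f"  node {node_id}: {len(topo[node_id])} CPUs")
```

The `root` parameter exists only so the function can be pointed at a test directory. NUMA-aware code would use this mapping to keep threads and their memory on the same node, which is the tuning work NPS0 lets programs skip.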