AMD's EPYC 9355P: Inside a 32 Core Zen 5 Server Chip
- #Server CPUs
- #Performance Analysis
- #AMD EPYC
- High core counts are not the only thing server chips compete on; Intel, AMD, and Arm all use scalable designs that serve a range of core counts.
- AMD’s EPYC 9355P prioritizes per-core performance, with clock speeds up to 4.4 GHz, more cache per core, and GMI-Wide links to the IO die for extra bandwidth.
- The EPYC 9355P uses eight core complex dies (CCDs) with four active cores each, and each CCD keeps its full 32 MB of L3, giving a high cache-to-core ratio.
- GMI-Wide gives each CCD 64 B/cycle of bandwidth to the IO die, improving latency and bandwidth under load compared to the GMI-Narrow links used in desktop CPUs (the first sketch after this list works through the arithmetic).
- NUMA modes beyond NPS1 (NPS2, NPS4) bring only minimal latency improvements on the EPYC 9355P, so NPS1 is sufficient for most cases (see the pointer-chasing sketch after this list for how such latencies are typically measured).
- The EPYC 9355P achieves close to theoretical memory bandwidth across NUMA modes, with only minor penalties for cross-node accesses (the last sketch after this list works out the theoretical figure).
- GMI-Wide mitigates bandwidth bottlenecks, keeping latency under control and sustaining higher bandwidth under load than desktop counterparts.
- SPEC CPU2017 results show the EPYC 9355P delivering competitive single-threaded performance and handling bandwidth demands better when all cores are loaded.
- AMD’s hub-and-spoke memory model in Zen 5 EPYC provides more consistent DRAM performance compared to Intel’s Xeon 6.
- Intel and Arm favor monolithic interconnects, while AMD’s scalable design balances latency and bandwidth effectively.
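
As a back-of-the-envelope check on the topology and GMI-Wide points above, the sketch below works out cache per core and per-CCD link bandwidth. The 2.0 GHz Infinity Fabric clock and the GMI-Narrow width (taken as half of GMI-Wide) are assumptions for illustration, not measured figures.

```c
#include <stdio.h>

int main(void) {
    /* EPYC 9355P topology: 8 CCDs, 4 active cores per CCD, full 32 MB L3 per CCD */
    const int ccds = 8;
    const int cores_per_ccd = 4;
    const double l3_per_ccd_mb = 32.0;

    /* Desktop Zen 5 comparison point: 8 cores sharing the same 32 MB L3 */
    const int desktop_cores_per_ccd = 8;

    printf("Total cores: %d\n", ccds * cores_per_ccd);
    printf("L3 per core (EPYC 9355P):  %.1f MB\n", l3_per_ccd_mb / cores_per_ccd);
    printf("L3 per core (desktop CCD): %.1f MB\n", l3_per_ccd_mb / desktop_cores_per_ccd);

    /* GMI-Wide: 64 B/cycle per CCD at the Infinity Fabric clock (FCLK).
       The 2.0 GHz FCLK is an assumed value for illustration only;
       GMI-Narrow is taken as half the width of GMI-Wide. */
    const double fclk_ghz = 2.0;
    const double gmi_wide_bytes_per_cycle = 64.0;
    const double gmi_narrow_bytes_per_cycle = gmi_wide_bytes_per_cycle / 2.0;

    printf("GMI-Wide per-CCD bandwidth:   %.0f GB/s\n", gmi_wide_bytes_per_cycle * fclk_ghz);
    printf("GMI-Narrow per-CCD bandwidth: %.0f GB/s\n", gmi_narrow_bytes_per_cycle * fclk_ghz);
    return 0;
}
```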
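
The NUMA latency comparisons above come from the kind of pointer-chasing test sketched below. This is a minimal illustration rather than the article's actual benchmark: the buffer size, iteration count, and node numbers are arbitrary, and it assumes Linux with libnuma available (compile with -lnuma).

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <numa.h>   /* Linux libnuma; link with -lnuma */

/* Turn the buffer into one random cycle of indices (Sattolo's algorithm)
   so each load depends on the previous one and prefetchers can't help. */
static void build_chain(size_t *next, size_t count) {
    for (size_t i = 0; i < count; i++) next[i] = i;
    for (size_t i = count - 1; i > 0; i--) {
        size_t j = (size_t)(random() % (long)i);   /* j in [0, i) */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }
}

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma reports no NUMA support\n");
        return 1;
    }

    const size_t bytes = 1ull << 30;             /* 1 GiB, far beyond L3 */
    const size_t count = bytes / sizeof(size_t);
    const int cpu_node = 0;                      /* illustrative node choices */
    const int mem_node = 0;

    numa_run_on_node(cpu_node);                  /* pin execution to one node */
    size_t *chain = numa_alloc_onnode(bytes, mem_node);  /* place the buffer on mem_node */
    if (!chain) { perror("numa_alloc_onnode"); return 1; }

    build_chain(chain, count);

    /* Walk the chain; each iteration is one dependent, cache-missing load. */
    const size_t iters = 1ull << 26;
    size_t idx = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) idx = chain[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("avg load latency: %.1f ns (checksum %zu)\n", ns / (double)iters, idx);

    numa_free(chain, bytes);
    return 0;
}
```

Running this with cpu_node and mem_node on the same node versus different nodes gives the local versus cross-node latencies that the NPS1/NPS2/NPS4 comparison is about.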
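
For the near-theoretical bandwidth point, the ceiling being approached follows from the memory configuration. The calculation below assumes 12 DDR5 channels populated at 6000 MT/s; the actual DIMM speed in any given system may differ, so treat the result as illustrative.

```c
#include <stdio.h>

int main(void) {
    /* Assumed configuration: 12 DDR5 channels at 6000 MT/s, 8 bytes per transfer
       (64-bit data bus per channel). Actual populated speeds may differ. */
    const double channels = 12.0;
    const double transfers_per_sec = 6000e6;
    const double bytes_per_transfer = 8.0;

    double gbps = channels * transfers_per_sec * bytes_per_transfer / 1e9;
    printf("Theoretical DRAM bandwidth: %.0f GB/s\n", gbps);   /* 576 GB/s */
    return 0;
}
```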