Hasty Briefs (beta)

AMD's EPYC 9355P: Inside a 32 Core Zen 5 Server Chip

18 hours ago
  • #Server CPUs
  • #Performance Analysis
  • #AMD EPYC
  • High core count chips are not the only solution; scalable designs from Intel, AMD, and Arm cater to varying core counts.
  • AMD’s EPYC 9355P targets per-core performance with boost clocks of up to 4.4 GHz, more cache per core, and GMI-Wide links for higher bandwidth between each CPU die and the I/O die.
  • The EPYC 9355P spreads its 32 cores across eight CPU dies (CCDs) with four cores each; every CCD keeps its full 32 MB of L3 cache, which works out to 8 MB of L3 per core.
  • GMI-Wide provides 64B/cycle of bandwidth per CCD, which lowers latency and raises bandwidth under load compared with the GMI-Narrow links used in desktop CPUs.
  • NUMA configurations (NPS1, NPS2, NPS4) on EPYC 9355P show minimal latency improvements, with NPS1 being sufficient for most cases.
  • EPYC 9355P achieves near-theoretical memory bandwidth in NUMA modes, with minor penalties for cross-node accesses.
  • GMI-Wide mitigates bandwidth bottlenecks, keeping latency lower and delivering higher bandwidth under load than desktop counterparts.
  • SPEC CPU2017 tests show EPYC 9355P’s competitive single-thread performance and superior bandwidth handling in multi-core scenarios.
  • AMD’s hub-and-spoke memory model in Zen 5 EPYC provides more consistent DRAM performance compared to Intel’s Xeon 6.
  • Intel and Arm favor monolithic interconnects, while AMD’s scalable design balances latency and bandwidth effectively.
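
The cache and bandwidth figures in the points above follow from simple arithmetic, sketched below in Python as a rough illustration. The CCD count, cores per CCD, L3 per CCD, and 64B/cycle figure come from the summary. The Infinity Fabric clock is not stated there, so `ASSUMED_FCLK_GHZ` is a placeholder assumption, and the function names are hypothetical helpers rather than anything from the original article.

```python
# Figures taken from the summary above.
CCD_COUNT = 8                   # CPU dies (CCDs) on the EPYC 9355P
CORES_PER_CCD = 4               # cores enabled per CCD
L3_PER_CCD_MB = 32              # full L3 kept on each CCD
GMI_WIDE_BYTES_PER_CYCLE = 64   # per-CCD link width quoted for GMI-Wide

# ASSUMPTION: the summary does not give the Infinity Fabric clock.
# 1.8 GHz is only a placeholder; substitute the real value.
ASSUMED_FCLK_GHZ = 1.8


def cache_topology(ccds=CCD_COUNT, cores_per_ccd=CORES_PER_CCD,
                   l3_per_ccd_mb=L3_PER_CCD_MB):
    """Return (total cores, total L3 in MB, L3 per core in MB)."""
    total_cores = ccds * cores_per_ccd
    total_l3_mb = ccds * l3_per_ccd_mb
    l3_per_core_mb = l3_per_ccd_mb / cores_per_ccd
    return total_cores, total_l3_mb, l3_per_core_mb


def link_bandwidth_gbs(bytes_per_cycle, fabric_clock_ghz):
    """Peak link bandwidth in GB/s: bytes per cycle times clock in GHz."""
    return bytes_per_cycle * fabric_clock_ghz


if __name__ == "__main__":
    cores, l3_total, l3_per_core = cache_topology()
    print(f"{cores} cores, {l3_total} MB L3 total, {l3_per_core} MB L3 per core")

    per_ccd = link_bandwidth_gbs(GMI_WIDE_BYTES_PER_CYCLE, ASSUMED_FCLK_GHZ)
    print(f"Peak per-CCD link bandwidth at the assumed clock: {per_ccd:.1f} GB/s")
```

The per-CCD result is a theoretical peak that scales linearly with whatever fabric clock is supplied; measured bandwidth will be lower.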