Hasty Briefs (beta)

AMD's EPYC 9355P: Inside a 32 Core Zen 5 Server Chip

10 hours ago
  • #Performance-Analysis
  • #AMD-EPYC
  • #Server-CPUs
  • High core count chips are not the only solution; scalable designs from Intel, AMD, and Arm cater to varying needs.
  • AMD’s EPYC 9355P optimizes per-core performance with clock speeds of up to 4.4 GHz, more L3 cache per core, and the GMI-Wide interconnect for higher bandwidth.
  • GMI-Wide provides 64B/cycle of bandwidth per CCD, which raises bandwidth and keeps latency under better control under load than GMI-Narrow.
  • The EPYC 9355P was tested in a Dell PowerEdge R6715 with 768 GB of DDR5-5200, covering the NPS1, NPS2, and NPS4 NUMA configurations and memory performance.
  • The NPS2 and NPS4 NUMA modes bring only minimal latency improvements over NPS1, so NPS1 is sufficient for most workloads.
  • GMI-Wide enhances off-CCD bandwidth (99.8 GB/s read) and mitigates latency spikes under high bandwidth loads.
  • SPEC CPU2017 tests reveal EPYC 9355P’s competitive single-thread performance and superior bandwidth handling in multi-core scenarios.
  • AMD’s hub-and-spoke memory model (since Zen 2) offers consistent DRAM performance, in contrast to Intel’s monolithic approach.
  • EPYC 9355P exemplifies AMD’s strategy of balancing core count, cache, and interconnect for optimal per-core performance.
  • Acknowledgments to Dell and ZeroOne Technology for hardware support.
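
The bandwidth figures in the points above can be related with some simple arithmetic. The sketch below is illustrative only and rests on assumptions the summary does not state: that the 99.8 GB/s read figure applies to a single CCD, that the 64B/cycle GMI-Wide figure is the read direction, and that the test system populates 12 DDR5 channels with 8 bytes per transfer each. The function names are hypothetical.

```python
def link_bandwidth_gbs(bytes_per_cycle: float, clock_ghz: float) -> float:
    """Theoretical peak link bandwidth in GB/s.

    One GHz is one cycle per nanosecond, so bytes/cycle * GHz gives
    bytes per nanosecond, which equals GB/s.
    """
    return bytes_per_cycle * clock_ghz


def implied_clock_ghz(measured_gbs: float, bytes_per_cycle: float) -> float:
    """Lowest link clock (GHz) that could sustain a measured bandwidth."""
    return measured_gbs / bytes_per_cycle


def dram_peak_gbs(channels: int, mega_transfers_per_s: int,
                  bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s across all channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    # Figures taken from the summary: 64B/cycle per CCD, 99.8 GB/s read.
    floor_ghz = implied_clock_ghz(99.8, 64)
    print(f"Link clock needed for 99.8 GB/s at 64B/cycle: {floor_ghz:.2f} GHz")

    # ASSUMPTION: 12 populated channels of DDR5-5200 (not stated in the summary).
    print(f"Theoretical DRAM peak: {dram_peak_gbs(12, 5200):.1f} GB/s")
```

Under these assumptions, a 64B/cycle link would need to run at roughly 1.56 GHz or faster to deliver the measured 99.8 GB/s. Measured bandwidth normally falls short of the theoretical peak, so the actual link clock is likely higher than this floor.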