AMD's EPYC 9355P: Inside a 32 Core Zen 5 Server Chip
9 hours ago
- #Performance-Analysis
- #AMD-EPYC
- #Server-CPUs
- Maximizing core count is not the only approach; Intel, AMD, and Arm all build scalable server designs that cater to varying needs.
- AMD’s EPYC 9355P targets per-core performance with clock speeds of up to 4.4 GHz, more L3 cache per core, and the GMI-Wide interconnect for better bandwidth.
- GMI-Wide provides 64B/cycle of bandwidth per CCD, which raises bandwidth and keeps latency better controlled under load than GMI-Narrow.
- The EPYC 9355P was tested in a Dell PowerEdge R6715 with 768 GB of DDR5-5200, covering the NPS1, NPS2, and NPS4 NUMA configurations and memory performance.
- Moving from NPS1 to NPS2 or NPS4 brings only minimal latency improvements, so NPS1 is generally sufficient for most workloads.
- GMI-Wide raises off-CCD bandwidth (99.8 GB/s read) and mitigates latency spikes under high bandwidth load.
- SPEC CPU2017 tests reveal EPYC 9355P’s competitive single-thread performance and superior bandwidth handling in multi-core scenarios.
- AMD’s hub-and-spoke memory model, used since Zen 2, offers consistent DRAM performance, in contrast to Intel’s monolithic approach.
- EPYC 9355P exemplifies AMD’s strategy of balancing core count, cache, and interconnect for optimal per-core performance.
- Acknowledgments to Dell and ZeroOne Technology for hardware support.
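
The bandwidth figures in the points above can be put in context with some back-of-the-envelope arithmetic. The following is a minimal sketch, not a measurement tool. It assumes an Infinity Fabric clock of 1.8 GHz (`ASSUMED_FCLK_GHZ`, not stated in the summary), that all 12 DDR5 channels of the socket are populated, and that the 99.8 GB/s read figure refers to a single CCD. The function and constant names are hypothetical and chosen only for illustration.

```python
def link_peak_gbs(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak link bandwidth in GB/s: bytes moved per cycle times cycles per nanosecond."""
    return bytes_per_cycle * clock_ghz


def dram_peak_gbs(channels: int, mega_transfers: int, bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s for 64-bit (8-byte) DDR5 channels."""
    return channels * mega_transfers * bytes_per_transfer / 1000


# Assumption: fabric clock is not given in the summary; 1.8 GHz is a placeholder.
ASSUMED_FCLK_GHZ = 1.8
# From the summary: GMI-Wide moves 64 bytes per cycle per CCD.
GMI_WIDE_BYTES_PER_CYCLE = 64
# From the summary: measured off-CCD read bandwidth.
# Assumption: this figure is for a single CCD.
MEASURED_READ_GBS = 99.8
# Assumption: 12 populated channels; DDR5-5200 is from the summary.
ASSUMED_CHANNELS = 12
DDR5_MTS = 5200

gmi_peak = link_peak_gbs(GMI_WIDE_BYTES_PER_CYCLE, ASSUMED_FCLK_GHZ)
dram_peak = dram_peak_gbs(ASSUMED_CHANNELS, DDR5_MTS)
read_efficiency = MEASURED_READ_GBS / gmi_peak

print(f"GMI-Wide peak per CCD (assumed FCLK): {gmi_peak:.1f} GB/s")
print(f"Measured read as share of that peak:  {read_efficiency:.1%}")
print(f"DDR5-5200 peak across assumed channels: {dram_peak:.1f} GB/s")
```

Under these assumptions the sketch gives a 115.2 GB/s per-CCD link peak and a 499.2 GB/s socket DRAM peak. A different fabric clock or channel population changes both numbers proportionally.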