Interesting PEZY-SC4s
5 days ago
- #FP64
- #PEZY-SC4s
- #supercomputing
- PEZY-SC4s was presented at Hot Chips 2025, focusing on power-efficient FP64 compute.
- Japan has a strong tradition in supercomputing, with PEZY Computing being a key player alongside Fujitsu and NEC.
- PEZY-SC4s gets its FP64 efficiency from a massively parallel array of execution units running at lower clocks and voltages than GPUs.
- The 's' in SC4s denotes a scaled-down model with a smaller die and lower power draw compared to its larger counterparts.
- PEZY-SC4s combines a multi-level cache hierarchy with low branch penalties to keep its execution units from stalling.
- The architecture includes a quad-core RISC-V management processor, using the open-source Rocket Core.
- PEZY-SC4s connects to host systems via a 16-lane PCIe Gen 5 interface, an upgrade from the Gen 4 used in PEZY-SC3.
- The memory subsystem includes small L1 caches private to each processing element (PE), shared L2 caches, and a 64 MB last-level cache (L3).
- PEZY-SC4s uses four HBM3 stacks as its main memory, providing 3.2 TB/s of bandwidth and 96 GB of capacity.
- The design targets applications requiring high precision and accuracy, such as simulations, where FP64 is crucial.
- PEZY-SC4s is expected to achieve about 91 gigaflops per watt (GF/W) of FP64 performance, which would put it ahead of Nvidia's H200 and in competition with AMD's MI300A on that metric.
- Japan's domestic hardware development allows for tightly targeted designs, in contrast to countries that rely on US-designed chips.
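
The headline figures in the list above can be turned into a few rough derived numbers. The following is a minimal sketch rather than anything from the presentation: it assumes decimal units (1 TB = 1e12 bytes), assumes bandwidth and capacity are split evenly across the four HBM3 stacks, and treats the 91 GF/W figure as a projection. All constant and function names are hypothetical.

```python
# Back-of-the-envelope figures derived only from the numbers quoted above.
# Assumptions: decimal units (1 TB = 1e12 B, 1 GB = 1e9 B, 1 MB = 1e6 B)
# and an even split of bandwidth and capacity across the HBM3 stacks.

HBM_STACKS = 4
HBM_BANDWIDTH = 3.2e12   # bytes/s, aggregate across all stacks
HBM_CAPACITY = 96e9      # bytes, aggregate across all stacks
LLC_CAPACITY = 64e6      # bytes, last-level (L3) cache
FP64_EFFICIENCY = 91e9   # FP64 FLOP/s per watt (projected figure)


def per_stack_bandwidth() -> float:
    """Bandwidth of one HBM3 stack in bytes/s, assuming an even split."""
    return HBM_BANDWIDTH / HBM_STACKS


def per_stack_capacity() -> float:
    """Capacity of one HBM3 stack in bytes, assuming an even split."""
    return HBM_CAPACITY / HBM_STACKS


def full_memory_sweep_seconds() -> float:
    """Lower bound on the time to stream all of HBM once at peak bandwidth."""
    return HBM_CAPACITY / HBM_BANDWIDTH


def llc_fraction_of_memory() -> float:
    """Share of the HBM capacity that fits in the last-level cache."""
    return LLC_CAPACITY / HBM_CAPACITY


def energy_per_flop_joules() -> float:
    """Whole-chip energy per FP64 operation implied by the GF/W figure."""
    return 1.0 / FP64_EFFICIENCY


if __name__ == "__main__":
    print(f"per-stack bandwidth : {per_stack_bandwidth() / 1e12:.2f} TB/s")
    print(f"per-stack capacity  : {per_stack_capacity() / 1e9:.0f} GB")
    print(f"full memory sweep   : {full_memory_sweep_seconds() * 1e3:.0f} ms")
    print(f"LLC / HBM capacity  : {llc_fraction_of_memory():.5%}")
    print(f"energy per FP64 op  : {energy_per_flop_joules() * 1e12:.1f} pJ")
```

Under these assumptions each stack contributes roughly 0.8 TB/s and 24 GB, a single pass over all of memory takes about 30 ms at peak bandwidth, and 91 GF/W corresponds to roughly 11 pJ per FP64 operation. That last number is chip power amortized over every operation, so it is not an estimate of the energy of the arithmetic units alone.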