Hasty Briefs (beta)

Interesting PEZY-SC4s

5 days ago
Tags: #FP64, #PEZY-SC4s, #supercomputing

  • PEZY-SC4s was presented at Hot Chips 2025, focusing on power-efficient FP64 compute.
  • Japan has a strong tradition in supercomputing, with PEZY Computing being a key player alongside Fujitsu and NEC.
  • PEZY-SC4s is designed for highly efficient FP64 compute: it runs a large number of parallel execution units at lower clocks and voltages than GPUs do.
  • The 's' in SC4s denotes a scaled-down model with a smaller die and lower power draw compared to its larger counterparts.
  • PEZY-SC4s pairs a multi-level cache hierarchy with low branch penalties to avoid performance bottlenecks.
  • The architecture includes a quad-core RISC-V management processor built on the open-source Rocket core.
  • PEZY-SC4s connects to host systems via a 16-lane PCIe Gen 5 interface, an upgrade from the Gen 4 used in PEZY-SC3.
  • The memory subsystem includes small PE-private L1 caches, shared L2 caches, and a 64 MB last-level cache (L3).
  • PEZY-SC4s uses four HBM3 stacks for system memory, providing 3.2 TB/s bandwidth and 96 GB capacity.
  • The design targets applications requiring high precision and accuracy, such as simulations, where FP64 is crucial.
  • PEZY-SC4s is expected to reach about 91 gigaflops per watt (GF/W) of FP64 performance, an efficiency that would exceed Nvidia's H200 and be competitive with AMD's MI300A.
  • Japan's domestic hardware development allows for tightly targeted designs, in contrast to countries that rely on US-designed chips.
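
The headline figures in the list above can be put in perspective with some back-of-the-envelope arithmetic. The following Python sketch is illustrative only: the HBM3 and GF/W numbers are the ones quoted in this summary, the PCIe lane rates (16 GT/s for Gen 4, 32 GT/s for Gen 5, both with 128b/130b encoding) come from the PCIe specifications, and the function and constant names are hypothetical. The PCIe results are theoretical one-direction peaks that ignore protocol overhead, and the exaflop power estimate assumes the projected efficiency scales perfectly, which real systems will not achieve.

```python
# Figures quoted in the summary (projected values, not measurements).
HBM_STACKS = 4
HBM_TOTAL_BANDWIDTH_GB_S = 3200.0   # 3.2 TB/s across all stacks
HBM_TOTAL_CAPACITY_GB = 96.0
FP64_EFFICIENCY_GF_PER_W = 91.0     # approximate, as quoted

# Per-lane signalling rates in GT/s from the PCIe specifications.
# Gen 4 and Gen 5 both use 128b/130b line encoding.
PCIE_GT_PER_S = {4: 16.0, 5: 32.0}
PCIE_ENCODING_EFFICIENCY = 128 / 130


def pcie_peak_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (no protocol overhead)."""
    return PCIE_GT_PER_S[gen] * PCIE_ENCODING_EFFICIENCY * lanes / 8


def hbm_per_stack() -> tuple[float, float]:
    """Return (bandwidth in GB/s, capacity in GB) for a single HBM3 stack."""
    return (HBM_TOTAL_BANDWIDTH_GB_S / HBM_STACKS,
            HBM_TOTAL_CAPACITY_GB / HBM_STACKS)


def megawatts_for_fp64(flops: float) -> float:
    """Power in MW to sustain `flops` at the quoted GF/W, assuming perfect scaling."""
    watts = flops / (FP64_EFFICIENCY_GF_PER_W * 1e9)
    return watts / 1e6


if __name__ == "__main__":
    stack_bw, stack_cap = hbm_per_stack()
    gen5 = pcie_peak_gb_s(5, 16)
    gen4 = pcie_peak_gb_s(4, 16)
    print(f"HBM3 per stack: {stack_bw:.0f} GB/s, {stack_cap:.0f} GB")
    print(f"PCIe Gen 5 x16: {gen5:.1f} GB/s per direction")
    print(f"PCIe Gen 4 x16: {gen4:.1f} GB/s per direction")
    print(f"HBM-to-host bandwidth ratio: {HBM_TOTAL_BANDWIDTH_GB_S / gen5:.0f}x")
    print(f"1 EFLOPS of FP64 at 91 GF/W: {megawatts_for_fp64(1e18):.1f} MW")
```

Under these assumptions each HBM3 stack supplies 800 GB/s and 24 GB, while the Gen 5 x16 host link peaks at roughly 63 GB/s per direction, double the Gen 4 figure. The gap of about 50x between on-package memory bandwidth and host bandwidth suggests why workloads on accelerators like this generally need to keep their working set resident in HBM.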