Hasty Briefs (beta)


Ironwood: The first Google TPU for the age of inference

a year ago
  • #Google Cloud
  • #AI
  • #TPU
  • Google introduces Ironwood, its seventh-generation Tensor Processing Unit (TPU), specifically designed for inference.
  • Ironwood marks a shift from responsive AI models to proactive models that generate insights and interpretation, heralding the 'age of inference.'
  • It scales up to 9,216 liquid-cooled chips linked by breakthrough Inter-Chip Interconnect (ICI) networking, with a full pod spanning nearly 10 MW of power.
  • Ironwood is part of Google Cloud AI Hypercomputer architecture, optimized for demanding AI workloads.
  • Developers can leverage Google’s Pathways software stack to harness the power of tens of thousands of Ironwood TPUs.
  • Ironwood delivers 42.5 Exaflops (FP8) at full scale, more than 24x the compute of the world's largest supercomputer.
  • Features include enhanced SparseCore for ultra-large embeddings and improved memory and network architecture.
  • Ironwood offers significant performance gains: 2x the performance per watt of Trillium and nearly 30x the power efficiency of the first Cloud TPU.
  • It includes 192 GB of HBM per chip (6x Trillium) with 7.2 TB/s of HBM bandwidth per chip (4.5x Trillium).
  • Enhanced ICI bandwidth of 1.2 TB/s bidirectional (1.5x Trillium) enables faster chip-to-chip communication.
  • Ironwood supports leading AI models like Gemini 2.5 and AlphaFold, with availability later this year.
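The headline pod-scale numbers above follow directly from the per-chip figures. A quick sanity check, assuming the per-chip peak of 4,614 TFLOPs (FP8) stated in Google's announcement:

```python
# Back-of-the-envelope check of Ironwood's pod-scale figures.
# Per-chip numbers are assumptions taken from Google's announcement;
# the arithmetic below simply scales them to a full 9,216-chip pod.

CHIPS_PER_POD = 9_216
PEAK_FLOPS_PER_CHIP = 4_614e12   # 4,614 TFLOPs (FP8) per chip, per the announcement
HBM_PER_CHIP_GB = 192            # 6x Trillium

pod_exaflops = CHIPS_PER_POD * PEAK_FLOPS_PER_CHIP / 1e18
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1e6   # GB -> PB (decimal)

print(f"Pod compute: {pod_exaflops:.1f} EFLOPs")  # ~42.5 EFLOPs, matching the headline
print(f"Pod HBM:     {pod_hbm_pb:.2f} PB")        # ~1.77 PB of aggregate HBM
```

The 42.5-Exaflop figure is thus simply 9,216 chips at FP8 peak; the same scaling gives roughly 1.77 PB of shared HBM across the pod.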