Ironwood: The first Google TPU for the age of inference
- #Google Cloud
- #AI
- #TPU
- Google introduces Ironwood, its seventh-generation Tensor Processing Unit (TPU), specifically designed for inference.
- Ironwood marks a shift from responsive AI models that surface information on request to proactive models that generate insights and interpretation, heralding the 'age of inference.'
- It scales up to 9,216 liquid-cooled chips linked by breakthrough Inter-Chip Interconnect (ICI) networking, a configuration drawing nearly 10 MW of power.
- Ironwood is a key part of Google Cloud's AI Hypercomputer architecture, which optimizes hardware and software together for the most demanding AI workloads.
- Developers can use Google's Pathways software stack to harness the combined power of tens of thousands of Ironwood TPUs (a minimal sketch of this programming style follows the list).
- At full scale, Ironwood delivers 42.5 exaflops, more than 24x the compute of the world's largest supercomputer (a back-of-the-envelope check follows the list).
- Features include an enhanced SparseCore for ultra-large embeddings, plus an improved memory and network architecture.
- Ironwood delivers significant performance gains: 2x the performance per watt of Trillium and nearly 30x the power efficiency of the first Cloud TPU.
- Each chip carries 192 GB of HBM (6x Trillium) with 7.2 TB/s of HBM bandwidth (4.5x Trillium); a roofline-style reading of these figures also follows the list.
- Enhanced ICI bandwidth of 1.2 Tbps bidirectional (1.5x Trillium) speeds chip-to-chip communication.
- Ironwood will power leading AI models such as Gemini 2.5 and AlphaFold, with availability planned for later this year.
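
For a sense of what the Pathways stack looks like from the developer's side, here is a minimal sketch of sharding a computation across TPU devices with JAX, the framework Pathways serves as a runtime for at scale. The mesh layout, array shapes, and the toy `forward` function are illustrative assumptions, not anything specific to Ironwood:

```python
# Minimal JAX sharding sketch. Runs on whatever devices are visible
# (TPU, GPU, or CPU); mesh shape and sizes are illustrative assumptions.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible accelerators along a single "data" mesh axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard a batch of activations across the mesh along the leading axis.
batch = jnp.ones((len(jax.devices()) * 128, 512))
batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x):
    # A toy layer; jit compiles and partitions it across the mesh.
    return jnp.tanh(x @ jnp.ones((512, 512)))

out = forward(batch)
print(out.shape, out.sharding)
```

The same program scales from a single host to a full pod: the sharding annotations stay put while the runtime handles placement and communication.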
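The 42.5-exaflop figure checks out against the per-chip peak Google quotes for Ironwood (4,614 TFLOPs); a quick back-of-the-envelope:

```python
# Back-of-the-envelope check of the full-scale compute claim.
chips_per_pod = 9_216          # liquid-cooled chips at full scale
peak_tflops_per_chip = 4_614   # peak TFLOPs per Ironwood chip (Google's figure)

pod_exaflops = chips_per_pod * peak_tflops_per_chip / 1e6  # TFLOPs -> exaflops
print(f"~{pod_exaflops:.1f} exaflops at full scale")       # ~42.5
```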
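The per-chip memory figures also admit a standard roofline-style reading: dividing peak compute by HBM bandwidth gives the arithmetic intensity a kernel needs to avoid being memory-bound. The numbers below are the headline specs; the interpretation is a common rule of thumb, not a claim from the announcement:

```python
# Roofline-style arithmetic from the headline per-chip specs.
peak_flops = 4_614e12    # 4,614 TFLOPs peak compute per chip
hbm_bw = 7.2e12          # 7.2 TB/s HBM bandwidth per chip
hbm_capacity = 192e9     # 192 GB HBM per chip

# FLOPs a kernel must do per byte moved to stay compute-bound.
print(f"~{peak_flops / hbm_bw:.0f} FLOPs/byte")  # ~641

# Time to stream the entire HBM once at full bandwidth.
print(f"~{hbm_capacity / hbm_bw * 1e3:.0f} ms")  # ~27 ms
```

A compute-to-bandwidth balance in the hundreds of FLOPs per byte is characteristic of accelerators tuned for large matrix multiplies, consistent with the chip's inference-serving focus.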