Ironwood: The first Google TPU for the age of inference
- #Google Cloud
- #AI
- #TPU
- Google introduces Ironwood, its seventh-generation Tensor Processing Unit (TPU), specifically designed for inference.
- Ironwood marks a shift from responsive AI models that surface information on request to proactive models that generate insights and interpretation, heralding the 'age of inference.'
- It scales up to 9,216 liquid-cooled chips linked by breakthrough Inter-Chip Interconnect (ICI) networking, a configuration drawing nearly 10 MW of power.
- Ironwood is a key part of Google Cloud's AI Hypercomputer architecture, which optimizes hardware and software together for the most demanding AI workloads.
- Developers can use Google's Pathways software stack to harness the combined power of tens of thousands of Ironwood TPUs (a minimal sketch of this programming style follows the list).
- At full scale, Ironwood delivers 42.5 exaflops, more than 24x the compute of the world's largest supercomputer (a back-of-the-envelope check follows the list).
- Features include an enhanced SparseCore for ultra-large embeddings, plus an improved memory and network architecture.
- Ironwood delivers significant performance gains: 2x the performance per watt of Trillium and nearly 30x the power efficiency of the first Cloud TPU.
- Each chip carries 192 GB of HBM (6x Trillium) with 7.2 TB/s of HBM bandwidth (4.5x Trillium); a roofline-style reading of these figures also follows the list.
- Enhanced ICI bandwidth of 1.2 Tbps bidirectional (1.5x Trillium) speeds chip-to-chip communication.
- Ironwood will power leading AI models such as Gemini 2.5 and AlphaFold, with availability planned for later this year.
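
For a sense of what the Pathways stack looks like from the developer's side, here is a minimal sketch of sharding a computation across TPU devices with JAX, the framework Pathways serves as a runtime for at scale. The mesh layout, array shapes, and the toy `forward` function are illustrative assumptions, not anything specific to Ironwood:

```python
# Minimal JAX sharding sketch. Runs on whatever devices are visible
# (TPU, GPU, or CPU); mesh shape and sizes are illustrative assumptions.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible accelerators along a single "data" mesh axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard a batch of activations across the mesh along the leading axis.
batch = jnp.ones((len(jax.devices()) * 128, 512))
batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x):
    # A toy layer; jit compiles and partitions it across the mesh.
    return jnp.tanh(x @ jnp.ones((512, 512)))

out = forward(batch)
print(out.shape, out.sharding)
```

The same program scales from a single host to a full pod: the sharding annotations stay put while the runtime handles placement and communication.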
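The 42.5-exaflop figure checks out against the per-chip peak Google quotes for Ironwood (4,614 TFLOPs); a quick back-of-the-envelope:

```python
# Back-of-the-envelope check of the full-scale compute claim.
chips_per_pod = 9_216          # liquid-cooled chips at full scale
peak_tflops_per_chip = 4_614   # peak TFLOPs per Ironwood chip (Google's figure)

pod_exaflops = chips_per_pod * peak_tflops_per_chip / 1e6  # TFLOPs -> exaflops
print(f"~{pod_exaflops:.1f} exaflops at full scale")       # ~42.5
```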
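The per-chip memory figures also admit a standard roofline-style reading: dividing peak compute by HBM bandwidth gives the arithmetic intensity a kernel needs to avoid being memory-bound. The numbers below are the headline specs; the interpretation is a common rule of thumb, not a claim from the announcement:

```python
# Roofline-style arithmetic from the headline per-chip specs.
peak_flops = 4_614e12    # 4,614 TFLOPs peak compute per chip
hbm_bw = 7.2e12          # 7.2 TB/s HBM bandwidth per chip
hbm_capacity = 192e9     # 192 GB HBM per chip

# FLOPs a kernel must do per byte moved to stay compute-bound.
print(f"~{peak_flops / hbm_bw:.0f} FLOPs/byte")  # ~641

# Time to stream the entire HBM once at full bandwidth.
print(f"~{hbm_capacity / hbm_bw * 1e3:.0f} ms")  # ~27 ms
```

A compute-to-bandwidth balance in the hundreds of FLOPs per byte is characteristic of accelerators tuned for large matrix multiplies, consistent with the chip's inference-serving focus.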