The eighth-generation TPU: An architecture deep dive
- #AI Hardware
- #TPU Architecture
- #Google Cloud
- The eighth-generation TPUs (TPU 8t and TPU 8i) are designed to address evolving AI workloads, including agentic AI, world models, and reasoning-heavy architectures, focusing on scalability, reliability, and efficiency.
- TPU 8t is optimized for large-scale pre-training and embedding-heavy workloads, featuring a 3D torus interconnect, SparseCore units for embedding lookups, native FP4 support to reduce memory-bandwidth pressure, and the Virgo Network for higher data-center bandwidth.
- TPU 8t includes TPUDirect RDMA and TPUDirect Storage to bypass host bottlenecks, enabling faster data transfers and 10x faster storage access compared to previous generations.
- TPU 8i is specialized for post-training and high-concurrency reasoning, with large on-chip SRAM for KV Cache, a Collectives Acceleration Engine (CAE) for low-latency synchronization, and Boardfly network topology for reduced hops in all-to-all communication.
- Boardfly topology in TPU 8i reduces network diameter from 16 hops in a 3D torus to 7 hops, lowering latency for communication-intensive workloads like MoE and reasoning models.
- The eighth-generation TPUs integrate Arm-based Axion host CPUs to remove host bottlenecks, support Pallas for custom kernels, offer a native PyTorch preview, and maintain portability with JAX and Keras.
- Performance improvements include up to 2.7x better training price-performance for TPU 8t, up to 80% better inference price-performance for TPU 8i, and up to 2x better energy efficiency compared to seventh-generation TPUs.
- The TPUs are part of Google Cloud's AI Hypercomputer, combining hardware, software, and networking to support the full AI lifecycle, with modular architecture for future scalability.
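To see where hop counts like the 16-vs-7 comparison above come from: in a 3D torus, the worst-case hop count (the network diameter) is the sum of per-axis wrap-around distances. A minimal sketch — the actual pod dimensions behind the published figures are not given in the article, so the sizes below are purely illustrative:

```python
def torus_diameter(dims):
    """Worst-case hops between two nodes in a torus: along each axis the
    wrap-around link caps the shortest-path distance at floor(size / 2)."""
    return sum(d // 2 for d in dims)

# Illustrative only: an 8x8x8 torus (512 chips) has a 12-hop diameter.
print(torus_diameter((8, 8, 8)))  # 12
```

A lower diameter matters most for all-to-all traffic (MoE expert routing, reasoning-model collectives), where worst-case latency is paid on nearly every step.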
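The FP4 point is worth making concrete: 4-bit values move four times as many weights per byte of bandwidth as bf16. A minimal quantization sketch, assuming the common E2M1 layout (sign bit, two exponent bits, one mantissa bit) — the article does not specify TPU 8t's exact FP4 encoding or scaling scheme:

```python
# Representable magnitudes of an E2M1 FP4 value (an assumption; the
# article does not specify TPU 8t's exact encoding or block scaling).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x):
    """Round one value to the nearest representable FP4 magnitude,
    preserving sign; real hardware also applies per-block scale factors."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag

print([quantize_fp4(v) for v in (0.6, -2.4, 7.0)])  # [0.5, -2.0, 6.0]
```

The coarse grid is why FP4 is usually paired with per-block scaling in practice: the scale keeps most values inside the representable range.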
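Why large on-chip SRAM for the KV cache matters follows from a simple sizing formula. A back-of-envelope sketch — the model dimensions below are hypothetical, not TPU 8i specifics:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val, batch=1):
    """Size of a transformer KV cache: one key and one value vector per
    layer, per KV head, per token (hence the factor of 2)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val * batch

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 8K context, bf16.
gib = kv_cache_bytes(32, 8, 128, 8192, 2) / 2**30
print(f"{gib:.1f} GiB per sequence")  # 1.0 GiB per sequence
```

At high concurrency this footprint multiplies by the batch size, which is why keeping hot KV state in SRAM rather than HBM pays off for reasoning-style decode loops.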
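The price-performance figures translate directly into cost per unit of work. A quick interpretation sketch, assuming "up to 80%" means a 1.8x ratio (the article does not state the conversion explicitly):

```python
def relative_cost(price_perf_gain):
    """An X-fold price-performance gain implies 1/X the cost for the
    same amount of training or inference work."""
    return 1.0 / price_perf_gain

print(f"{relative_cost(2.7):.0%} of prior training cost")   # 37% of prior training cost
print(f"{relative_cost(1.8):.0%} of prior inference cost")  # 56% of prior inference cost
```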