The eighth-generation TPU: An architecture deep dive
- #AI Hardware
- #TPU Architecture
- #Google Cloud
- The eighth-generation TPUs (TPU 8t and TPU 8i) are designed to address evolving AI workloads, including agentic AI, world models, and reasoning-heavy architectures, focusing on scalability, reliability, and efficiency.
- TPU 8t is optimized for large-scale pre-training and embedding-heavy workloads, featuring a 3D torus interconnect, SparseCore units for embedding lookups, native FP4 support to reduce memory-bandwidth pressure, and the Virgo Network for higher data-center bandwidth.
- TPU 8t includes TPUDirect RDMA and TPUDirect Storage to bypass host bottlenecks, enabling faster data transfers and 10x faster storage access compared to previous generations.
- TPU 8i is specialized for post-training and high-concurrency reasoning, with large on-chip SRAM for KV Cache, a Collectives Acceleration Engine (CAE) for low-latency synchronization, and Boardfly network topology for reduced hops in all-to-all communication.
- Boardfly topology in TPU 8i reduces network diameter from 16 hops in a 3D torus to 7 hops, lowering latency for communication-intensive workloads like MoE and reasoning models.
- The eighth-generation TPUs integrate Arm-based Axion host CPUs to remove host bottlenecks, support Pallas for custom kernels, offer a native PyTorch preview, and maintain portability with JAX and Keras.
- Performance improvements include up to 2.7x better training price-performance for TPU 8t, up to 80% better inference price-performance for TPU 8i, and up to 2x better energy efficiency compared to seventh-generation TPUs.
- The TPUs are part of Google Cloud's AI Hypercomputer, combining hardware, software, and networking to support the full AI lifecycle, with modular architecture for future scalability.
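To see where hop counts like the 16-vs-7 comparison above come from: in a 3D torus, the worst-case hop count (the network diameter) is the sum of per-axis wrap-around distances. A minimal sketch — the actual pod dimensions behind the published figures are not given in the article, so the sizes below are purely illustrative:

```python
def torus_diameter(dims):
    """Worst-case hops between two nodes in a torus: along each axis the
    wrap-around link caps the shortest-path distance at floor(size / 2)."""
    return sum(d // 2 for d in dims)

# Illustrative only: an 8x8x8 torus (512 chips) has a 12-hop diameter.
print(torus_diameter((8, 8, 8)))  # 12
```

A lower diameter matters most for all-to-all traffic (MoE expert routing, reasoning-model collectives), where worst-case latency is paid on nearly every step.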
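The FP4 point is worth making concrete: 4-bit values move four times as many weights per byte of bandwidth as bf16. A minimal quantization sketch, assuming the common E2M1 layout (sign bit, two exponent bits, one mantissa bit) — the article does not specify TPU 8t's exact FP4 encoding or scaling scheme:

```python
# Representable magnitudes of an E2M1 FP4 value (an assumption; the
# article does not specify TPU 8t's exact encoding or block scaling).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x):
    """Round one value to the nearest representable FP4 magnitude,
    preserving sign; real hardware also applies per-block scale factors."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag

print([quantize_fp4(v) for v in (0.6, -2.4, 7.0)])  # [0.5, -2.0, 6.0]
```

The coarse grid is why FP4 is usually paired with per-block scaling in practice: the scale keeps most values inside the representable range.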
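Why large on-chip SRAM for the KV cache matters follows from a simple sizing formula. A back-of-envelope sketch — the model dimensions below are hypothetical, not TPU 8i specifics:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val, batch=1):
    """Size of a transformer KV cache: one key and one value vector per
    layer, per KV head, per token (hence the factor of 2)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val * batch

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 8K context, bf16.
gib = kv_cache_bytes(32, 8, 128, 8192, 2) / 2**30
print(f"{gib:.1f} GiB per sequence")  # 1.0 GiB per sequence
```

At high concurrency this footprint multiplies by the batch size, which is why keeping hot KV state in SRAM rather than HBM pays off for reasoning-style decode loops.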
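The price-performance figures translate directly into cost per unit of work. A quick interpretation sketch, assuming "up to 80%" means a 1.8x ratio (the article does not state the conversion explicitly):

```python
def relative_cost(price_perf_gain):
    """An X-fold price-performance gain implies 1/X the cost for the
    same amount of training or inference work."""
    return 1.0 / price_perf_gain

print(f"{relative_cost(2.7):.0%} of prior training cost")   # 37% of prior training cost
print(f"{relative_cost(1.8):.0%} of prior inference cost")  # 56% of prior inference cost
```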