Our eighth generation TPUs: two chips for the agentic era
4 hours ago
- #AI Hardware
- #TPU
- #Google Cloud
- Google introduced its eighth-generation TPUs, TPU 8t for training and TPU 8i for inference, at Google Cloud Next.
- TPU 8t features massive scale with 9,600 chips per superpod, 121 ExaFlops compute, and near-linear scaling up to a million chips.
- TPU 8i is optimized for latency-sensitive inference, with innovations like breaking the 'memory wall' and doubling ICI bandwidth for MoE models.
- Both chips are co-designed with Google DeepMind and pair with Arm-based Axion CPUs, offering up to 2x better performance per watt than the previous generation.
- The TPUs support frameworks like JAX, PyTorch, and vLLM, and will be available via Google’s AI Hypercomputer later this year.
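The superpod figures above imply a rough per-chip number. A back-of-envelope sketch, assuming the 121 ExaFLOPS figure is the aggregate across all 9,600 chips (the announcement does not state precision or sparsity assumptions):

```python
# Back-of-envelope: per-chip compute implied by the quoted superpod figures.
# Assumes 121 ExaFLOPS is aggregate across the full 9,600-chip superpod.
superpod_exaflops = 121
chips_per_superpod = 9_600

per_chip_pflops = superpod_exaflops * 1e18 / chips_per_superpod / 1e15
print(f"~{per_chip_pflops:.1f} PFLOPS per chip")  # ~12.6 PFLOPS per chip
```

That works out to roughly 12.6 PFLOPS per chip, though the actual per-chip spec depends on the numeric format the headline figure is quoted in.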