TorchTPU: Running PyTorch Natively on TPUs at Google Scale
- #Hardware Optimization
- #PyTorch Integration
- #AI Infrastructure
- TorchTPU enables PyTorch to run natively and efficiently on Google's TPUs, allowing developers to migrate existing workloads with minimal code changes.
- It prioritizes usability with an 'Eager First' philosophy, offering Debug Eager, Strict Eager, and performance-boosting Fused Eager modes, all supported by a shared compilation cache.
- For peak performance, TorchTPU integrates with torch.compile using XLA as the backend compiler, leveraging XLA's optimizations for TPU topologies and supporting custom kernels written with Pallas and JAX.
- The system supports PyTorch's distributed APIs such as DDP, FSDPv2, and DTensor, and is architected to handle divergent execution (MPMD, multiple-program multiple-data) so the natural PyTorch developer experience is preserved.
- The roadmap for 2026 includes reducing recompilations with bounded dynamism, building precompiled TPU kernels, and enhancing support for custom kernels and enterprise-level diagnostics.
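The shared compilation cache behind the three eager modes can be illustrated with a toy model. This is a hypothetical sketch, not TorchTPU's actual API: the `CompilationCache` class and its method names are invented here to show why caches are typically keyed by input shape, and how a cache shared across modes avoids compiling the same program twice.

```python
from typing import Callable, Dict, Tuple

class CompilationCache:
    """Toy cache mapping (function name, input shape) -> a 'compiled' executable.

    In a real TPU stack the value would be device code produced by the
    compiler; here we store the Python function itself as a stand-in.
    """

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[int, ...]], Callable] = {}
        self.compile_count = 0  # how many times we actually "compiled"

    def lookup_or_compile(self, name: str, shape: Tuple[int, ...],
                          fn: Callable) -> Callable:
        key = (name, shape)
        if key not in self._cache:
            self.compile_count += 1   # cache miss: compile once for this shape
            self._cache[key] = fn     # stand-in for real code generation
        return self._cache[key]


def scale(xs):
    return [2 * x for x in xs]


cache = CompilationCache()
for batch in ([1, 2], [3, 4], [1, 2, 3]):  # two distinct shapes: (2,) and (3,)
    compiled = cache.lookup_or_compile("scale", (len(batch),), scale)
    compiled(batch)

# Three calls but only two unique shapes -> only two compilations.
assert cache.compile_count == 2
```

The key design point this models: a second eager mode (or a second call in the same mode) hitting an already-seen shape reuses the cached program instead of paying compilation latency again.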
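The roadmap item on "bounded dynamism" can also be sketched in miniature. The following is an illustrative assumption about the general technique (shape bucketing), not a description of TorchTPU internals: rather than compiling one program per exact input length, lengths are padded up to a bucket boundary (here, the next power of two), so a bounded set of compiled programs covers an unbounded range of input shapes.

```python
# Hypothetical sketch of bounded dynamism via power-of-two shape bucketing.

def bucket_length(n: int) -> int:
    """Round n up to the next power of two (the compiled bucket size)."""
    b = 1
    while b < n:
        b *= 2
    return b


def compiled_programs_needed(lengths):
    """Count distinct compiled programs without and with bucketing."""
    exact = len(set(lengths))                        # one program per shape
    bucketed = len({bucket_length(n) for n in lengths})  # one per bucket
    return exact, bucketed


lengths = [3, 5, 6, 7, 9, 12, 17, 31]
exact, bucketed = compiled_programs_needed(lengths)
# Eight exact shapes collapse into buckets {4, 8, 16, 32} -> four programs.
assert (exact, bucketed) == (8, 4)
```

The trade-off this models is padding waste versus recompilation count: coarser buckets mean fewer compilations but more padded (wasted) compute per step.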