TorchTPU: Running PyTorch Natively on TPUs at Google Scale
- #Hardware Optimization
- #PyTorch Integration
- #AI Infrastructure
- TorchTPU enables PyTorch to run natively and efficiently on Google's TPUs, allowing developers to migrate existing workloads with minimal code changes.
- It prioritizes usability with an 'Eager First' philosophy, offering Debug Eager, Strict Eager, and performance-boosting Fused Eager modes, all supported by a shared compilation cache.
- For peak performance, TorchTPU integrates with torch.compile using XLA as the backend compiler, leveraging XLA's optimizations for TPU topologies and supporting custom kernels written with Pallas and JAX.
- The system supports PyTorch's distributed APIs such as DDP, FSDPv2, and DTensor, and is architected to handle divergent execution (MPMD, multiple-program multiple-data) so the natural PyTorch developer experience is preserved.
- The roadmap for 2026 includes reducing recompilations with bounded dynamism, building precompiled TPU kernels, and enhancing support for custom kernels and enterprise-level diagnostics.
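The shared compilation cache behind the three eager modes can be illustrated with a toy model. This is a hypothetical sketch, not TorchTPU's actual API: the `CompilationCache` class and its method names are invented here to show why caches are typically keyed by input shape, and how a cache shared across modes avoids compiling the same program twice.

```python
from typing import Callable, Dict, Tuple

class CompilationCache:
    """Toy cache mapping (function name, input shape) -> a 'compiled' executable.

    In a real TPU stack the value would be device code produced by the
    compiler; here we store the Python function itself as a stand-in.
    """

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[int, ...]], Callable] = {}
        self.compile_count = 0  # how many times we actually "compiled"

    def lookup_or_compile(self, name: str, shape: Tuple[int, ...],
                          fn: Callable) -> Callable:
        key = (name, shape)
        if key not in self._cache:
            self.compile_count += 1   # cache miss: compile once for this shape
            self._cache[key] = fn     # stand-in for real code generation
        return self._cache[key]


def scale(xs):
    return [2 * x for x in xs]


cache = CompilationCache()
for batch in ([1, 2], [3, 4], [1, 2, 3]):  # two distinct shapes: (2,) and (3,)
    compiled = cache.lookup_or_compile("scale", (len(batch),), scale)
    compiled(batch)

# Three calls but only two unique shapes -> only two compilations.
assert cache.compile_count == 2
```

The key design point this models: a second eager mode (or a second call in the same mode) hitting an already-seen shape reuses the cached program instead of paying compilation latency again.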
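The roadmap item on "bounded dynamism" can also be sketched in miniature. The following is an illustrative assumption about the general technique (shape bucketing), not a description of TorchTPU internals: rather than compiling one program per exact input length, lengths are padded up to a bucket boundary (here, the next power of two), so a bounded set of compiled programs covers an unbounded range of input shapes.

```python
# Hypothetical sketch of bounded dynamism via power-of-two shape bucketing.

def bucket_length(n: int) -> int:
    """Round n up to the next power of two (the compiled bucket size)."""
    b = 1
    while b < n:
        b *= 2
    return b


def compiled_programs_needed(lengths):
    """Count distinct compiled programs without and with bucketing."""
    exact = len(set(lengths))                        # one program per shape
    bucketed = len({bucket_length(n) for n in lengths})  # one per bucket
    return exact, bucketed


lengths = [3, 5, 6, 7, 9, 12, 17, 31]
exact, bucketed = compiled_programs_needed(lengths)
# Eight exact shapes collapse into buckets {4, 8, 16, 32} -> four programs.
assert (exact, bucketed) == (8, 4)
```

The trade-off this models is padding waste versus recompilation count: coarser buckets mean fewer compilations but more padded (wasted) compute per step.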