Building the largest known Kubernetes cluster, with 130k nodes
- #AI Workloads
- #Scalability
- #Kubernetes
- Google Kubernetes Engine (GKE) successfully ran a 130,000-node cluster in experimental mode, doubling the previous limit.
- Scaling is about more than node count: Pod creation, scheduling throughput, and distributed storage all had to keep pace, sustaining 1,000 Pod creations per second.
- AI workloads are driving demand for mega-clusters, with power constraints shifting focus to multi-cluster solutions like MultiKueue.
- Key innovations include optimized read scalability with Consistent Reads from Cache and Snapshottable API Server Cache.
- A proprietary key-value store based on Google’s Spanner database supports massive scale with 13,000 QPS for lease updates.
- Kueue provides advanced job queueing, enabling workload prioritization and 'all-or-nothing' scheduling for AI/ML environments.
- Future scheduling enhancements aim for workload-aware scheduling, moving from Pod-centric to workload-centric approaches.
- GCS FUSE and Google Cloud Managed Lustre offer scalable, high-throughput data access for AI workloads.
- A four-phase benchmark validated GKE’s performance, showing efficient preemption, scheduling, and elasticity under extreme loads.
- GKE demonstrated stability with low latency, high throughput (1,000 Pods/s), and over 1 million objects in the database.
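The 13,000 QPS lease-update figure follows directly from the node count and the kubelet's default heartbeat cadence. A back-of-the-envelope check, assuming the Kubernetes default of one node Lease renewal every 10 seconds:

```python
# Sanity-check the lease-update load cited above, assuming the
# Kubernetes default node Lease renewal interval of 10 seconds.
NODES = 130_000
LEASE_RENEW_INTERVAL_S = 10  # kubelet default heartbeat cadence

lease_qps = NODES / LEASE_RENEW_INTERVAL_S
print(lease_qps)  # 13000.0
```

Each node's kubelet renews its own Lease object as a heartbeat, so lease traffic grows linearly with cluster size and becomes a dominant write load at this scale.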
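The Consistent Reads from Cache optimization (Kubernetes KEP-2340) lets the API server answer strongly consistent LIST/GET requests from its in-memory watch cache instead of the backing store, once the cache has provably caught up to the store's latest revision. A toy model of that catch-up check, not the actual API server code:

```python
class WatchCache:
    """Toy model of serving consistent reads from an API server watch cache.

    The real mechanism (KEP-2340) fetches the store's current revision,
    waits until the cache has observed events up to that revision, then
    serves the read from memory rather than hitting the store.
    """

    def __init__(self):
        self.revision = 0   # highest revision the cache has seen
        self.objects = {}

    def apply_event(self, revision, key, value):
        # Watch events arrive in revision order and advance the cache.
        self.objects[key] = value
        self.revision = revision

    def list_consistent(self, store_revision):
        # Serve from cache only if it is at least as fresh as the store;
        # otherwise the caller must wait or fall back to the store.
        if self.revision < store_revision:
            raise RuntimeError("cache not yet caught up")
        return dict(self.objects)

cache = WatchCache()
cache.apply_event(5, "pod/a", {"phase": "Running"})
snapshot = cache.list_consistent(store_revision=5)  # fresh enough: served from memory
```

Serving reads from the cache shifts expensive LIST traffic off the key-value store, which is what makes read scalability at 130k nodes tractable.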
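Kueue's "all-or-nothing" admission can be illustrated with a toy gang-scheduling check: a job is admitted only if every one of its Pods fits at once, never partially. This is a conceptual sketch under simplified assumptions (CPU as the only resource; the `Job` and `Cluster` types are hypothetical), not Kueue's implementation:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    pods: int           # Pods the job needs simultaneously
    cpu_per_pod: float  # CPUs requested per Pod

@dataclass
class Cluster:
    free_cpu: float

def admit_all_or_nothing(cluster: Cluster, job: Job) -> bool:
    """Admit the job only if ALL of its Pods fit; otherwise keep it queued.

    Partial admission would strand resources on a gang workload, e.g. a
    distributed training job that cannot make progress with fewer replicas.
    """
    needed = job.pods * job.cpu_per_pod
    if needed <= cluster.free_cpu:
        cluster.free_cpu -= needed  # reserve capacity for the whole gang
        return True
    return False

cluster = Cluster(free_cpu=1000.0)
a = admit_all_or_nothing(cluster, Job("train-a", pods=100, cpu_per_pod=8))  # fits: admitted
b = admit_all_or_nothing(cluster, Job("train-b", pods=100, cpu_per_pod=8))  # would only partially fit: queued
```

For AI/ML workloads this matters because a half-started training job holds accelerators while doing no useful work; queueing the whole job until capacity exists avoids that deadlock.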