K8s with 1M Nodes
3 days ago
- #Distributed Systems
- #Scalability
- #Kubernetes
- The project aims to scale Kubernetes to 1 million nodes and works through the scalability challenges that appear at that size.
- Key challenges include etcd scalability, kube-apiserver performance, networking, and scheduling.
- The networking design uses IPv6 exclusively, because giving 1 million nodes and their pods unique addresses requires more address space than is practical to carve out of IPv4.
- etcd is identified as a major bottleneck; proposed remedies include relaxing its durability guarantees and eliminating replicas.
- A custom in-memory etcd implementation (mem_etcd) is developed to improve performance.
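- The scheduler is redesigned around a distributed scatter-gather pattern: the work of evaluating candidate nodes for each pod is split across many workers, and their partial results are merged to pick a node.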
- Performance tuning includes adjusting garbage collection settings and pinning processes to dedicated CPUs to reduce latency.
- Experiments show that the distributed scheduler can handle 1 million pods on 1 million nodes within the target time.
- The project concludes that Kubernetes can scale to 1 million nodes with careful optimizations, though operational challenges remain.
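Why IPv6 at all? A little address arithmetic makes the point. This sketch assumes each node gets its own pod CIDR of /24 on IPv4 and /64 on IPv6 (the kube-controller-manager defaults); the summary does not say which sizes the project used, and `fd00:1234::/44` is an arbitrary illustrative ULA prefix. The helper names are hypothetical.

```python
import ipaddress

NODES = 1_000_000


def per_node_subnets(cluster_cidr: str, node_prefix: int) -> int:
    """How many per-node subnets of size /node_prefix fit inside cluster_cidr."""
    net = ipaddress.ip_network(cluster_cidr)  # raises ValueError if host bits are set
    if not net.prefixlen <= node_prefix <= net.max_prefixlen:
        raise ValueError("node prefix must lie inside the cluster CIDR")
    return 2 ** (node_prefix - net.prefixlen)


def required_cluster_prefix(node_prefix: int, nodes: int) -> int:
    """Longest cluster prefix that still holds `nodes` subnets of size /node_prefix."""
    bits_needed = (nodes - 1).bit_length()  # ceil(log2(nodes)) for nodes >= 2
    return node_prefix - bits_needed


if __name__ == "__main__":
    # IPv4: all of 10.0.0.0/8 holds only 65,536 node-sized /24 subnets.
    print(per_node_subnets("10.0.0.0/8", 24))  # 65536
    # A /24 per node for 1M nodes needs a /4, i.e. 1/16 of the entire IPv4 space.
    print(required_cluster_prefix(24, NODES))  # 4
    # IPv6: a single /44 is enough for 2**20 (about 1.05M) node /64s.
    print(required_cluster_prefix(64, NODES))  # 44
    print(per_node_subnets("fd00:1234::/44", 64) >= NODES)  # True
```

With IPv4, even the largest private range falls short by a factor of about 15; with IPv6 the same allocation is a small slice of a single ULA prefix.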
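To illustrate what an etcd stand-in without durability or replicas still has to provide, here is a minimal in-memory store. Kubernetes relies on etcd's cluster-wide revision counter (surfaced as `resourceVersion`), on compare-and-swap writes for optimistic concurrency, and on watches that resume from a revision, so even a memory-only store must keep those semantics. The class and method names are hypothetical; this is not mem_etcd's actual API or implementation, and it omits compaction, leases, and the gRPC interface.

```python
import threading
from typing import List, Optional, Tuple


class MemKV:
    """Hypothetical in-memory store with etcd-like revisions: no disk writes, no replicas."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rev = 0  # cluster-wide revision, bumped on every successful write
        self._data = {}  # key -> (value, mod_revision)
        self._log: List[Tuple[int, str, str]] = []  # (revision, key, value), never compacted here

    def put(self, key: str, value: str, expected_mod_rev: Optional[int] = None) -> Tuple[bool, int]:
        """Write key. If expected_mod_rev is set, succeed only when it matches (0 = key must not exist)."""
        with self._lock:
            current = self._data.get(key)
            cur_rev = current[1] if current else 0
            if expected_mod_rev is not None and expected_mod_rev != cur_rev:
                return False, cur_rev  # compare-and-swap failed; the caller must re-read and retry
            self._rev += 1
            self._data[key] = (value, self._rev)
            self._log.append((self._rev, key, value))
            return True, self._rev

    def get(self, key: str) -> Optional[Tuple[str, int]]:
        with self._lock:
            return self._data.get(key)

    def events_since(self, revision: int) -> List[Tuple[int, str, str]]:
        """All changes with a revision greater than `revision`, like resuming a watch."""
        with self._lock:
            return [event for event in self._log if event[0] > revision]


if __name__ == "__main__":
    store = MemKV()
    key = "/registry/pods/default/web"
    print(store.put(key, "v1", expected_mod_rev=0))  # create succeeds
    print(store.put(key, "v2", expected_mod_rev=0))  # rejected: key already exists
    print(store.put(key, "v2", expected_mod_rev=1))  # update against the current revision
    print(store.events_since(1))
```

Because every write only touches memory under one lock, there is no fsync or consensus round on the write path; the price is that a crash loses everything.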
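The scatter-gather idea behind the distributed scheduler can be sketched for a single pod: split the node list into shards, let each worker filter and score only its shard, then gather one candidate per shard and pick the overall best. This is an illustrative sketch, not the project's scheduler: it scores on a single hypothetical resource (free CPU in millicores), and it uses threads in one Python process, which show the data flow but not real parallelism; in a real deployment the shards would live in separate scheduler processes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    free_cpu_m: int  # free CPU in millicores; a single hypothetical resource


def best_in_shard(shard: List[Node], request_m: int) -> Optional[Node]:
    """Scatter step: filter and score only this worker's slice of nodes."""
    best = None
    for node in shard:
        if node.free_cpu_m < request_m:
            continue  # filter: the pod does not fit on this node
        if best is None or node.free_cpu_m > best.free_cpu_m:
            best = node  # score: prefer the node with the most free CPU
    return best


def schedule(nodes: List[Node], request_m: int, shards: int = 4) -> Optional[Node]:
    """Split nodes into shards, score them concurrently, then gather the global winner."""
    if not nodes:
        return None
    shards = max(1, shards)
    size = -(-len(nodes) // shards)  # ceiling division so every node lands in some shard
    chunks = [nodes[i:i + size] for i in range(0, len(nodes), size)]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        winners = pool.map(lambda chunk: best_in_shard(chunk, request_m), chunks)
        candidates = [n for n in winners if n is not None]
    # Gather step: each shard returned at most one candidate; pick the best of those.
    return max(candidates, key=lambda n: n.free_cpu_m, default=None)


if __name__ == "__main__":
    cluster = [Node(f"node-{i}", (i * 37) % 1000) for i in range(10_000)]
    print(schedule(cluster, request_m=500))  # Node(name='node-27', free_cpu_m=999)
```

The gather step only has to compare one candidate per shard, so its cost grows with the number of shards rather than the number of nodes.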
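The garbage collection tuning point can be quantified for Go components such as kube-apiserver. The summary does not say which settings the project changed; this sketch assumes the relevant knob is `GOGC` and uses the heap-target formula from the Go GC guide (Go 1.18+): target = live heap + (live heap + GC roots) × GOGC / 100. It models steady state only and ignores the pacer's finer behavior and `GOMEMLIMIT`. The 8 GiB live heap and 2 GiB/s allocation rate are hypothetical.

```python
GIB = 1024 ** 3


def heap_goal(live_heap: float, gogc: int, gc_roots: float = 0.0) -> float:
    """Go heap target per the GC guide: live + (live + roots) * GOGC / 100."""
    if gogc <= 0:
        raise ValueError("GOGC must be positive (GOGC=off is not modeled)")
    return live_heap + (live_heap + gc_roots) * gogc / 100


def gc_cycles_per_second(alloc_rate: float, live_heap: float, gogc: int, gc_roots: float = 0.0) -> float:
    """Steady state: a new cycle starts each time the program allocates the headroom above the live heap."""
    headroom = heap_goal(live_heap, gogc, gc_roots) - live_heap
    return alloc_rate / headroom


if __name__ == "__main__":
    live, rate = 8 * GIB, 2 * GIB  # hypothetical: 8 GiB live heap, 2 GiB/s allocation rate
    for gogc in (100, 400):
        # prints GOGC, heap target in GiB, GC cycles per second
        print(gogc, heap_goal(live, gogc) / GIB, gc_cycles_per_second(rate, live, gogc))
```

Raising `GOGC` from 100 to 400 in this model cuts GC frequency from 0.25 to 0.0625 cycles per second, at the cost of a heap target that grows from 16 GiB to 40 GiB; that memory-for-CPU trade is the usual reason to tune it.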