Hasty Briefs

K8s with 1M Nodes

3 days ago
  • #Distributed Systems
  • #Scalability
  • #Kubernetes
  • The project aims to scale Kubernetes to 1 million nodes and addresses the bottlenecks that arise at that size.
  • Key challenges include etcd scalability, kube-apiserver performance, networking, and scheduling.
  • Networking relies exclusively on IPv6 to provide the large address space that 1 million nodes require.
  • etcd is identified as a major bottleneck; proposed solutions include relaxing durability and eliminating replicas.
  • A custom in-memory etcd implementation (mem_etcd) is developed to improve performance.
  • The scheduler is optimized using a distributed scatter-gather design to handle large-scale pod scheduling.
  • Performance tuning includes adjusting garbage collection settings and using pinned CPUs to reduce latency.
  • Experiments show that the distributed scheduler can handle 1 million pods on 1 million nodes within the target time.
  • The project concludes that Kubernetes can scale to 1 million nodes with careful optimizations, though operational challenges remain.
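
The scatter-gather scheduling idea mentioned above can be illustrated with a minimal sketch. This is not the project's scheduler: the `Node` type, the "most free CPU wins" scoring rule, the shard layout, and the function names are all assumptions chosen for illustration. The request is scattered to every shard, each shard reports its best local candidate, and a coordinator gathers those candidates and picks the global winner.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # free CPU in millicores (hypothetical unit for this sketch)


def best_in_shard(shard, cpu_request):
    """Scatter target: return the shard's best-fitting node, or None."""
    candidates = [n for n in shard if n.free_cpu >= cpu_request]
    # Stand-in scoring rule: prefer the node with the most free CPU.
    return max(candidates, key=lambda n: n.free_cpu, default=None)


def schedule(shards, cpu_request, pool):
    """Scatter the request to all shards, gather local winners, pick one."""
    local_winners = pool.map(lambda s: best_in_shard(s, cpu_request), shards)
    winners = [n for n in local_winners if n is not None]
    if not winners:
        return None  # no node in any shard can fit the pod
    chosen = max(winners, key=lambda n: n.free_cpu)
    chosen.free_cpu -= cpu_request  # reserve capacity on the chosen node
    return chosen.name


def make_shards(num_nodes, num_shards, free_cpu):
    """Build identical nodes and split them round-robin into shards."""
    nodes = [Node(f"node-{i}", free_cpu) for i in range(num_nodes)]
    return [nodes[i::num_shards] for i in range(num_shards)]


if __name__ == "__main__":
    shards = make_shards(num_nodes=8, num_shards=4, free_cpu=1000)
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(9):
            print(schedule(shards, cpu_request=600, pool=pool))
```

In this toy setup each node fits exactly one 600-millicore pod, so the first eight requests land on eight distinct nodes and the ninth returns `None`. The point of the design is that each shard only scans its own slice of the nodes, so per-pod work is bounded by shard size rather than by cluster size.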