How We Found 7 TiB of Memory Just Sitting Around
- #Optimization
- #Scalability
- #Kubernetes
- Kubernetes clusters with many namespaces incur memory overhead and apiserver load from list-watch operations.
- DaemonSets exacerbate the problem: a pod on every node performs its own list-watch, multiplying memory usage and apiserver load with cluster size.
- Optimizing Calico reduced its memory footprint, but Vector, another DaemonSet, was then found to consume significant memory list-watching namespaces.
- Removing an unnecessary namespace label check in Vector cut its memory usage by 50%.
- A configuration error meant the fix had been applied to only one of two kubernetes_logs sources; correcting it yielded further significant savings.
- The final fix reduced memory usage by 7 TiB in total across clusters, improving system efficiency and rollout stability.
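The fix described above can be sketched in Vector configuration. This is a hypothetical `vector.toml` fragment, not the actual config from the post: the source names are invented, and the `namespace_annotation_fields.namespace_labels` option is taken from Vector's `kubernetes_logs` documentation, so verify it against your Vector version. The key point from the summary is that the setting must be applied to *both* `kubernetes_logs` sources, or one of them will keep list-watching namespaces.

```toml
# Hypothetical sketch: disable namespace label enrichment on BOTH
# kubernetes_logs sources so neither one list-watches namespaces.

[sources.app_logs]            # source name is illustrative
type = "kubernetes_logs"
# An empty value disables the namespace label lookup, and with it the
# namespace list-watch that each DaemonSet pod would otherwise run.
namespace_annotation_fields.namespace_labels = ""

[sources.system_logs]         # the second source the post alludes to
type = "kubernetes_logs"
# Applying the fix here too is the correction of the configuration
# error described above; omitting it leaves one source still watching.
namespace_annotation_fields.namespace_labels = ""
```

With N nodes, a DaemonSet whose pods each watch namespaces creates N watches against the apiserver, which is why a per-pod saving here compounds into cluster-wide savings.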