Runc breaks pods when CPU requests aren't multiples of 10
14 days ago
- #containerd
- #cgroup
- #kubernetes
- Pod creation fails intermittently for containers with a CPU limit of 4096m, because containerd computes the CFS quota non-deterministically as either 409600 or 410000 microseconds.
- runc always computes 410000 microseconds, so whenever containerd picks 409600 the two values disagree and the kernel rejects the configuration.
- The failure looks node-specific: once containerd on a node picks 409600, that node gets stuck and every subsequent pod creation on it fails.
- The investigation traces the problem to containerd converting millicores to microseconds non-deterministically, whereas runc always rounds the same way.
- Critical impact: non-deterministic pod scheduling, broken nodes that need manual intervention, and production issues on Amazon EKS clusters.
- Root cause: containerd and runc calculate the CPU quota inconsistently; both need to compute it deterministically and agree on the result.
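The two quota figures above differ only in rounding. Assuming the Kubernetes default CFS period of 100000 µs (100 ms), a 4096m limit converts to exactly 4.096 × 100000 = 409600 µs. Rounding the per-second quota up to the next 10 ms, which is the same as rounding the limit up to a whole 10 millicores (1% of a CPU), gives 410000 µs. The Go sketch below reproduces both conversions. The helper names `exactQuotaUs` and `roundedQuotaUs` are hypothetical, and the round-up-to-10-ms rule is inferred from the 410000 figure and the article's title, not copied from the containerd or runc source.

```go
package main

import "fmt"

// defaultPeriodUs is the CFS period in microseconds.
// Assumption: the Kubernetes default of 100000 µs (100 ms).
const defaultPeriodUs uint64 = 100000

// exactQuotaUs (hypothetical) converts a CPU limit in millicores to a CFS
// quota with no rounding: 4096m * 100000 / 1000 = 409600 µs.
func exactQuotaUs(milliCPU, periodUs uint64) uint64 {
	return milliCPU * periodUs / 1000
}

// roundedQuotaUs (hypothetical) expresses the quota per second of wall time,
// rounds it up to the next multiple of 10000 µs (10 ms, i.e. 1% of one CPU),
// and scales it back to the period. This reproduces the 410000 µs figure.
func roundedQuotaUs(milliCPU, periodUs uint64) uint64 {
	perSec := exactQuotaUs(milliCPU, periodUs) * 1000000 / periodUs
	if perSec%10000 != 0 {
		perSec = (perSec/10000 + 1) * 10000
	}
	return perSec * periodUs / 1000000
}

func main() {
	// Compare both conversions for limits that are and are not multiples of 10m.
	for _, m := range []uint64{4096, 4090, 4100, 255} {
		exact := exactQuotaUs(m, defaultPeriodUs)
		rounded := roundedQuotaUs(m, defaultPeriodUs)
		fmt.Printf("%dm: exact=%d rounded=%d match=%v\n", m, exact, rounded, exact == rounded)
	}
}
```

Running it reports a mismatch for 4096m (409600 vs 410000) and 255m (25500 vs 26000), but not for 4090m or 4100m. This is consistent with the title: a limit that is a multiple of 10 millicores yields the same quota with or without rounding, so only the other limits expose the disagreement.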