Diagnosing a Linux Performance Regression
12 hours ago
- #Linux
- #Performance
- #Kubernetes
- Automattic uses Kubernetes for WordPress VIP applications with firewall rules for security.
- A performance regression in the Linux kernel ipset module caused operations to run up to 1,000 times slower.
- Monitoring scripts that usually took 2 seconds started taking over a minute due to slow iptables-save operations.
- The issue was traced to getsockopt calls for ipset information taking significantly longer.
- Kubernetes and kube-router use ipset for NetworkPolicy enforcement, managing over 6,000 sets per node.
- ipset swap operations, crucial for kube-router's firewall updates, were found to be the bottleneck.
- A kernel update introduced a regression where ip_set_swap operations slowed from microseconds to milliseconds.
- The regression was fixed after it was reported to the Linux Kernel Mailing List, and patches were quickly released.