Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained
5 hours ago
- #Linux Kernel
- #PostgreSQL Performance
- #Memory Management
- Linux 7.0 removed the PREEMPT_NONE preemption option, causing PostgreSQL throughput to drop by half on a 96-vCPU Graviton4 machine.
- The performance regression was traced to increased spinlock contention in PostgreSQL's StrategyGetBuffer function, caused by minor page faults taken while the spinlock was held.
- Under PREEMPT_LAZY in Linux 7.0, a backend could be preempted while servicing a page fault with the spinlock held, which extended hold times and left waiting backends burning CPU in spin loops.
- Using huge pages (e.g., 2 MB or 1 GB) instead of the default 4 KB pages drastically reduces the number of potential page faults and the TLB pressure, resolving the issue.
- A kernel fix involving Restartable Sequences (rseq) was proposed, but the PostgreSQL community favored huge pages as a more straightforward solution.
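
To get a feel for why page size matters so much here, the sketch below counts how many pages it takes to map a shared buffer pool at each of the page sizes mentioned above. The 64 GiB pool size is an assumed value chosen for illustration, not a figure from the benchmark, and the helper names are hypothetical. The page count is only a rough upper bound on the first-touch minor faults a single process could take over that region, not a measurement.

```python
# Rough arithmetic only: how many pages are needed to map a buffer pool
# at different page sizes. Fewer pages means fewer opportunities for a
# first-touch minor fault and fewer TLB entries to cover the region.

# ASSUMPTION: 64 GiB is a hypothetical buffer pool size, not taken from the article.
SHARED_BUFFERS_BYTES = 64 * 1024**3

# Page sizes named in the text above.
PAGE_SIZES = {
    "4 KB": 4 * 1024,
    "2 MB": 2 * 1024**2,
    "1 GB": 1024**3,
}


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages needed to cover a region, rounding up."""
    return -(-region_bytes // page_bytes)


def page_counts(region_bytes: int) -> dict[str, int]:
    """Map each page-size label to the page count for the given region."""
    return {
        label: pages_needed(region_bytes, size)
        for label, size in PAGE_SIZES.items()
    }


if __name__ == "__main__":
    for label, count in page_counts(SHARED_BUFFERS_BYTES).items():
        print(f"{label:>5} pages: {count:>12,}")
```

Under these assumptions, moving from 4 KB to 2 MB pages cuts the page count by a factor of 512, which is why far fewer faults can land inside the spinlock-protected path.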