Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained

3 hours ago
  • #Linux Kernel
  • #PostgreSQL Performance
  • #Memory Management
  • Linux 7.0 removed the PREEMPT_NONE preemption option, causing PostgreSQL throughput to drop by half on a 96-vCPU Graviton4 machine.
  • The regression was traced to increased spinlock contention in PostgreSQL's StrategyGetBuffer function, caused by minor page faults taken while the spinlock was held.
  • Under PREEMPT_LAZY in Linux 7.0, preemption during page faults extended spinlock hold times, leading to excessive CPU spinning by waiting backends.
  • Using huge pages (e.g., 2 MB or 1 GB) instead of the default 4 KB pages drastically reduces the number of potential page faults and the TLB pressure, which resolves the issue.
  • A kernel fix involving Restartable Sequences (rseq) was proposed, but the PostgreSQL community favored huge pages as the more straightforward solution.
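
To make the huge-page point concrete, the sketch below counts how many pages are needed to map a shared memory region at each page size. Each page is a separate first-touch fault opportunity, so the count is a rough upper bound on the potential minor faults mentioned above. The 64 GiB region size is an assumption chosen for illustration; the source does not state the buffer pool size used in the benchmark, and the helper name `pages_needed` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Page sizes named in the summary: the 4 KB default and the two huge-page sizes.
PAGE_SIZES = {
    "4 KiB": 4 * KIB,
    "2 MiB": 2 * MIB,
    "1 GiB": 1 * GIB,
}


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages required to map a region, rounded up."""
    if region_bytes < 0 or page_bytes <= 0:
        raise ValueError("region must be non-negative and page size positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (region_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    # Assumption: a 64 GiB shared memory region (not a figure from the source).
    region = 64 * GIB
    for label, size in PAGE_SIZES.items():
        print(f"{label:>6} pages: {pages_needed(region, size):>12,}")
```

Under that assumption the region needs 16,777,216 pages at 4 KiB, 32,768 at 2 MiB, and 64 at 1 GiB. Fewer pages means fewer chances to fault while a spinlock is held and fewer TLB entries to cover the same memory.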