PostgreSQL and the OOM Killer: Why You Must Use Strict Memory Overcommit
4 hours ago
- #Memory Management
- #Linux Kernel
- #PostgreSQL
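- Strict memory overcommit (`vm.overcommit_memory = 2`) protects PostgreSQL from catastrophic OOM kills: an allocation that would exceed the commit limit fails with ENOMEM, which PostgreSQL can report as an ordinary error, instead of succeeding and triggering the OOM killer later.
- Because PostgreSQL backends share memory, the postmaster must treat an OOM-killed backend as a possible source of shared memory corruption. It drops every connection, aborts in-flight transactions, and runs crash recovery, which can be lengthy.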
- A kernel bug (introduced in Linux 6.5) caused phantom committed memory inflation, leading to unexpected ENOMEM errors under strict overcommit.
- The bug was fixed in Linux 6.8 by correcting a condition in move_vma() that reversed error handling for memory accounting.
- Ubicloud's heuristic for setting the commit limit is: 80% of physical memory (excluding hugepages) plus a fixed 2 GB buffer for sidecar processes.
- Before enabling strict overcommit, monitor committed memory (`Committed_AS` against `CommitLimit` in `/proc/meminfo`) and understand the workload's allocation pattern; a limit set too low for the workload produces frequent ENOMEM errors.
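
The commit-limit heuristic and the headroom check can be sketched in a few lines of Python. This is a minimal illustration, not Ubicloud's actual tooling: the function names and the sample figures are hypothetical, "2 GB" is read as 2 GiB, and applying the result through `vm.overcommit_kbytes` is an assumption about how such a limit might be set. The field names (`MemTotal`, `HugePages_Total`, `Hugepagesize`, `CommitLimit`, `Committed_AS`) are the ones the kernel reports in `/proc/meminfo`.

```python
GIB_KB = 1024 * 1024  # one GiB expressed in kB, the unit /proc/meminfo uses


def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo-style text as a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def commit_limit_kb(fields, percent=80, buffer_kb=2 * GIB_KB):
    """Heuristic limit: percent of non-hugepage RAM plus a fixed buffer."""
    # HugePages_Total is a page count; Hugepagesize is the page size in kB.
    hugepages_kb = fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0)
    usable_kb = fields["MemTotal"] - hugepages_kb
    # Integer arithmetic avoids floating-point rounding on large values.
    return usable_kb * percent // 100 + buffer_kb


def commit_headroom_kb(fields):
    """How far committed memory currently is from the kernel's active limit."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Hypothetical host: 16 GiB of RAM, 4 GiB of it reserved as 2 MiB hugepages.
SAMPLE = """\
MemTotal:       16777216 kB
CommitLimit:    12163481 kB
Committed_AS:    9000000 kB
HugePages_Total:    2048
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    # On a real host, read the text from /proc/meminfo instead of SAMPLE.
    fields = parse_meminfo(SAMPLE)
    print("vm.overcommit_memory = 2")
    print(f"vm.overcommit_kbytes = {commit_limit_kb(fields)}")
    print(f"headroom: {commit_headroom_kb(fields)} kB")
```

For the sample host, 80% of the 12 GiB outside hugepages plus the 2 GiB buffer gives a limit of 12163481 kB. Tracking the headroom figure under real load for a while before switching modes shows whether the workload would have hit ENOMEM at that limit.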