Work_mem: It's a Trap
- #Memory Management
- #Database Performance
- #Postgres
- Henrietta Dombrovskaya's production cluster was killed by the OOM killer after consuming 2 TB of RAM, despite work_mem being set to 2 MB.
- The issue was reproduced on a separate server, revealing unexpected memory growth inside Postgres, concentrated in the ExecutorState and HashTableContext memory contexts.
- The function pg_log_backend_memory_contexts was crucial in diagnosing the issue by logging the full memory context tree of a backend.
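  To make this concrete, here is a minimal sketch of how the function is typically used (available since Postgres 14). The PID shown is illustrative, and the log output is approximate:

  ```sql
  -- Find candidate backends and what they are running:
  SELECT pid, state, query
  FROM pg_stat_activity
  WHERE backend_type = 'client backend';

  -- Ask the chosen backend to dump its memory context tree
  -- to the server log (returns true on success):
  SELECT pg_log_backend_memory_contexts(12345);  -- PID is illustrative

  -- The server log then contains an indented tree, roughly like:
  --   LOG:  level: 1; TopMemoryContext: ... used
  --   LOG:  level: 2;   ExecutorState: ... used
  --   LOG:  level: 3;     HashTableContext: ... used
  ```

  The dump goes to the server log, not to the caller, so it can be triggered safely against a production backend without interrupting its query.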
- work_mem is a per-operation limit, not a per-query one: each sort or hash node in a plan can claim its own allowance, and the memory an operation allocates is released only when the operation finishes, not during it, so a single complex query can consume many multiples of work_mem.
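  A sketch of how one query multiplies the limit (table names are hypothetical):

  ```sql
  SET work_mem = '2MB';

  -- Each Hash, HashAggregate, and Sort node below gets its own
  -- work_mem allowance, so this single query can hold several
  -- multiples of 2MB at once. Hash-based nodes can additionally
  -- use work_mem * hash_mem_multiplier (Postgres 13+).
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT o.customer_id, sum(o.amount) AS total
  FROM orders o
  JOIN customers c ON c.id = o.customer_id   -- Hash node: one allowance
  GROUP BY o.customer_id                     -- HashAggregate: another
  ORDER BY total DESC;                       -- Sort: yet another
  ```

  Running the query under `EXPLAIN (ANALYZE)` shows the per-node memory figures, which is the quickest way to see the limit being applied per operation rather than per query.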
- The query in question involved a plpgsql function performing a copy operation and joining results, creating a single large operation with many memory chunks.
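  A hypothetical sketch of that pattern (not the author's actual query; all names are invented). Because the whole function body executes inside one ExecutorState, chunks allocated by the copy and the join accumulate until the function returns:

  ```sql
  CREATE OR REPLACE FUNCTION copy_and_join(p_batch_id int)
  RETURNS void
  LANGUAGE plpgsql
  AS $$
  BEGIN
      -- Step 1: copy rows into a staging table.
      INSERT INTO staging
      SELECT * FROM source WHERE batch_id = p_batch_id;

      -- Step 2: join the copied rows back against a large table.
      INSERT INTO results
      SELECT s.id, b.payload
      FROM staging s
      JOIN big_table b ON b.id = s.id;

      -- Memory allocated by the statements above is not freed
      -- incrementally; it is released only when this operation ends.
  END;
  $$;
  ```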
- Prevention strategies include fixing statistics, rewriting bad queries, using query timeouts, and monitoring memory usage with pg_log_backend_memory_contexts.
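  Two of those strategies can be sketched directly in SQL (the role name is hypothetical):

  ```sql
  -- Refresh planner statistics so row estimates, and hence
  -- hash table sizing, stay realistic:
  ANALYZE big_table;

  -- Cap runaway queries with a timeout, per role or per session:
  ALTER ROLE reporting SET statement_timeout = '5min';
  SET statement_timeout = '30s';
  ```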
- This behavior is by design: understanding Postgres's memory management explains such incidents, but it does not by itself fix poorly written queries.