5x perf increase on writes with FPW disabled in Postgres
2 days ago
- #performance scaling
- #database architecture
- #Postgres optimization
- The Lakebase architecture decouples compute and storage, enabling performance optimizations that are impossible in monolithic Postgres.
- Traditional Postgres uses Full Page Writes (FPW) to prevent data corruption from torn pages during crashes, but this inflates WAL volume by up to 15x.
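- In Lakebase, compute is stateless with no local disk, eliminating the torn-page risk FPW addresses.
- Naively disabling FPW could cause unbounded delta chains in storage: without periodic full page images, reconstructing a page means replaying an ever-growing list of WAL deltas, which increases read latency and resource use.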
- Image generation pushdown moves FPW-like image creation to the storage layer, generating full page images based on delta thresholds rather than checkpoints.
- Benchmarks show throughput gains that scale with compute size, reaching 4.5x or more on 32 vCPU instances, along with a 94% reduction in WAL traffic.
- Production benefits include reduced WAL generation (e.g., from 30 MB/s to 1 MB/s), improved read latencies (p99 down by 30-50%), and higher ingestion throughput (e.g., 3x increase for one customer).
- The optimization was rolled out globally without downtime using Postgres's XLOG_FPW_CHANGE mechanism, enhancing scalability and stability.
- This is part of a broader effort to offload heavy tasks to scalable storage, eliminating Postgres write bottlenecks and improving performance.
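
The delta-threshold idea behind image generation pushdown can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Lakebase's actual implementation: the names `PageStore`, `PageHistory`, `apply_wal_delta`, and `delta_threshold` are hypothetical, and a "delta" is simplified to a byte range overwritten at an offset. It shows a storage node materializing a full page image once a page has accumulated a fixed number of deltas, so the chain a read must replay stays bounded regardless of checkpoint timing.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192  # default Postgres block size, in bytes


@dataclass
class PageHistory:
    """Storage-side state for one page: a base image plus pending deltas."""
    base_image: bytes = bytes(PAGE_SIZE)
    deltas: list = field(default_factory=list)  # (offset, payload) tuples


class PageStore:
    """Hypothetical storage node that bounds delta chains (illustrative only)."""

    def __init__(self, delta_threshold=4):
        self.delta_threshold = delta_threshold  # assumed tuning knob
        self.pages = {}  # page_id -> PageHistory
        self.images_generated = 0

    def apply_wal_delta(self, page_id, offset, payload):
        """Ingest one small WAL delta; no full page image comes from compute."""
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("delta falls outside the page")
        history = self.pages.setdefault(page_id, PageHistory())
        history.deltas.append((offset, payload))
        # Materialize a new image when the chain reaches the threshold,
        # instead of waiting for a checkpoint as FPW would.
        if len(history.deltas) >= self.delta_threshold:
            history.base_image = self._replay(history)
            history.deltas.clear()
            self.images_generated += 1

    def read_page(self, page_id):
        """Reconstruct the page; replays fewer than delta_threshold deltas."""
        history = self.pages.get(page_id, PageHistory())
        return self._replay(history)

    @staticmethod
    def _replay(history):
        page = bytearray(history.base_image)
        for offset, payload in history.deltas:
            page[offset:offset + len(payload)] = payload
        return bytes(page)


# Usage: seven single-byte deltas against one page with a threshold of three.
store = PageStore(delta_threshold=3)
for i in range(7):
    store.apply_wal_delta(page_id=1, offset=i, payload=b"x")
pending = len(store.pages[1].deltas)
```

In this sketch the cost of building images is paid on the storage side, and the compute side only ever ships small deltas. A real system would also have to handle record ordering, versioned reads, and concurrency, all of which are omitted here.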