Disk can lie to you when you write to it

2 days ago

A write-ahead log (WAL) is essential for database durability, but disks can fail silently.
Common failure modes include writes that sit in the kernel page cache instead of on disk, disks that report success for writes that never persisted, writes that complete out of order, and a single WAL file acting as a single point of failure.
A robust WAL uses five layers of defense: checksums, dual WAL files, O_DIRECT + O_DSYNC, linked I/O ordering, and post-fsync verification reads.
Checksums (CRC32C) detect silent data corruption from hardware or firmware errors.
Dual WAL files protect against latent sector errors (LSEs) by maintaining redundant copies.
O_DIRECT bypasses the kernel's page cache, and O_DSYNC makes each write return only after the data has been reported as written to stable storage.
Linked I/O ordering (linked submissions in io_uring on Linux) ensures that an fsync starts only after the write it depends on has completed.
Post-fsync verification reads catch silent failures immediately by re-reading and validating written data.
Recovery involves scanning both WAL files, merging valid records, and replaying operations to restore consistent state.
Real-world scenarios highlight the importance of these layers, such as silent corruption and page cache surprises.
A production-grade WAL must include checksums, redundancy, direct writes, operation ordering, and verification to fulfill its durability contract.
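
As an illustration of the checksum layer, here is a minimal sketch of CRC32C-protected records in Python. The record layout (a little-endian length and checksum followed by the payload) and the names `encode_record` and `decode_record` are assumptions made for this example, not taken from the article. The CRC32C routine is a plain table-driven implementation using the Castagnoli polynomial.

```python
import struct


def _make_table():
    """Build the 256-entry lookup table for CRC32C (Castagnoli)."""
    poly = 0x82F63B78  # reflected form of the Castagnoli polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Compute the CRC32C checksum of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Hypothetical record header: payload length, then CRC32C of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), crc32c(payload)) + payload


def decode_record(buf: bytes):
    """Return the payload, or None if the record is truncated or corrupt."""
    if len(buf) < HEADER.size:
        return None
    length, stored = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length or crc32c(payload) != stored:
        return None
    return payload
```

A reader that gets `None` back knows the record cannot be trusted and can fall back to a redundant copy instead of replaying damaged data.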
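
The direct-write layer can be sketched in Python as follows. This is only an assumption-laden illustration: the 4096-byte block size, the function name `write_block`, and the fallback to a plain O_DSYNC write when O_DIRECT is unavailable or rejected by the file system are choices made for this example. O_DIRECT generally requires an aligned buffer, length, and file offset, so the payload is copied into a page-aligned anonymous mapping first.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real devices may require a different value


def write_block(path, payload, offset=0):
    """Write one zero-padded block; return True if O_DIRECT was used."""
    if len(payload) > BLOCK:
        raise ValueError("payload larger than one block")
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, zero-filled
    buf[:len(payload)] = payload
    base = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    try:
        # Try O_DIRECT first, then fall back to O_DSYNC alone.
        for extra in (getattr(os, "O_DIRECT", 0), 0):
            try:
                fd = os.open(path, base | extra, 0o600)
            except OSError:
                continue  # these flags are not accepted here
            try:
                os.pwrite(fd, buf, offset)
                return bool(extra)
            except OSError:
                if not extra:
                    raise  # the fallback write failed too
            finally:
                os.close(fd)
    finally:
        buf.close()
    raise OSError("could not open WAL file for writing")
```

The return value tells the caller whether the page cache was actually bypassed, which matters when deciding how much to trust a later verification read.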
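
A post-fsync verification read might look like the sketch below. The function name `append_verified` is hypothetical, and `zlib.crc32` stands in for CRC32C only because it ships with Python. Note the caveat: unless the file was opened with O_DIRECT, the read-back may be served from the page cache and not from the device, so this version shows the control flow more than a full guarantee.

```python
import os
import zlib


def append_verified(fd, payload, offset):
    """Write payload at offset, fsync, then re-read and compare checksums.

    fd must be open for both reading and writing.
    Returns the offset at which the next record should be written.
    """
    expected = zlib.crc32(payload)  # stand-in for CRC32C
    written = os.pwrite(fd, payload, offset)
    if written != len(payload):
        raise OSError("short write")
    os.fsync(fd)  # ask the kernel to flush data to stable storage
    readback = os.pread(fd, len(payload), offset)
    if zlib.crc32(readback) != expected:
        raise OSError("verification read does not match what was written")
    return offset + written
```

Raising immediately lets the caller treat the failure like any other I/O error, for example by retrying on the second WAL file before acknowledging the write.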
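
The recovery step (scan both WAL files, merge valid records, replay) can be sketched as below. Everything concrete here is an assumption for illustration: the header layout with a sequence number, the `key=value` payload format, the function names, and the use of `zlib.crc32` as a stand-in for CRC32C. A record whose checksum fails is skipped, and scanning stops if a length field points past the end of the buffer.

```python
import struct
import zlib

# Hypothetical header: sequence number, payload length, checksum of payload.
HEADER = struct.Struct("<QII")


def encode(seq, payload):
    """Frame one record with its sequence number, length, and checksum."""
    return HEADER.pack(seq, len(payload), zlib.crc32(payload)) + payload


def scan(buf):
    """Yield (seq, payload) for every record in buf that passes its checksum."""
    pos = 0
    while pos + HEADER.size <= len(buf):
        seq, length, stored = HEADER.unpack_from(buf, pos)
        end = pos + HEADER.size + length
        if end > len(buf):
            break  # truncated tail or damaged length field
        payload = buf[pos + HEADER.size:end]
        if zlib.crc32(payload) == stored:
            yield seq, payload
        pos = end


def recover(wal_a, wal_b):
    """Merge valid records from both copies, ordered by sequence number."""
    merged = {}
    for buf in (wal_a, wal_b):
        for seq, payload in scan(buf):
            merged.setdefault(seq, payload)
    return [merged[seq] for seq in sorted(merged)]


def replay(payloads):
    """Apply 'key=value' operations in order to rebuild the state."""
    state = {}
    for payload in payloads:
        key, _, value = payload.decode().partition("=")
        state[key] = value
    return state
```

With this layout, a record damaged in one file is still recovered as long as the other file holds an intact copy with the same sequence number.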
