Hasty Briefs (beta)

Disk can lie to you when you write to it

2 days ago
  • #database
  • #WAL
  • #durability
  • A write-ahead log (WAL) is essential for database durability, but disks can fail silently.
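  • Common failure modes include writes that sit in the kernel page cache instead of on disk, disks that report success for writes that never persisted, writes that land out of order, and a single WAL file acting as a single point of failure.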
  • Five layers of defense for a robust WAL: checksums, dual WAL files, O_DIRECT + O_DSYNC, linked I/O ordering, and post-fsync verification reads.
  • Checksums (CRC32C) detect silent data corruption from hardware or firmware errors.
  • Dual WAL files protect against latent sector errors (LSEs) by maintaining redundant copies.
  • O_DIRECT bypasses the kernel's page cache, and O_DSYNC makes each write return only after the data has been handed to the storage device.
  • Linked I/O ordering (io_uring's IOSQE_IO_LINK flag on Linux) ensures the fsync starts only after the preceding write has completed.
  • Post-fsync verification reads catch silent failures immediately by re-reading and validating written data.
  • Recovery involves scanning both WAL files, merging valid records, and replaying operations to restore consistent state.
  • Real-world scenarios, such as silent corruption and page cache surprises, show why each of these layers matters.
  • A production-grade WAL must include checksums, redundancy, direct writes, operation ordering, and verification to fulfill its durability contract.
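
Several of the layers above can be sketched in a few dozen lines. The following is a minimal illustration, not the original article's implementation: the record layout (sequence number, payload length, CRC32C), the function names `encode_record`, `scan`, `append_durable`, and `recover`, and the idea of calling `append_durable` once per WAL copy are all assumptions made for this example. It covers checksums, dual-file recovery, O_DSYNC writes, and a verification read. O_DIRECT is left out because it requires aligned buffers, so the verification read here may be served from the page cache rather than the device. Linked io_uring submission is not shown because the Python standard library has no io_uring binding.

```python
import os
import struct

def _make_table():
    # Table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Hypothetical record header: sequence (u64), payload length (u32), CRC32C (u32).
HEADER = struct.Struct("<QII")

def encode_record(seq: int, payload: bytes) -> bytes:
    # The checksum covers the sequence number and length as well as the payload.
    crc = crc32c(struct.pack("<QI", seq, len(payload)) + payload)
    return HEADER.pack(seq, len(payload), crc) + payload

def scan(buf: bytes) -> dict:
    """Return {seq: payload} for the valid prefix; stop at the first bad record."""
    records, pos = {}, 0
    while pos + HEADER.size <= len(buf):
        seq, length, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        payload = buf[start:start + length]
        if len(payload) != length:
            break  # torn write at the tail
        if crc32c(struct.pack("<QI", seq, length) + payload) != crc:
            break  # checksum mismatch: corruption detected
        records[seq] = payload
        pos = start + length
    return records

def append_durable(path: str, record: bytes) -> None:
    # O_DSYNC is used where the platform defines it; fsync is issued regardless.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(record)
        while view:  # handle short writes
            view = view[os.write(fd, view):]
        os.fsync(fd)
    finally:
        os.close(fd)
    # Verification read: re-read the tail and compare it with what was written.
    with open(path, "rb") as f:
        f.seek(-len(record), os.SEEK_END)
        if f.read() != record:
            raise OSError(f"verification read mismatch in {path}")

def recover(paths) -> list:
    """Merge valid records from every WAL copy and return payloads in sequence order."""
    merged = {}
    for path in paths:
        try:
            with open(path, "rb") as f:
                for seq, payload in scan(f.read()).items():
                    merged.setdefault(seq, payload)
        except FileNotFoundError:
            continue
    return [merged[seq] for seq in sorted(merged)]
```

A caller would append each encoded record to both copies (for example `append_durable("wal.a", rec)` followed by `append_durable("wal.b", rec)`, with hypothetical file names) and call `recover(["wal.a", "wal.b"])` at startup. Because `scan` stops at the first bad record, a corrupted record in one copy is recovered only if the other copy still holds it intact, which is the case dual files are meant to cover.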