MD RAID or DRBD can be broken from userspace when using O_DIRECT
a day ago
- #O_DIRECT
- #RAID
- #data-corruption
- MD RAID, DRBD, and similar software RAID-like block devices can become silently inconsistent if a userspace program misbehaves.
- A test case demonstrates that O_DIRECT writes can leave RAID mirrors out of sync, with different data on each mirror.
- The issue is reproducible with both MD RAID and DRBD, and can also affect filesystems like EXT3/EXT4 and BTRFS when placed on top of these RAID devices.
- Virtual machines using O_DIRECT (e.g., with cache=none) can trigger this issue, potentially allowing non-root users inside a VM to degrade the host's RAID.
- ZFS, bcachefs, and recent BTRFS versions handle O_DIRECT writes correctly, avoiding mirror inconsistencies either by bouncing writes (copying the data into a kernel-owned buffer before submitting it) or by using checksums.
- The root cause is that O_DIRECT lets userspace modify a buffer while a write from it is still in flight; the RAID layer reads that buffer separately for each mirror, so the mirrors can end up with different contents.
- Proposed solutions include disabling O_DIRECT by default or implementing proper locking mechanisms in the kernel to prevent buffer modifications during writes.
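
The root cause and the "bouncing" fix described above can be illustrated with a small userspace model. This is only a sketch, not kernel code and not the test case from the report: `mirrored_write`, `run`, and `scribble` are hypothetical names, the mirror legs are plain in-memory byte arrays, and the concurrent userspace modification is simulated deterministically with a callback that fires between the two per-leg copies.

```python
def mirrored_write(buf, legs, on_between_legs=None, bounce=False):
    """Copy buf to every leg, one leg at a time.

    Models a mirror that reads the caller's buffer once per device.
    With bounce=True the data is first copied into a private snapshot,
    so later changes to buf cannot reach any leg.
    """
    source = bytes(buf) if bounce else buf
    for i, leg in enumerate(legs):
        leg[:] = source  # each leg reads the source independently
        # Simulate userspace touching its buffer while the write is in flight.
        if on_between_legs is not None and i < len(legs) - 1:
            on_between_legs()


def run(bounce):
    """Write one buffer to two legs while the buffer is modified mid-write."""
    buf = bytearray(b"A" * 8)
    legs = [bytearray(8), bytearray(8)]

    def scribble():
        # Hypothetical misbehaving program: overwrite the buffer in place.
        buf[:] = b"B" * 8

    mirrored_write(buf, legs, on_between_legs=scribble, bounce=bounce)
    return legs


unsafe = run(bounce=False)
safe = run(bounce=True)
print("legs match without bounce:", unsafe[0] == unsafe[1])
print("legs match with bounce:   ", safe[0] == safe[1])
```

In this model the unbounced write leaves the two legs with different contents, while the bounced write keeps them identical at the cost of one extra copy, which is the trade-off that O_DIRECT is normally used to avoid.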