An almost catastrophic OpenZFS bug and the humans that made it
10 months ago
- #Software Bugs
- #OpenZFS
- #Rust
- A critical bug was found in OpenZFS's `vdev_raidz_asize_to_psize` function: it calculated `psize` but then returned its input, `asize`.
- The bug could lead to data corruption by writing past allocated disk space, a silent and dangerous failure.
- The bug was discovered during testing with aggressive allocator fragmentation settings, highlighting the importance of thorough testing.
- In C, static analyzers could flag `psize` as an unused variable, but such tools are rarely integrated into everyday workflows because of their cost and false positives.
- Rust's type system could prevent such bugs by distinguishing between `PhysicalSize` and `AllocatedSize` types, making accidental swaps a compile-time error.
- The discussion reflects on how poorly humans spot such errors and on the value of tooling that catches them, rather than relying solely on programmer competence.
- The narrative challenges the notion that 'competent programmers don't need tools,' emphasizing that even experienced developers can overlook subtle bugs.
- The author takes a nuanced view of Rust, appreciating its safety features while acknowledging its learning curve and that it may be a poor fit for some tasks.
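To make the `PhysicalSize`/`AllocatedSize` idea concrete, here is a minimal sketch. Only those two type names come from the summary; the function signature, the `ashift`/`cols`/`nparity` parameters, and the conversion formula (a fixed number of parity sectors per row, ignoring padding and skip sectors) are hypothetical simplifications, not OpenZFS's actual RAIDZ math.

```rust
/// Bytes of logical data, excluding parity (hypothetical newtype for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PhysicalSize(u64);

/// Bytes consumed on disk, including parity (hypothetical newtype for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocatedSize(u64);

/// Hypothetical, simplified RAIDZ-style conversion: each row of `cols` sectors
/// contains `nparity` parity sectors. Padding and skip sectors are ignored,
/// so this is NOT the real OpenZFS formula.
fn asize_to_psize(asize: AllocatedSize, ashift: u32, cols: u64, nparity: u64) -> PhysicalSize {
    assert!(cols > nparity, "need at least one data column");
    // Convert bytes to sectors of size 2^ashift.
    let sectors = asize.0 >> ashift;
    // Count rows, including a trailing partial row (ceiling division).
    let rows = (sectors + cols - 1) / cols;
    // Every row carries `nparity` parity sectors; saturate to avoid underflow on tiny inputs.
    let data_sectors = sectors.saturating_sub(nparity * rows);
    let psize = PhysicalSize(data_sectors << ashift);

    // The bug described above amounts to returning `asize` here instead of `psize`.
    // With distinct types, that line would not compile:
    //     return asize; // error[E0308]: mismatched types
    psize
}

fn main() {
    // 16 sectors of 4 KiB (ashift = 12) on a hypothetical 4-column, single-parity layout.
    let psize = asize_to_psize(AllocatedSize(16 * 4096), 12, 4, 1);
    println!("{:?}", psize); // PhysicalSize(49152)
}
```

Because the two wrappers are distinct types, the compiler rejects returning one where the other is expected. The wrappers are single-field structs, so they typically add no runtime overhead compared with passing bare `u64` values.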