Hasty Briefsbeta

Bilingual

A Higgs-Bugson in the Linux Kernel

10 months ago
  • #Kerberos
  • #Debugging
  • #NFS
  • A higgs-bugson (difficult-to-reproduce bug) was found in Gord, a system storing and distributing trading activity data.
  • The bug involved rare -EACCES (Permission denied) errors during large file copies with NFS and Kerberos, despite correct permissions.
  • Debugging revealed the issue was related to Kerberos credentials and GSS sequence numbers in NFS requests and responses.
  • A test setup was created using a FUSE filesystem with in-memory random data to reproduce the bug.
  • eBPF and bpftrace were used to trace kernel functions and identify the bug's occurrence.
  • The bug was caused by mismatched GSS sequence numbers during NFS request retransmissions, leading to checksum validation failures.
  • A Wireshark plugin was developed to analyze packet checksums, confirming the kernel's incorrect validation.
  • The solution involved kernel patches to handle sequence number mismatches properly, now upstream in Linux 6.16.