A tale about fixing eBPF spinlock issues in the Linux kernel
- #eBPF
- #Spinlocks
- #Linux Kernel
- Superluminal, a CPU profiler, encountered periodic system freezes on Linux during captures.
- The issue was traced to a complex interaction between eBPF programs handling sampling and context switch events.
- Debugging revealed recursive spinlock acquisition: a non-maskable interrupt (NMI) could interrupt code that already held a spinlock and then try to take the same lock again.
- The resilient queued spinlock (rqspinlock) in the Linux kernel had a race condition during deadlock detection.
- Fixes involved reordering lock acquisition steps and improving deadlock checks to handle NMIs correctly.
- The issue was specific to newer kernels (6.15+) due to the introduction of rqspinlock in eBPF ring buffers.
- Patches were backported to kernels 6.17 and 6.18, with a workaround implemented for older affected kernels.
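
The recursive acquisition and the ordering problem summarized above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel's rqspinlock code: it models a single CPU, the "NMI" is a plain function call injected between two acquisition steps, and every `toy_*` name is hypothetical. It shows why a deadlock check based on a per-CPU table of held locks can miss the recursion if the lock is taken before it is recorded, and why swapping the two steps lets the check catch it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_HELD 8

/* Hypothetical toy lock: one flag, one simulated CPU, no real atomics. */
struct toy_lock {
    bool locked;
};

/* Table of locks the simulated CPU holds or is in the middle of taking. */
struct toy_lock *toy_held[TOY_MAX_HELD];
int toy_held_cnt;

/* Result of the most recent simulated NMI acquisition attempt. */
int toy_last_nmi_result;

/* Stand-in for an NMI that fires between two acquisition steps. */
typedef void (*toy_nmi_fn)(struct toy_lock *);

static bool toy_in_held_table(const struct toy_lock *l)
{
    for (int i = 0; i < toy_held_cnt; i++)
        if (toy_held[i] == l)
            return true;
    return false;
}

/* Release the most recently recorded lock. */
void toy_unlock(struct toy_lock *l)
{
    assert(toy_held_cnt > 0 && toy_held[toy_held_cnt - 1] == l);
    l->locked = false;
    toy_held_cnt--;
}

/*
 * Acquisition attempt from the simulated NMI.
 * -EDEADLK:   the table shows this CPU already owns the lock, so the
 *             recursion is detected immediately.
 * -ETIMEDOUT: the lock is taken but the table does not say by whom; the
 *             owner is the interrupted code, which cannot run, so in this
 *             model only a timeout would end the spin.
 */
int toy_lock_from_nmi(struct toy_lock *l)
{
    if (toy_in_held_table(l))
        return -EDEADLK;
    if (l->locked)
        return -ETIMEDOUT;
    assert(toy_held_cnt < TOY_MAX_HELD);
    toy_held[toy_held_cnt++] = l;
    l->locked = true;
    return 0;
}

/* Simulated NMI handler: try the lock, remember the result, clean up. */
void toy_nmi(struct toy_lock *l)
{
    toy_last_nmi_result = toy_lock_from_nmi(l);
    if (toy_last_nmi_result == 0)
        toy_unlock(l);
}

/* Racy ordering: take the lock first, record it afterwards. */
void toy_lock_acquire_then_record(struct toy_lock *l, toy_nmi_fn nmi)
{
    l->locked = true;                 /* step 1: lock is now owned */
    if (nmi)
        nmi(l);                       /* NMI lands inside the window */
    assert(toy_held_cnt < TOY_MAX_HELD);
    toy_held[toy_held_cnt++] = l;     /* step 2: recorded too late */
}

/* Safer ordering: record the intent first, then take the lock. */
void toy_lock_record_then_acquire(struct toy_lock *l, toy_nmi_fn nmi)
{
    assert(toy_held_cnt < TOY_MAX_HELD);
    toy_held[toy_held_cnt++] = l;     /* step 1: visible to the check */
    if (nmi)
        nmi(l);                       /* NMI lands inside the window */
    l->locked = true;                 /* step 2: lock is now owned */
}
```

Passing `toy_nmi` as the hook to `toy_lock_acquire_then_record` leaves `toy_last_nmi_result` at `-ETIMEDOUT`, because the check cannot see the owner and the handler would have to wait for a timeout. With `toy_lock_record_then_acquire` the same hook yields `-EDEADLK` right away. The second ordering is deliberately conservative: in this model it reports a deadlock even though the interrupted code had not quite taken the lock yet.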