A kernel bug froze my machine: Debugging an async-profiler deadlock
4 days ago
- #Performance Profiling
- #Debugging
- #Linux Kernel
- A Linux user's machine froze while running async-profiler against QuestDB; the freeze was traced to a bug in Linux kernel 6.17.
- The bug was a deadlock in the perf_events subsystem, specifically in the hrtimer_cancel function, causing the system to freeze.
- A workaround was found by starting the profiler with the -e ctimer option, avoiding the problematic kernel feature.
- The user set up QEMU to debug the kernel, reproduced the freeze, and used GDB to inspect the deadlock in real time.
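- Debugging revealed that the deadlock occurred when hrtimer_cancel was called on a timer from within that timer's own callback: the cancel waits for the callback to finish, the callback cannot finish until the cancel returns, and the result is an infinite loop.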
- Attempts to manually fix the deadlock by altering return values in GDB were partially successful but not reliable for production use.
- The conclusion was to use the workaround or wait for a kernel fix; the investigation also yielded insights into kernel debugging and perf_events.
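
To make the `-e ctimer` workaround from the list above concrete, here is a minimal sketch that only builds the command line and does not run it. Only `-e ctimer` comes from the summary. The `asprof` launcher name, the `-d` duration, the `-f` output file, the placeholder PID, and the helper `build_asprof_command` are assumptions for illustration, and the original post may have started the profiler differently.

```python
import shlex
from typing import List


def build_asprof_command(pid: int, event: str = "ctimer",
                         duration_s: int = 30,
                         output: str = "profile.html") -> List[str]:
    """Build an async-profiler command line; nothing is executed here.

    Hypothetical helper: only the "-e ctimer" part is taken from the summary.
    """
    return ["asprof", "-e", event, "-d", str(duration_s), "-f", output, str(pid)]


command = build_asprof_command(pid=12345)  # 12345 is a placeholder PID
print(shlex.join(command))
```

Selecting the `ctimer` event is what matters here; the remaining arguments can be whatever the profiling session normally uses.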
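
The self-cancel deadlock described in the list above can be illustrated with a toy user-space model. This is a sketch, not kernel code: `ToyTimer`, `toy_try_to_cancel`, `toy_cancel`, and `fire` are hypothetical names, and the model assumes only what the summary states, namely that cancelling keeps retrying for as long as the timer's callback is still running. The `max_spins` bound exists purely so the demo terminates; the loop being modeled has no such bound.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyTimer:
    """Hypothetical stand-in for a kernel hrtimer; not the real struct."""
    callback: Optional[Callable[["ToyTimer"], None]] = None
    queued: bool = False             # timer is armed and waiting to fire
    callback_running: bool = False   # the callback is executing right now


def toy_try_to_cancel(timer: ToyTimer) -> int:
    """Return -1 while the callback runs, 1 if a queued timer was removed, else 0."""
    if timer.callback_running:
        return -1
    was_queued = timer.queued
    timer.queued = False
    return 1 if was_queued else 0


def toy_cancel(timer: ToyTimer, max_spins: int = 1000) -> int:
    """Retry until the callback is no longer running.

    Returns the number of failed attempts before success. The bound is a
    demo-only safety net; without it this loop would spin forever.
    """
    for spins in range(max_spins):
        if toy_try_to_cancel(timer) >= 0:
            return spins
    raise RuntimeError(f"still waiting after {max_spins} spins: self-deadlock")


def fire(timer: ToyTimer) -> None:
    """Run the callback as an expiry would, marking it as running meanwhile."""
    timer.queued = False
    timer.callback_running = True
    try:
        timer.callback(timer)
    finally:
        timer.callback_running = False


def cancels_itself(timer: ToyTimer) -> None:
    # The problematic pattern: cancelling the timer from inside its own callback.
    toy_cancel(timer)


def demo() -> str:
    timer = ToyTimer(callback=cancels_itself, queued=True)
    try:
        fire(timer)
    except RuntimeError as exc:
        return str(exc)
    return "cancelled cleanly"
```

Calling `toy_cancel` on a queued timer from outside the callback succeeds on the first attempt, while `demo()` exhausts its spin budget because `callback_running` can only be cleared after the callback returns, and the callback is the code doing the waiting.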