Hasty Briefs (beta)

A kernel bug froze my machine: Debugging an async-profiler deadlock

4 days ago
  • #Performance Profiling
  • #Debugging
  • #Linux Kernel
  • A Linux user's machine froze while profiling QuestDB with async-profiler; the freeze was traced to a bug in Linux kernel 6.17.
  • The bug was a deadlock in the perf_events subsystem: a call to hrtimer_cancel never returned, which froze the system.
  • A workaround was to start the profiler with the -e ctimer option, which avoids the affected perf_events code path.
  • The user ran the kernel under QEMU, reproduced the freeze, and attached GDB to inspect the deadlock live.
  • Debugging revealed that hrtimer_cancel was called on a timer from within that timer's own callback. Because hrtimer_cancel waits for a running callback to finish, it looped forever waiting on itself.
  • Attempts to break the deadlock manually by altering return values in GDB were partially successful, but this is not a reliable approach for production use.
  • The conclusion was to use the workaround or wait for a kernel fix. The investigation also yielded insights into kernel debugging and perf_events.
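
The self-cancel deadlock described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: ModelTimer, model_try_to_cancel, model_cancel, and cancel_from_own_callback are hypothetical names invented for this example. It assumes the contract documented for the kernel's hrtimer_try_to_cancel (not named in the summary): -1 while the timer's callback is running, 0 if the timer was not active, 1 if it was active and got cancelled. The real hrtimer_cancel retries without a limit; the model caps the retries so the demonstration terminates.

```python
from dataclasses import dataclass


@dataclass
class ModelTimer:
    """Toy stand-in for a kernel hrtimer (hypothetical, not a kernel structure)."""
    callback_running: bool = False  # True while the timer's callback executes
    queued: bool = True             # True while the timer is armed


def model_try_to_cancel(timer: ModelTimer) -> int:
    """Mimic the assumed return contract: -1 running, 0 inactive, 1 cancelled."""
    if timer.callback_running:
        return -1
    if not timer.queued:
        return 0
    timer.queued = False
    return 1


def model_cancel(timer: ModelTimer, max_spins: int):
    """Retry until the cancel succeeds.

    Returns the number of attempts used, or None if max_spins ran out.
    The cap exists only in this model; the real loop has no such limit.
    """
    for spins in range(1, max_spins + 1):
        if model_try_to_cancel(timer) >= 0:
            return spins
    return None


def cancel_from_own_callback(timer: ModelTimer, max_spins: int = 1000):
    """Call the cancel loop while the timer's own callback is marked running."""
    timer.callback_running = True
    try:
        # The callback cannot finish until cancel returns, and cancel
        # cannot return until the callback finishes.
        return model_cancel(timer, max_spins)
    finally:
        timer.callback_running = False


if __name__ == "__main__":
    print(model_cancel(ModelTimer(), max_spins=1000))  # 1
    print(cancel_from_own_callback(ModelTimer()))      # None
```

Cancelling from outside the callback succeeds on the first attempt. Cancelling from inside it exhausts every retry, which in the uncapped kernel loop corresponds to the CPU spinning forever.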