Searching for the cause of hung tasks in the Linux kernel
a year ago
- #Kernel
- #Linux
- #Debugging
- The Linux kernel produces hung task warnings when processes are stuck in an uninterruptible state (TASK_UNINTERRUPTIBLE) for too long.
- The warning is raised by khungtaskd, a kernel thread that periodically checks for tasks that have stayed in the D state (TASK_UNINTERRUPTIBLE).
- Processes in the D state cannot be interrupted by signals and may indicate system resource issues or overwhelmed subsystems.
- The TASK_KILLABLE state was introduced to allow termination with fatal signals while still protecting process memory.
- Example #1: An XFS file system slowdown caused hung task warnings because the dm-crypt no_read_workqueue/no_write_workqueue flags were missing.
- Example #2: A process generating a core dump can trigger hung task warnings because the kernel keeps it in the D state to preserve its memory while the dump is written.
- Example #3: Contention on the rtnl_mutex lock in the kernel networking subsystem caused multiple processes to hang; the cause was identified via BPF tracing.
- Debugging hung tasks involves analyzing stack traces, checking system metrics, and using tools like bpftrace and drgn.
- Hung task warnings are valuable for identifying system issues but may point to victims rather than the root cause.
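
As a first triage step for the points above, the D-state check can be approximated from user space. The following is a minimal sketch, not a description of how khungtaskd works internally: it reads the state field from /proc/<pid>/stat and lists processes currently in the D state. The helper names `parse_stat` and `find_d_state_tasks` are hypothetical and chosen for illustration only.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    # The state is the first field after the closing parenthesis.
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for processes whose state is D right now."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs here (for example, not running on Linux)
    tasks = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file is unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in find_d_state_tasks():
        print(f"{pid}\t{comm}")
```

This is a point-in-time snapshot: a task that shows up once is not necessarily hung, so it is worth sampling repeatedly and checking whether the same PID stays in the D state. The sketch only inspects the top-level /proc/<pid> entries; per-thread state lives under /proc/<pid>/task/<tid>/stat and would need the same parsing.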