The case of the UI thread that hung in a kernel call
a year ago
- #debugging
- #thread-suspension
- #deadlock
- A customer reported a UI thread hang that could not be diagnosed directly because the thread's stack had been paged out.
- The thread had been suspended for over five hours, yet no debugger was attached that would explain the suspension.
- The culprit was a watchdog thread in the same process that suspended the UI thread to capture its stack trace, which led to a deadlock.
- The deadlock occurred because the UI thread held a lock needed by the watchdog thread to capture the stack trace.
- Suspending threads within the same process risks deadlock if the suspended thread holds resources needed by others.
- The fix is to run the watchdog in a separate process, so that capturing a stack never depends on locks held by the thread being suspended.
- The kernel defers a suspension so that it does not interrupt critical kernel-mode operations, but the kernel knows nothing about user-mode locks, so this does not prevent user-mode deadlocks.
- Microsoft's design choices around thread suspension and the loader lock drew criticism, but the root cause was the in-process watchdog.
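The deadlock described above can be reproduced in miniature. This is a Python simulation, not the customer's code. Python cannot preemptively suspend a thread the way Win32 `SuspendThread` does, so "suspended" is modeled as the UI thread parking on an event while it still holds a lock. `process_lock` is a hypothetical stand-in for whatever lock the stack-capture path needs. A timeout on the watchdog's acquire lets the demo report the deadlock instead of hanging forever, which is what the real watchdog did.

```python
import threading


def run_in_process_watchdog(timeout: float = 0.5) -> bool:
    """Simulate an in-process watchdog that suspends the UI thread and then
    tries to capture its stack. Returns True if the capture obtained the lock
    it needs, False if it would have deadlocked (the acquire timed out)."""
    # Hypothetical stand-in for a process-wide lock (e.g. a heap lock) that
    # both the UI thread's work and the stack-capture code path need.
    process_lock = threading.Lock()
    # Python cannot preemptively suspend a thread, so "suspended" is modeled
    # as the UI thread parking on an Event that only the watchdog can set.
    suspended_while_holding = threading.Event()
    resume = threading.Event()

    def ui_thread() -> None:
        with process_lock:                 # UI thread is mid-operation, holding the lock
            suspended_while_holding.set()  # the "suspension" lands right here
            resume.wait()                  # frozen until the watchdog resumes it

    ui = threading.Thread(target=ui_thread)
    ui.start()
    suspended_while_holding.wait()         # watchdog has now "suspended" the UI thread

    # Watchdog: capturing the stack needs the lock the frozen thread owns.
    # A real watchdog would block here forever; the timeout lets the demo finish.
    captured = process_lock.acquire(timeout=timeout)
    if captured:
        process_lock.release()
    resume.set()                           # unfreeze the UI thread so the demo can exit
    ui.join()
    return captured


if __name__ == "__main__":
    if run_in_process_watchdog():
        print("stack captured")
    else:
        print("deadlock: watchdog is waiting on a lock held by the thread it suspended")
```

An out-of-process watchdog sidesteps this because the code that reads the target's stack runs in a different process and takes none of the target's locks, so there is nothing the suspended thread can be holding that the watchdog needs.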