Pdeathsig is almost never what you want
a year ago
- #Linux Kernel
- #Debugging
- #Performance Optimization
- The author was tasked with optimizing Output Media start latency at Recall.ai.
- Output Media renders customer-supplied web pages into audio and video for bots, using Chromium in a sandboxed environment with Bubblewrap.
- Initial latency was 12 seconds because Chromium, which is expensive to start, was launched only when Output Media was activated.
- The plan was to pre-load Chromium at bot startup to reduce latency.
- Testing revealed that the pre-loaded Chromium was terminating unexpectedly even though its parent process was still running.
- Debugging traced the problem to Bubblewrap's --die-with-parent flag, which relies on Linux's prctl option PR_SET_PDEATHSIG.
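- PR_SET_PDEATHSIG delivers its signal when the thread that created the child exits, not when the parent process exits, which interacts badly with Tokio's thread management.
- Tokio parks and reaps its threads dynamically, so once the thread that had spawned the sandbox was reaped, Chromium was killed even though the parent process was still alive.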
- Removing --die-with-parent stopped the premature termination, which let pre-loading work and reduced latency to 2-3 seconds.
- The solution improved customer experience by significantly cutting wait times.
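
The thread-versus-process behavior described above can be reproduced without Tokio or Bubblewrap. The following is a minimal, Linux-only sketch in Python, not the author's code: it spawns a child with PR_SET_PDEATHSIG set from a short-lived helper thread, lets that thread exit, and checks what happened to the child. The helper names (`set_pdeathsig`, `spawn_sleeper`, `spawn_from_short_lived_thread`) are made up for illustration, and calling `prctl` through `ctypes` in `preexec_fn` is an assumption chosen to keep the example self-contained.

```python
import ctypes
import signal
import subprocess
import sys
import threading

PR_SET_PDEATHSIG = 1  # option value from <linux/prctl.h>

# Load libc before forking so the child only has to make the call.
libc = ctypes.CDLL(None, use_errno=True)


def set_pdeathsig():
    """Runs in the child between fork and exec: request SIGKILL on 'parent' death."""
    sig = ctypes.c_ulong(int(signal.SIGKILL))
    if libc.prctl(PR_SET_PDEATHSIG, sig) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")


def spawn_sleeper():
    """Start a child that would idle for 30 seconds if nothing killed it."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        preexec_fn=set_pdeathsig,
    )


def spawn_from_short_lived_thread():
    """Spawn the child on a helper thread, let the thread exit, return the child's status."""
    procs = []
    worker = threading.Thread(target=lambda: procs.append(spawn_sleeper()))
    worker.start()
    worker.join()  # the spawning thread is done; this process keeps running
    child = procs[0]
    try:
        return child.wait(timeout=10)
    except subprocess.TimeoutExpired:
        # The child outlived its spawning thread; clean up and report that.
        child.kill()
        child.wait()
        return None


if __name__ == "__main__":
    print("child exit status:", spawn_from_short_lived_thread())
```

On Linux this should report an exit status of -9 (killed by SIGKILL) almost immediately, even though the Python process that owns the child never exited. Spawning the same child from the main thread instead would leave it running, which is the difference a runtime that reaps its own threads can hide.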