OOMProf: Profiling on the Brink
17 days ago
- #Linux
- #Debugging
- #Memory Management
- Introduction to the Linux OOM killer and the challenges it poses when debugging memory issues.
- Development of OOMProf, an eBPF-based monitoring system for profiling Go programs during OOM kills.
- OOM kills are hard to debug: they arrive with little context, the root cause is difficult to identify, and the system can unravel quickly once memory runs out.
- Explanation of Linux's overcommit behavior and its implications for memory allocation failures.
- Techniques for diagnosing OOM issues, including heap growth tracing and turning off overcommit.
- Complications in garbage-collected languages like Go, where the heap profile is only refreshed by garbage collection cycles and may lag behind the allocations that triggered the OOM.
- Solution: use eBPF programs attached to kernel tracepoints to capture a memory profile at the moment of the OOM kill.
- Potential issues include reading process memory from eBPF, eBPF instruction limits, and handling large numbers of memory buckets.
- Usage of OOMProf in the Parca Agent for continuous monitoring and automatic profile uploads.
- Future plans include support for other allocators (jemalloc, tcmalloc, mimalloc) and additional diagnostics like stack dumps.
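
The point about outdated heap profiles in Go can be illustrated with a minimal in-process sketch. This is not how OOMProf works (it reads the profile from outside the process with eBPF). It only shows why a profile taken without a recent garbage collection may miss recent allocations, and how the standard library's `runtime.GC()` followed by `pprof.WriteHeapProfile` yields up-to-date statistics. The helper name `captureHeapProfile` and the `sink` variable are hypothetical and exist only for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// sink keeps the allocations reachable so they remain part of the live heap.
var sink [][]byte

// captureHeapProfile (hypothetical helper) returns a serialized heap profile.
// When forceGC is true, it runs a garbage collection first so that the
// profile statistics include allocations made since the previous cycle.
func captureHeapProfile(forceGC bool) ([]byte, error) {
	if forceGC {
		runtime.GC() // refresh the statistics the heap profile is built from
	}
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Allocate 64 MiB that stays live for the rest of the program.
	for i := 0; i < 64; i++ {
		sink = append(sink, make([]byte, 1<<20))
	}

	// Without a forced GC, the profile may predate the allocations above.
	stale, err := captureHeapProfile(false)
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}

	// After runtime.GC(), the profile reflects the current live heap.
	fresh, err := captureHeapProfile(true)
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}

	fmt.Printf("serialized profile without forced GC: %d bytes\n", len(stale))
	fmt.Printf("serialized profile after runtime.GC(): %d bytes\n", len(fresh))
}
```

The byte counts only confirm that both profiles were produced. To compare their contents, write each one to a file and open it with `go tool pprof`. A process that is about to be OOM-killed usually has no chance to run code like this, which is the gap the eBPF approach is meant to close.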