Hunting down a C memory leak in a Go program (2021)
a year ago
- #memory-leak
- #Go
- #eBPF
- Zendesk encountered a memory leak in a Go program using the confluent-kafka-go library, which is built on the C library librdkafka.
- By analyzing memory metrics, the initial investigation confirmed that the leak was in the C part of the program, not the Go part.
- Tools like jemalloc were used to confirm the leak was due to actual memory allocation, not fragmentation or kernel issues.
- Valgrind was tried but did not report the leak: the memory was freed before the program terminated, so nothing counted as leaked at exit.
- eBPF and bpftrace were then used to trace memory allocations and identify leaks in the running program.
- Custom modifications to librdkafka, including USDT probes, were necessary to enable effective tracing with bpftrace.
- The leak was traced to unhandled OffsetCommitResponse events in librdkafka, which caused a queue to grow without bound.
- A simple fix was implemented by consuming and discarding these events, resolving the memory leak.
- The investigation also left the team with a better understanding of librdkafka and of eBPF tooling for future debugging.