From profiling to kernel patch: the journey to an eBPF performance fix
6 days ago
- #Performance Optimization
- #Linux Kernel
- #eBPF
- Superluminal, a CPU profiler for Linux, uses eBPF to capture performance data; work on it led to a kernel change that makes eBPF map-in-map updates faster.
- eBPF (extended Berkeley Packet Filter) lets custom programs run inside the Linux kernel; Superluminal uses it to collect performance data such as context switches.
- eBPF maps facilitate data exchange between kernel and userspace, with Superluminal using ring buffers for performance events and array-of-maps for unwind data.
- Precaching of unwind data was slow because bpf_map_update_elem calls blocked on synchronize_rcu waits in the kernel.
- The root cause was identified as an unnecessary global synchronization point in map-in-map updates, which effectively serialized the updates.
- The proposed fix uses synchronize_rcu_expedited instead, cutting precache time from ~830ms to ~26ms (31x faster).
- The change was submitted as a patch and accepted for Linux kernel 6.19, benefiting all eBPF map-in-map users.
- The discovery highlights the importance of off-CPU profiling, which traditional sampling profilers such as perf often overlook.
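
To make the serialization point concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes that every map-in-map update blocks for one full wait and that updates run strictly one after another. The helper names, the update count of 50, and the per-update waits are all hypothetical; only the ~830ms and ~26ms totals come from the summary above.

```python
# Back-of-the-envelope model of serialized map-in-map updates.
# Only the two totals below come from the summary; everything else is hypothetical.

MEASURED_BEFORE_MS = 830.0  # precache time with synchronize_rcu (approximate)
MEASURED_AFTER_MS = 26.0    # precache time with synchronize_rcu_expedited (approximate)


def serialized_total_ms(num_updates: int, wait_per_update_ms: float) -> float:
    """Total time when each update blocks for its own wait, one after another."""
    return num_updates * wait_per_update_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the slow total to the fast total."""
    return before_ms / after_ms


if __name__ == "__main__":
    # Hypothetical update count; the summary does not say how many updates ran.
    num_updates = 50

    # Per-update waits implied by the totals under that assumption.
    slow_wait = MEASURED_BEFORE_MS / num_updates
    fast_wait = MEASURED_AFTER_MS / num_updates

    print(f"implied wait per update, before: {slow_wait:.2f} ms")
    print(f"implied wait per update, after:  {fast_wait:.2f} ms")
    print(f"total before: {serialized_total_ms(num_updates, slow_wait):.0f} ms")
    print(f"total after:  {serialized_total_ms(num_updates, fast_wait):.0f} ms")
    print(f"speedup from rounded totals: {speedup(MEASURED_BEFORE_MS, MEASURED_AFTER_MS):.1f}x")
```

Because the total is just the update count times the per-update wait, shortening each wait shrinks the whole precache phase by the same factor, regardless of how many updates there are. With the rounded totals the ratio comes out just under 32, in line with the roughly 31x quoted above.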