Hasty Briefs (beta)

From profiling to kernel patch: the journey to an eBPF performance fix

6 days ago
  • #Performance Optimization
  • #Linux Kernel
  • #eBPF
  • Superluminal, a CPU profiler for Linux, uses eBPF to capture performance data; investigating a slowdown in that pipeline led to a kernel change that speeds up eBPF map-in-map updates.
  • eBPF (extended Berkeley Packet Filter) runs custom programs inside the Linux kernel; Superluminal uses it to collect performance data such as context switches.
  • eBPF maps facilitate data exchange between kernel and userspace, with Superluminal using ring buffers for performance events and array-of-maps for unwind data.
  • Precaching unwind data was slow because each bpf_map_update_elem call blocked on a synchronize_rcu wait in the kernel.
  • The root cause was an unnecessary global synchronization point in map-in-map updates, which forced the updates to run one after another.
  • The proposed fix uses synchronize_rcu_expedited instead, cutting precache time from ~830ms to ~26ms (31x faster).
  • The change was submitted as a patch and accepted for Linux kernel 6.19, benefiting all eBPF map-in-map users.
  • The discovery highlights the importance of off-CPU profiling, which traditional sampling profilers such as perf often overlook.
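
The two quantitative points in the summary can be illustrated with a small sketch. It is a toy model in Python, not Superluminal's code or the kernel patch: `time.sleep` stands in for a thread blocked in the kernel, and the update count of 100 is a hypothetical number chosen only to make the arithmetic concrete. The only figures taken from the summary are the rounded ~830ms and ~26ms totals, whose ratio comes out just under 32, in line with the stated 31x.

```python
import time


def serialized_total_ms(num_updates, wait_ms_per_update):
    """Wall time when every update blocks on its own wait, one after another."""
    return num_updates * wait_ms_per_update


def speedup(before_ms, after_ms):
    """Ratio of the old total time to the new total time."""
    return before_ms / after_ms


def measure_blocked_call(block_seconds):
    """Return (wall_seconds, cpu_seconds) for a call that only waits.

    time.sleep is an assumption standing in for a blocking wait in the
    kernel; the thread is off-CPU for almost the whole duration.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(block_seconds)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return wall_seconds, cpu_seconds


if __name__ == "__main__":
    # Hypothetical: 100 updates that each wait 8.3 ms add up to about 830 ms.
    print("serialized total (ms):", serialized_total_ms(100, 8.3))

    # Rounded figures from the summary.
    print("speedup:", round(speedup(830, 26), 1))

    # Wall time is dominated by waiting, while CPU time stays near zero.
    wall, cpu = measure_blocked_call(0.05)
    print(f"wall={wall * 1000:.1f} ms, cpu={cpu * 1000:.1f} ms")
```

The last measurement shows why this kind of problem is easy to miss: a profiler that samples only threads running on a CPU sees almost none of the time spent inside a call that mostly waits, whereas wall-clock (off-CPU) measurement attributes the full delay to it.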