Performance Improvements in Libffi

8 hours ago

#runtime function calls
#performance tuning
#libffi optimization

libffi interprets function calls at runtime: given a signature description supplied at runtime, it works out where each argument must be placed and makes the call, with no code compiled ahead of time for that signature.
Unlike JIT compilation, which generates executable code at runtime (posing security risks), libffi intentionally remains an interpreter to avoid creating writable and executable memory pages.
A key inefficiency in libffi is that ffi_prep_cif computes argument placement but then discards it, so ffi_call has to re-derive the placement on every call, which involves heavy branching and pointer chasing.
To remove this repeated work, libffi introduces a 'plan': a precomputed bytecode of moves for each signature. The plan is built once per signature (via ffi_call_plan_alloc) and reused on subsequent calls (via ffi_call_plan_invoke), which skips re-classification.
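The build-once, reuse-many idea can be illustrated with a small self-contained sketch. It is not libffi's implementation: the names toy_plan, toy_plan_build and toy_plan_invoke are hypothetical, the "registers" are plain arrays, and the classification rule (six general-purpose slots, eight SSE slots, the rest on the stack) is a simplified assumption loosely modelled on x86-64 Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy argument kinds; a real ABI distinguishes many more classes. */
typedef enum { ARG_INT64, ARG_DOUBLE } arg_kind;

/* Where one argument lands in the simulated machine state. */
typedef enum { DST_GP, DST_SSE, DST_STACK } dst_bank;
typedef struct { dst_bank bank; int index; } placement;

/* Simulated register file and outgoing stack area (not real registers). */
typedef struct {
    uint64_t gp[6];
    double   sse[8];
    uint64_t stack[16];
} frame;

#define TOY_MAX_ARGS 16

/* Hypothetical plan: one precomputed placement per argument. */
typedef struct {
    int       nargs;
    placement where[TOY_MAX_ARGS];
} toy_plan;

/* Classify the signature ONCE and record the result. */
static void toy_plan_build(toy_plan *plan, const arg_kind *kinds, int nargs)
{
    int gp = 0, sse = 0, stack = 0;
    assert(nargs >= 0 && nargs <= TOY_MAX_ARGS);
    plan->nargs = nargs;
    for (int i = 0; i < nargs; i++) {
        if (kinds[i] == ARG_INT64 && gp < 6)
            plan->where[i] = (placement){ DST_GP, gp++ };
        else if (kinds[i] == ARG_DOUBLE && sse < 8)
            plan->where[i] = (placement){ DST_SSE, sse++ };
        else
            plan->where[i] = (placement){ DST_STACK, stack++ };
    }
}

/* Per-call work: replay the recorded moves, with no re-classification.
 * avalue[i] points at the i-th argument value, each 8 bytes wide here. */
static void toy_plan_invoke(const toy_plan *plan, void *const *avalue, frame *out)
{
    for (int i = 0; i < plan->nargs; i++) {
        placement p = plan->where[i];
        switch (p.bank) {
        case DST_GP:    memcpy(&out->gp[p.index],    avalue[i], 8); break;
        case DST_SSE:   memcpy(&out->sse[p.index],   avalue[i], 8); break;
        case DST_STACK: memcpy(&out->stack[p.index], avalue[i], 8); break;
        }
    }
}
```

A caller would run toy_plan_build once for a signature and then call toy_plan_invoke as often as needed; only the second function sits on the hot path.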
The plan includes simple opcodes (e.g., GP64 for general registers, SE32 for sign extension). For common cases like pointer-only arguments, a thunk bypasses interpretation entirely, making calls nearly as fast as direct calls.
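As a rough illustration of opcode dispatch and of a dispatch-free fast path, here is a minimal sketch. The opcode names follow the two mentioned above, but their encoding, the mini_plan type and both functions are hypothetical. The fast path below is only a stand-in for the thunk: it skips the interpreter loop but does not model how libffi actually implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes, named after the two mentioned in the text. */
typedef enum {
    OP_GP64,  /* copy a 64-bit value into a general-purpose slot */
    OP_SE32,  /* sign-extend a 32-bit value into a general-purpose slot */
    OP_END    /* terminator */
} mini_op;

#define MINI_MAX_GP 6

typedef struct {
    mini_op code[MINI_MAX_GP + 1]; /* opcodes plus OP_END */
    int     nargs;
    int     all_gp64;              /* 1 if every opcode is OP_GP64 */
} mini_plan;

/* Build the bytecode once from per-argument byte widths (4 or 8). */
static void mini_plan_build(mini_plan *p, const int *widths, int nargs)
{
    assert(nargs >= 0 && nargs <= MINI_MAX_GP);
    p->nargs = nargs;
    p->all_gp64 = 1;
    for (int i = 0; i < nargs; i++) {
        assert(widths[i] == 4 || widths[i] == 8);
        p->code[i] = (widths[i] == 8) ? OP_GP64 : OP_SE32;
        if (widths[i] != 8)
            p->all_gp64 = 0;
    }
    p->code[nargs] = OP_END;
}

/* Fill the simulated registers; returns 1 if the fast path was taken. */
static int mini_plan_run(const mini_plan *p, void *const *avalue,
                         uint64_t gp[MINI_MAX_GP])
{
    if (p->all_gp64) {
        /* Fast path: straight copies, no opcode dispatch. */
        for (int i = 0; i < p->nargs; i++)
            memcpy(&gp[i], avalue[i], sizeof(uint64_t));
        return 1;
    }
    for (int i = 0; p->code[i] != OP_END; i++) {
        switch (p->code[i]) {
        case OP_GP64:
            memcpy(&gp[i], avalue[i], sizeof(uint64_t));
            break;
        case OP_SE32: {
            int32_t narrow;
            memcpy(&narrow, avalue[i], sizeof narrow);
            gp[i] = (uint64_t)(int64_t)narrow; /* sign extension */
            break;
        }
        case OP_END:
            break;
        }
    }
    return 0;
}
```

The point of the split is that the common all-64-bit case never enters the switch, while mixed signatures still pay only for a short, linear bytecode walk.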
Benchmarks show that ffi_call_plan_invoke cuts call overhead from 16x the cost of a direct call (with ffi_call) to under 3x, which makes it about 6x faster than traditional ffi_call for pointer-heavy signatures.
Real-world usage in GNOME Shell shows over 90% of calls are pure 64-bit general-purpose arguments (thunk-eligible), with repetitive signature patterns ideal for plan caching.
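Because the same few signatures recur, a caller could keep plans in a small cache keyed by signature. The sketch below assumes a caller-side cache and a made-up signature encoding ('p' for pointer, 'l' for 64-bit integer, 'i' for 32-bit integer); none of these names come from libffi, and a linear scan is used purely for brevity.

```c
#include <assert.h>
#include <string.h>

#define SIG_MAX   16  /* maximum signature length, including terminator */
#define CACHE_CAP 32  /* maximum number of distinct signatures */

/* Hypothetical cache entry; a real one would hold the plan itself. */
typedef struct {
    char sig[SIG_MAX]; /* e.g. "ppl": pointer, pointer, 64-bit integer */
    int  thunk_ok;     /* 1 if every argument is a 64-bit GP value */
} cached_plan;

typedef struct {
    cached_plan entries[CACHE_CAP];
    int count;
    int builds;        /* how many times an entry had to be built */
} plan_cache;

/* Return the entry for sig, building it only on the first request. */
static const cached_plan *cache_get(plan_cache *c, const char *sig)
{
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->entries[i].sig, sig) == 0)
            return &c->entries[i];

    assert(c->count < CACHE_CAP && strlen(sig) < SIG_MAX);
    cached_plan *e = &c->entries[c->count++];
    strcpy(e->sig, sig);
    /* Only 'p' and 'l' are 64-bit general-purpose in this toy encoding. */
    e->thunk_ok = (strspn(sig, "pl") == strlen(sig));
    c->builds++;
    return e;
}
```

With a workload dominated by a handful of pointer-only signatures, almost every lookup would hit an existing entry whose thunk_ok flag is set.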
This optimization is currently experimental and available only in libffi's git HEAD for x86-64 Linux. Whether it carries over to other ABIs is uncertain and depends on how complex their calling conventions are.
