Performance Improvements in Libffi
8 hours ago
- #runtime function calls
- #performance tuning
- #libffi optimization
- libffi is a runtime function call interpreter that determines how to place arguments and make calls based on signature descriptions provided at runtime, without pre-compilation.
- Unlike JIT compilation, which generates executable code at runtime (posing security risks), libffi intentionally remains an interpreter to avoid creating writable and executable memory pages.
- A key inefficiency in libffi is that ffi_prep_cif computes argument placement but discards the result, so ffi_call must re-derive placement on every call, which costs many branches and pointer chases.
- To remove this cost, libffi introduces a 'plan': a precomputed bytecode of moves for each signature. The plan is built once per signature (via ffi_call_plan_alloc) and reused on subsequent calls (via ffi_call_plan_invoke), which skips re-classification.
- The plan includes simple opcodes (e.g., GP64 for general registers, SE32 for sign extension). For common cases like pointer-only arguments, a thunk bypasses interpretation entirely, making calls nearly as fast as direct calls.
- Benchmarks show a call through ffi_call_plan_invoke costs under 3x a direct call, down from 16x with ffi_call, which makes it about 6x faster than traditional ffi_call for pointer-heavy signatures.
- Real-world usage in GNOME Shell shows over 90% of calls are pure 64-bit general-purpose arguments (thunk-eligible), with repetitive signature patterns ideal for plan caching.
- This optimization is currently experimental and available only in libffi's git HEAD for x86-64 Linux. Whether it carries over to other ABIs is uncertain and depends on the complexity of their calling conventions.