It's Not Always ICache

8 days ago

The post discusses the impact of inlining optimizations in Rust, questioning the common belief that performance degradation is due to instruction cache (ICache) misses.
The author conducts a benchmark using the `once_cell` Rust library to observe the effects of inlining, specifically focusing on the `initialize` function.
Results show that `#[inline(never)]` is measurably faster than `#[inline(always)]`, but the measurements do not clearly tie the difference to ICache misses.
Tools like `perf stat` and `cachegrind` are used to analyze performance counters and instruction counts, revealing that inlining increases the number of instructions executed.
Assembly code analysis shows that inlining leads to larger function prologues and epilogues, more register usage, and potential loop pessimizations, rather than ICache issues.
The post concludes that inlining can slow code down through more executed instructions and higher register usage, not necessarily through ICache misses, and recommends using tools like `perf` and `valgrind` for accurate analysis.
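
As a rough illustration of the kind of code being discussed, here is a minimal sketch of a lazily initialized value whose slow path is kept out of line. It is not the `once_cell` implementation: `LazyU64`, `get_or_init`, and `sum_lazy` are hypothetical names chosen for this example, and the cell is single-threaded for brevity. Only the `initialize` name and the `#[inline(never)]` attribute come from the summary above.

```rust
use std::cell::Cell;

/// Hypothetical single-threaded lazy value (illustration only,
/// not how `once_cell` is implemented).
pub struct LazyU64 {
    value: Cell<Option<u64>>,
}

impl LazyU64 {
    pub const fn new() -> Self {
        LazyU64 { value: Cell::new(None) }
    }

    /// Hot path: one check for an already stored value.
    /// Small enough that inlining it into callers is cheap.
    #[inline]
    pub fn get_or_init(&self, f: impl FnOnce() -> u64) -> u64 {
        match self.value.get() {
            Some(v) => v,
            None => self.initialize(f),
        }
    }

    /// Cold path: reached only while the cell is still empty.
    /// `#[inline(never)]` keeps its body out of the caller, so the
    /// caller's prologue, epilogue, and register allocation are not
    /// shaped by code that almost never runs.
    #[cold]
    #[inline(never)]
    fn initialize(&self, f: impl FnOnce() -> u64) -> u64 {
        let v = f();
        self.value.set(Some(v));
        v
    }
}

/// Reads the lazy value `n` times in a loop and returns the running
/// total together with how many times the initializer actually ran.
pub fn sum_lazy(n: u64) -> (u64, u32) {
    let cell = LazyU64::new();
    let init_calls = Cell::new(0u32);
    let mut total = 0u64;
    for _ in 0..n {
        total += cell.get_or_init(|| {
            init_calls.set(init_calls.get() + 1);
            92
        });
    }
    (total, init_calls.get())
}
```

Swapping `#[inline(never)]` for `#[inline(always)]` on `initialize` and comparing the generated assembly for `sum_lazy` is one way to see the effect the post describes. Whether it is faster or slower on a given machine is something to measure, not assume.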