It's Not Always ICache
8 days ago
- #Optimization
- #Rust
- #Performance
- The post examines how inlining affects performance in Rust, questioning the common belief that slowdowns from aggressive inlining are caused by instruction cache (ICache) misses.
- The author conducts a benchmark using the `once_cell` Rust library to observe the effects of inlining, specifically focusing on the `initialize` function.
- Results show that `#[inline(never)]` is measurably faster than `#[inline(always)]`, but the difference is not clearly linked to ICache misses.
- Tools like `perf stat` and `cachegrind` are used to analyze performance counters and instruction counts, revealing that inlining increases the number of instructions executed.
- Assembly code analysis shows that inlining leads to larger function prologues and epilogues, more register usage, and potential loop pessimizations, rather than ICache issues.
- The post concludes that inlining can slow code down by increasing the number of executed instructions and the register pressure, not necessarily by causing ICache misses, and recommends measuring with tools like `perf` and `valgrind` rather than assuming the cause.
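
The kind of comparison summarized above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the post's benchmark or the `once_cell` implementation: `LazyU64`, `initialize_never`, `initialize_always`, and the two `sum_*` functions are hypothetical names made up here. The only difference between the two code paths is the inline attribute on the slow-path `initialize` function.

```rust
use std::cell::Cell;
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical single-threaded lazy cell; a stand-in, not once_cell's code.
struct LazyU64 {
    value: Cell<Option<u64>>,
}

impl LazyU64 {
    fn new() -> Self {
        LazyU64 { value: Cell::new(None) }
    }

    /// Fast path; the slow path stays behind a real function call.
    #[inline(always)]
    fn get_or_init_outlined<F: FnOnce() -> u64>(&self, f: F) -> u64 {
        match self.value.get() {
            Some(v) => v,
            None => self.initialize_never(f),
        }
    }

    /// Fast path; the slow path is pasted into every caller.
    #[inline(always)]
    fn get_or_init_inlined<F: FnOnce() -> u64>(&self, f: F) -> u64 {
        match self.value.get() {
            Some(v) => v,
            None => self.initialize_always(f),
        }
    }

    #[inline(never)]
    fn initialize_never<F: FnOnce() -> u64>(&self, f: F) -> u64 {
        let v = f();
        self.value.set(Some(v));
        v
    }

    #[inline(always)]
    fn initialize_always<F: FnOnce() -> u64>(&self, f: F) -> u64 {
        let v = f();
        self.value.set(Some(v));
        v
    }
}

/// Hot loop that hits the initialized fast path on all but the first iteration.
fn sum_outlined(n: u64) -> u64 {
    let cell = LazyU64::new();
    let mut sum = 0u64;
    for _ in 0..n {
        sum = sum.wrapping_add(cell.get_or_init_outlined(|| black_box(7)));
    }
    sum
}

/// Same loop, with the initializer forced inline.
fn sum_inlined(n: u64) -> u64 {
    let cell = LazyU64::new();
    let mut sum = 0u64;
    for _ in 0..n {
        sum = sum.wrapping_add(cell.get_or_init_inlined(|| black_box(7)));
    }
    sum
}

fn main() {
    let n = 100_000_000;

    let start = Instant::now();
    let a = black_box(sum_outlined(black_box(n)));
    println!("inline(never):  {:?} (sum = {})", start.elapsed(), a);

    let start = Instant::now();
    let b = black_box(sum_inlined(black_box(n)));
    println!("inline(always): {:?} (sum = {})", start.elapsed(), b);
}
```

Wall-clock numbers from a toy loop like this are noisy, and the optimizer may simplify it far more than it would a real workload, so the timings alone prove little. Building with optimizations and then running the binary under something like `perf stat -e instructions,L1-icache-load-misses` (event names vary by CPU) or `valgrind --tool=cachegrind` is closer to the approach the post describes: compare executed-instruction counts and cache-miss counters before blaming the ICache.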