Hasty Briefsbeta

Bilingual

When O3 is 2x slower than O2

6 months ago
  • #Optimization
  • #Rust
  • #Performance
  • The article discusses a performance regression when optimizing a custom bounded priority queue in Rust, specifically when changing the optimization level from `opt-level=2` to `opt-level=3`.
  • Benchmark results show a +123% performance penalty with `opt-level=3` compared to `opt-level=2`.
  • The queue uses a sorted `Vec` with binary search for insertion, and the performance issue is traced back to the compare function's behavior under different optimization levels.
  • Flamegraph analysis reveals that `binary_search_by` and the compare function consume significantly more CPU time with `opt-level=3`.
  • Assembly code inspection shows that `opt-level=3` replaces conditional jumps with conditional moves, which theoretically should be faster but in practice leads to worse performance due to dependency bottlenecks.
  • Alternative compare functions, such as using `f32::total_cmp`, are tested but still result in slower performance with `opt-level=3`.
  • Experiments with modifying `binary_search_by` (e.g., adding `hint::black_box` or replacing `hint::select_unpredictable`) yield mixed results, sometimes improving performance but not consistently.
  • Historical context from Rust's issue tracker indicates that conditional moves were introduced to improve performance in certain cases, but they can regress performance in others.
  • The article concludes that benchmarking is complex and performance outcomes can vary widely depending on the specific use case and optimization strategies.