Tracking down a 25% Regression on LLVM RISC-V
3 days ago
- #Performance Optimization
- #RISC-V
- #LLVM
- An LLVM commit improved isKnownExactCastIntToFP, enabling optimization of fpext(sitofp x to float) to double into uitofp x to double, but inadvertently broke a downstream narrowing optimization in visitFPTrunc.
- This caused a ~24% performance regression on RISC-V targets because fdiv.d (33-cycle latency) was emitted instead of fdiv.s (19-cycle latency).
- The fix extended getMinimumFPType with range analysis to recognize that fptrunc(uitofp x to double) to float can be reduced to uitofp x to float, restoring the narrowing optimization.
- Analysis traced the regression to a benchmark loop in which LLVM emitted fdiv.d while GCC emitted fdiv.s, so the LLVM-compiled code spent more cycles.
- The issue was traced to a specific commit in InstCombine that changed isKnownExactCastIntToFP behavior, preventing visitFPTrunc from optimizing double to float narrowing.
- A patch was submitted and merged, modifying canBeCastedExactlyIntToFP and getMinimumFPType to handle integer-to-FP casts with fptrunc, resulting in optimized float operations and performance recovery.
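
The exactness question behind these casts can be illustrated with a small sketch. It is not LLVM's implementation: LLVM reasons at compile time about known bits and value ranges, whereas this example inspects concrete values, and the helper names (`significant_bits`, `casts_exactly`, `to_f32`) are hypothetical. It assumes non-negative integers that fit within float's exponent range, which holds for any 32-bit value.

```python
import struct

# IEEE-754 precision in bits, counting the implicit leading 1.
FLOAT_MANTISSA_BITS = 24
DOUBLE_MANTISSA_BITS = 53


def significant_bits(x: int) -> int:
    """Number of bits from the highest to the lowest set bit of x, inclusive."""
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros


def casts_exactly(x: int, mantissa_bits: int) -> bool:
    """True if the non-negative integer x needs no rounding at this precision.

    Assumption: x is within the exponent range of the target FP type.
    """
    return significant_bits(x) <= mantissa_bits


def to_f32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for x in (16777216, 16777217, 0xFFFFFF00):
        exact = casts_exactly(x, FLOAT_MANTISSA_BITS)
        survives = to_f32(float(x)) == float(x)
        print(f"{x}: exact in float = {exact}, round trip preserved = {survives}")
```

When the check succeeds, converting the integer straight to float loses nothing relative to going through double, which is the property that makes it safe to treat float as the minimum FP type for such an operand. When it fails, as for 16777217 (25 significant bits), the float cast rounds and the two paths can differ.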