Tracking down a 25% Regression on LLVM RISC-V

3 days ago

An LLVM commit improved isKnownExactCastIntToFP so that InstCombine could fold fpext(sitofp x to float) to double into uitofp x to double, but the change inadvertently defeated a downstream narrowing optimization in visitFPTrunc.
This caused a ~24% performance regression on RISC-V targets because the compiler emitted fdiv.d (33-cycle latency) instead of fdiv.s (19-cycle latency).
The fix extended getMinimumFPType with range analysis so that it recognizes that fptrunc(uitofp x to double) to float can be reduced to uitofp x to float, which restores the narrowing optimization.
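The exactness question behind these folds can be sketched in a few lines of Python. This is only an illustration of the idea, not LLVM's implementation: the helper names (significant_bits, cast_is_exact, range_is_exact) are hypothetical, and the sketch assumes the check boils down to comparing the integer's significant bits against the 24-bit significand of float or the 53-bit significand of double.

```python
import struct

FLOAT_PRECISION = 24   # binary32 significand bits, counting the implicit leading 1
DOUBLE_PRECISION = 53  # binary64 significand bits, counting the implicit leading 1


def significant_bits(x: int) -> int:
    """Bits between the highest and lowest set bit of |x|, inclusive."""
    x = abs(x)
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros


def cast_is_exact(x: int, precision: int) -> bool:
    """True if converting x to a format with this precision loses no bits."""
    return significant_bits(x) <= precision


def range_is_exact(upper_bound: int, precision: int) -> bool:
    """Conservative range check (hypothetical): every x in [0, upper_bound]
    converts exactly if the bound itself fits in `precision` bits."""
    return upper_bound.bit_length() <= precision


def to_f32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for x in (1 << 24, (1 << 24) + 1, 1 << 30):
        print(x, cast_is_exact(x, FLOAT_PRECISION), to_f32(float(x)) == float(x))
```

Under these assumptions, 2^24 + 1 is the smallest positive integer that float cannot hold exactly, while a large power of two such as 2^30 is still exact because it has a single significant bit.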
The regression surfaced in a benchmark whose loop LLVM compiled with fdiv.d while GCC used fdiv.s, which cost the LLVM build extra cycles.
The issue was traced to a specific InstCombine commit that changed the behavior of isKnownExactCastIntToFP, which in turn prevented visitFPTrunc from narrowing the double operation to float.
A patch was submitted and merged, modifying canBeCastedExactlyIntToFP and getMinimumFPType to handle integer-to-FP casts with fptrunc, resulting in optimized float operations and performance recovery.
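Why the narrowed division is safe can be checked numerically. The sketch below assumes both operands are integers that fit in 24 bits, so they convert to float exactly, and compares two paths: dividing in double and then rounding to float (the fdiv.d plus fptrunc shape), versus a correctly rounded single-precision division (the fdiv.s shape), modelled here with exact rational arithmetic. The function names are hypothetical, and this is an illustration of the IEEE 754 property involved rather than a description of the LLVM patch.

```python
import random
import struct
from fractions import Fraction


def to_f32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def div_via_double(x: int, y: int) -> float:
    """Divide in double precision, then truncate the result to float."""
    return to_f32(float(x) / float(y))


def div_f32_reference(x: int, y: int) -> float:
    """Correctly rounded binary32 quotient of two positive integers,
    computed from the exact rational value (normal range only)."""
    q = Fraction(x, y)
    # Pick e so that 2**23 <= q / 2**e < 2**24, i.e. a 24-bit significand.
    e = q.numerator.bit_length() - q.denominator.bit_length() - 24
    while q / Fraction(2) ** e >= 1 << 24:
        e += 1
    while q / Fraction(2) ** e < 1 << 23:
        e -= 1
    # round() on a Fraction rounds half to even, matching IEEE 754's default mode.
    significand = round(q / Fraction(2) ** e)
    return float(significand * Fraction(2) ** e)


def count_mismatches(samples: int, seed: int = 0) -> int:
    """Count operand pairs where the two division paths disagree."""
    rng = random.Random(seed)
    limit = (1 << 24) - 1  # largest value guaranteed exact in float
    mismatches = 0
    for _ in range(samples):
        x, y = rng.randint(1, limit), rng.randint(1, limit)
        if div_via_double(x, y) != div_f32_reference(x, y):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", count_mismatches(10_000))
```

The two paths agree because double carries more than twice the precision of float, so the extra rounding step cannot change the final float result of a single division. That agreement holds only when the operands are already exact in float, which is the role the range analysis plays.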
