Tuning LLVM's SLP Vectorizer Cost Model

5 hours ago

Identified a performance regression in LLVM's SLP vectorizer on a RISC-V target: one benchmark showed an 89% performance delta, with both instruction and cycle counts increasing.
Analyzed the assembly and found that the new code generation replaced scalar fadd chains with expensive fsd (double-precision floating-point store) instructions and vfredosum.vs ordered vector sum reductions, which caused the slowdown.
Used LLVM IR to trace the regression to a middle-end pass and pinpointed commit '230980947083 [SLP] Support ordered fadd reduction via reduction intrinsics' as the culprit.
Investigated the SLP vectorizer's cost model and found that TreeCost (the cost of building the vectors) was not scaled by loop iterations while ReductionCost was. The mismatch produced a misleadingly negative total cost, which made vectorization look profitable.
Fixed the issue by passing RdxRoot (the reduction root instruction) to the cost calculation functions so that TreeCost is also scaled by loop iterations, which restored baseline performance.

Hasty Briefs (beta)