Tuning LLVM's SLP Vectorizer Cost Model
7 hours ago
- #Performance Optimization
- #LLVM
- #SLP Vectorizer
- Identified a performance regression in LLVM's SLP vectorizer on a RISC-V target: one benchmark showed an 89% performance delta, with higher instruction and cycle counts.
- Analyzed the assembly and found that the new code generation replaced scalar fadd chains with vfredosum.vs ordered vector reductions and added expensive fsd (floating-point store double) instructions, which caused the slowdown.
- Used LLVM IR to trace the regression to a middle-end pass and pinpointed commit '230980947083 [SLP] Support ordered fadd reduction via reduction intrinsics' as the culprit.
- Investigated the SLP vectorizer's cost model and found that TreeCost (the cost of building the vectors) was not scaled by loop iterations while ReductionCost was, which produced a misleadingly negative total cost.
- Fixed the issue by passing RdxRoot (the reduction root instruction) to the cost calculation functions so that TreeCost is also scaled by loop iterations, which restored baseline performance.
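
The scaling mismatch in the cost model can be illustrated with a small toy calculation. This is only a sketch, not LLVM's actual code: the function names, the loop scale factor, and the cost values are all hypothetical, and the only assumption taken from the summary is that a negative total cost is read as "vectorizing is profitable".

```python
# Toy model of the cost mismatch described above. All names and numbers are
# hypothetical; this does not reproduce LLVM's SLP vectorizer implementation.

def total_cost_mismatched(tree_cost: int, reduction_cost: int, loop_scale: int) -> int:
    """Only the reduction cost is scaled by the loop iteration estimate."""
    return tree_cost + reduction_cost * loop_scale


def total_cost_consistent(tree_cost: int, reduction_cost: int, loop_scale: int) -> int:
    """Both costs are scaled by the same loop iteration estimate."""
    return tree_cost * loop_scale + reduction_cost * loop_scale


def looks_profitable(total_cost: int) -> bool:
    """Assumption: a negative total cost means vectorization is chosen."""
    return total_cost < 0


# Hypothetical inputs: building the vectors costs 6, the vector reduction
# saves 1 compared with the scalar fadd chain, and the loop scale is 8.
TREE_COST = 6
REDUCTION_COST = -1
LOOP_SCALE = 8

mismatched = total_cost_mismatched(TREE_COST, REDUCTION_COST, LOOP_SCALE)
consistent = total_cost_consistent(TREE_COST, REDUCTION_COST, LOOP_SCALE)

print("mismatched:", mismatched, "profitable:", looks_profitable(mismatched))
print("consistent:", consistent, "profitable:", looks_profitable(consistent))
```

With these made-up inputs, the mismatched total is 6 + (-1 × 8) = -2, so the transformation looks profitable, while the consistent total is (6 - 1) × 8 = 40, so it is rejected. Scaling only one term lets a small per-iteration saving outweigh a build cost that is in fact paid on every iteration.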