Hasty Briefs

Tuning LLVM's SLP Vectorizer Cost Model

5 hours ago
  • #Performance Optimization
  • #LLVM
  • #SLP Vectorizer
  • Identified a performance regression in LLVM's SLP vectorizer on a RISC-V target, where a benchmark showed an 89% performance delta along with increases in both instruction count and cycle count.
  • Analyzed the assembly and found that the new code generation introduced expensive fsd (floating-point store double) instructions and vfredosum.vs ordered vector reductions in place of the scalar fadd chains, causing the slowdown.
  • Used LLVM IR to trace the regression to a middle-end pass and pinpointed commit '230980947083 [SLP] Support ordered fadd reduction via reduction intrinsics' as the culprit.
  • Investigated the SLP vectorizer's cost model and found that TreeCost (the cost of building the vectors) was not scaled by the number of loop iterations while ReductionCost was. This mismatch produced a misleadingly negative total cost, making vectorization look profitable.
  • Fixed the issue by passing RdxRoot (the reduction root instruction) to the cost calculation functions so that TreeCost is also scaled by loop iterations, which restored baseline performance.
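To make the cost mismatch concrete, here is a toy model. It is a minimal sketch, not LLVM's implementation: the class and function names, the per-iteration cost numbers, the iteration count, and the "negative total means vectorize" rule are all hypothetical simplifications. They are chosen only to show how scaling one term but not the other can flip the sign of the total.

```python
from dataclasses import dataclass


@dataclass
class SlpCostEstimate:
    """Hypothetical per-iteration cost deltas (vector cost minus scalar cost).

    Negative values mean the vector form is modeled as cheaper.
    """
    tree_cost: int       # toy stand-in for TreeCost (building the vectors)
    reduction_cost: int  # toy stand-in for ReductionCost (vector reduction vs. scalar fadd chain)


def total_cost_before_fix(est: SlpCostEstimate, iterations: int) -> int:
    # Mirrors the reported bug: only the reduction term is scaled by loop iterations.
    return est.tree_cost + est.reduction_cost * iterations


def total_cost_after_fix(est: SlpCostEstimate, iterations: int) -> int:
    # Mirrors the described fix: the tree term is scaled by the same iteration count.
    return (est.tree_cost + est.reduction_cost) * iterations


def is_profitable(total_cost: int) -> bool:
    # Simplification (assumption): treat any negative total as "vectorize".
    return total_cost < 0


if __name__ == "__main__":
    est = SlpCostEstimate(tree_cost=6, reduction_cost=-2)  # made-up numbers
    for label, fn in (("before fix", total_cost_before_fix),
                      ("after fix", total_cost_after_fix)):
        cost = fn(est, iterations=10)
        print(f"{label}: total={cost}, vectorize={is_profitable(cost)}")
```

With these made-up numbers, the pre-fix total is 6 + (-2 * 10) = -14, so the toy model chooses to vectorize even though the vector form costs 4 more units per iteration. After both terms are scaled, the total is (6 - 2) * 10 = 40 and vectorization is rejected. Once both terms share the same scale factor, the sign depends only on the per-iteration sum, which is the kind of consistency the described fix is meant to restore.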