Hasty Briefs (beta)

When Compiler Optimizations Hurt Performance

8 days ago
  • #UTF-8
  • #Performance
  • #Benchmarking
  • Benchmarking several techniques for calculating UTF-8 sequence lengths reveals large performance differences.
  • Hardware-assisted counting of leading zero bits underperforms a naive method, processing only 438 MB/s to 462 MB/s.
  • To optimize the `switch` statement, the compiler emits a lookup table, which performs worse than the plain branch instructions of the naive method (over 2000 MB/s).
  • Disabling jump tables with `-fno-jump-tables` in clang++ improves performance to match the naive method.
  • GNU g++ for AArch64 does not emit a lookup table for this code, so `-fno-jump-tables` has no effect there.
  • Earlier work by Julian Squires (2017) reports similar findings on x86-64 platforms.