When Compiler Optimizations Hurt Performance
8 days ago
- #UTF-8
- #Performance
- #Benchmarking
- Benchmarking several techniques for calculating the length of a UTF-8 sequence from its lead byte reveals substantial performance differences.
- Hardware-assisted counting of leading zero bits underperforms a naive method, processing only 438 MB/s to 462 MB/s.
- As an optimization, the compiler emits a lookup table for the switch-case, but it performs worse than plain branching instructions, which reach over 2000 MB/s.
- Disabling jump tables with `-fno-jump-tables` in clang++ brings performance up to match the naive method.
- GNU g++ for AArch64 does not emit a lookup table here, so `-fno-jump-tables` makes no difference.
- Previous research by Julian Squires (2017) discusses similar findings on x86-64 platforms.