Modern Perfect Hashing
6 months ago
- #perfect-hashing
- #string-processing
- #optimization
- Modern perfect hashing for strings aims to map a fixed set of strings to predefined integers efficiently.
- The approach involves splitting strings by length to optimize bounds checking and memcmp with SIMD.
- PEXT (bit extraction) was considered but discarded due to table size issues and lack of support on Arm and some x86 CPUs.
- A solution inspired by computer chess uses magic numbers to convert sparse bits into a collision-free table.
- Example provided for length-4 CSS keywords using a magic number to index into a table without collisions.
- For cases with few values, simple memcmp is sufficient without needing complex hashing.
- Finding magic numbers involves trial and error, accelerated by the killer heuristic to quickly discard bad candidates.
- When no magic number is found, splitting the group based on character checks is suggested, though performance varies.
- The final implementation was twice as fast as gperf with half the code size, but the search process is slow.