Hasty Briefsbeta

Bilingual

Modern Perfect Hashing

6 months ago
  • #perfect-hashing
  • #string-processing
  • #optimization
  • Modern perfect hashing for strings aims to map a fixed set of strings to predefined integers efficiently.
  • The approach involves splitting strings by length to optimize bounds checking and memcmp with SIMD.
  • PEXT (bit extraction) was considered but discarded due to table size issues and lack of support on Arm and some x86 CPUs.
  • A solution inspired by computer chess uses magic numbers to convert sparse bits into a collision-free table.
  • Example provided for length-4 CSS keywords using a magic number to index into a table without collisions.
  • For cases with few values, simple memcmp is sufficient without needing complex hashing.
  • Finding magic numbers involves trial and error, accelerated by the killer heuristic to quickly discard bad candidates.
  • When no magic number is found, splitting the group based on character checks is suggested, though performance varies.
  • The final implementation was twice as fast as gperf with half the code size, but the search process is slow.