Hasty Briefsbeta

Bilingual

SIMD Within a Register: How I Doubled Hash Table Lookup Performance

9 months ago
  • #Performance Optimization
  • #Bit Manipulation
  • #Cuckoo Filter
  • The author implemented a Cuckoo Filter in C# using an 8-bit fingerprint for a 3% false-positive rate.
  • Initially, a byte array was used for the hash table, with each bucket containing 4 slots (4 bytes).
  • The author experimented with replacing the 4-byte bucket with a 32-bit uint for potential performance gains.
  • Two uint-based lookup methods were tested: one using bit shifting and another using BitConverter.
  • Bit shifting proved 35% faster than the original byte-array loop, while BitConverter was slower due to Span overhead.
  • A branch-free lookup method using XOR and bit-twiddling tricks was introduced, improving performance by over 60% for positive lookups and more than doubling the speed for negative lookups.
  • The final implementation uses a bit-twiddling trick to detect zero bytes after XORing the bucket with a mask of the target fingerprint.
  • Despite reduced readability, the performance gains justified the use of bit manipulation in the C# implementation.