SIMD Within a Register: How I Doubled Hash Table Lookup Performance
9 months ago
- #Performance Optimization
- #Bit Manipulation
- #Cuckoo Filter
- The author implemented a Cuckoo Filter in C# using an 8-bit fingerprint for a 3% false-positive rate.
- Initially, a byte array was used for the hash table, with each bucket containing 4 slots (4 bytes).
- The author experimented with replacing the 4-byte bucket with a 32-bit uint for potential performance gains.
- Two uint-based lookup methods were tested: one using bit shifting and another using BitConverter.
- Bit shifting proved 35% faster than the original byte-array loop, while BitConverter was slower due to Span overhead.
- A branch-free lookup method using XOR and bit-twiddling tricks was introduced, improving performance by over 60% for positive lookups and more than doubling the speed for negative lookups.
- The final implementation uses a bit-twiddling trick to detect zero bytes after XORing the bucket with a mask of the target fingerprint.
- Despite reduced readability, the performance gains justified the use of bit manipulation in the C# implementation.