Show HN: SIMD-Optimized Bloom Filters in Mojo for Large-Scale Systems
3 days ago
- #data-structures
- #probabilistic
- #optimization
- Bloom filter is a space-efficient probabilistic data structure for set membership tests.
- Key properties: no false negatives, possible false positives, space efficient, fast operations (O(k)), fixed size.
- Common use cases: web crawlers, databases, distributed systems.
- Two Bloom filter variants provided: Standard Bloom Filter and Split Block Bloom Filter (SIMD-optimized).
- Standard Bloom Filter: predictable memory, suitable for smaller datasets.
- Split Block Bloom Filter: cache efficient, designed for large datasets, utilizes SIMD.
- Both variants share the same interface including methods like add, contains, merge, intersect, etc.
- Additional utility methods: bits_set, fill_ratio, estimated_cardinality, current_fpr, should_rotate.
- Project inspired by high-performance Go implementation and research on cache-efficient Bloom filter design.
- Available under BSD 2-Clause License.