Hasty Briefsbeta

Show HN: SIMD-Optimized Bloom Filters in Mojo for Large-Scale Systems

2 days ago
  • #data-structures
  • #probabilistic
  • #optimization
  • Bloom filter is a space-efficient probabilistic data structure for set membership tests.
  • Key properties: no false negatives, possible false positives, space efficient, fast operations (O(k)), fixed size.
  • Common use cases: web crawlers, databases, distributed systems.
  • Two Bloom filter variants provided: Standard Bloom Filter and Split Block Bloom Filter (SIMD-optimized).
  • Standard Bloom Filter: predictable memory, suitable for smaller datasets.
  • Split Block Bloom Filter: cache efficient, designed for large datasets, utilizes SIMD.
  • Both variants share the same interface including methods like add, contains, merge, intersect, etc.
  • Additional utility methods: bits_set, fill_ratio, estimated_cardinality, current_fpr, should_rotate.
  • Project inspired by high-performance Go implementation and research on cache-efficient Bloom filter design.
  • Available under BSD 2-Clause License.