Hasty Briefsbeta

Lessons from BF-Tree: Building a Concurrent Larger-Than-Memory Index in Rust

2 days ago
  • #database
  • #concurrency
  • #performance
  • BF-Tree introduces variable-size mini-pages (64-4096 bytes) to reduce write amplification and memory waste compared to traditional 4KB page caching.
  • The Rust implementation features a custom ring buffer allocator with a 6-state machine, RAII guards, optimistic locks, and deterministic concurrency testing.
  • Mini-pages serve three functions: buffering writes, caching hot records, and growing for scan workloads, optimizing memory usage based on access patterns.
  • The design decouples in-memory representation from on-disk format, allowing BF-Tree to optimize for the working set rather than physical page layout.
  • BF-Tree's benchmarks show it is 2x faster than B-trees and LSM-trees for point lookups, 6x faster than B-trees for writes, and 2.5x faster than RocksDB for scans.
  • The implementation uses RAII guards (CircularBufferPtr and TombstoneHandle) to manage lifecycle states, ensuring safe transitions and memory reclamation.
  • Concurrency is handled with optimistic versioned locks for inner nodes and custom RwLocks with try_upgrade for page table entries, minimizing contention.
  • Testing leverages Shuttle for systematic concurrency testing, exploring thread interleavings to uncover hard-to-find bugs.
  • Key patterns include RAII guards for state machines, typed errors for retry control flow, and compile-time test infrastructure swaps.