Hasty Briefsbeta

Bilingual

My LSM tree was slower than a B-tree. Then I profiled it

3 days ago
  • #LSM Tree
  • #Performance Optimization
  • #Database Engineering
  • The author built an LSM tree in Go to understand storage engines like RocksDB, but it was initially slower than a B-tree, with only 250,000 writes per second.
  • B-trees slow down on write-heavy workloads due to random writes, while LSM trees avoid this by writing to an in-memory MemTable first and then performing sequential writes to disk.
  • Profiling revealed bottlenecks: the write syscall (34% CPU), garbage collection and sorting (48% CPU). Fixes included memory-mapping the WAL to reduce syscalls and using a streaming k-way merge to eliminate large allocations and sorting.
  • A bug in the bloom filter sizing caused reads to slow down; fixing it by using a correct entry count restored read performance.
  • Further optimizations included memory-mapping SSTable files for zero-copy reads, incremental CRC computation, buffered writes, and pre-allocating MemTable capacity.
  • The final version achieved 1.98 million sequential writes per second, 5x faster than BoltDB on random writes, but still slower than BadgerDB and BoltDB on reads due to LSM trade-offs.
  • Correctness was ensured through tests including key overwrites, crash recovery, and tombstone propagation, all passing with race detection.
  • The author emphasized that profiling, not guessing, drove all optimizations, and the bottleneck shifted with each fix, highlighting the importance of measurement in performance tuning.