Hasty Briefs (beta)

Faster Index I/O with NVMe SSDs

8 days ago
  • #NVMe-SSD
  • #performance-optimization
  • #search-engine
  • The Marginalia Search index has been partially rewritten for better performance using new data structures optimized for modern hardware.
  • The index has grown from 350 million to 800 million documents due to relaxed filtering conditions and the incorporation of a new advertisement detection algorithm.
  • Future plans include indexing results in additional languages, which is expected to further increase the index size.
  • The new design uses deterministic block-based skip lists instead of B-trees, improving efficiency for sorted-list intersection; a rough sketch of the block-skipping idea appears after this list.
  • Performance benchmarks show significant improvements with larger block sizes; 128 KB blocks offer a good balance between speed and efficiency.
  • NVMe SSDs exhibit distinctive performance characteristics: read times are largely independent of block size up to very large blocks (a small measurement sketch follows the list).
  • Optimizations such as data locality improvements and the use of io_uring for concurrent reads have been implemented to reduce latency and increase throughput; see the io_uring sketch after the list.
  • The search engine's performance is now more consistent, with a focus on managing I/O contention and queue depths to maintain low latency.
  • Further optimizations may include better compression algorithms for positions data and revisiting bloom filter intersection techniques.
  • The new index design is configurable to run efficiently on a variety of hardware, from NVMe SSDs to SATA SSDs.
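
The block-based skip idea mentioned above can be sketched roughly as follows: posting lists are sorted arrays of document ids laid out in fixed-size blocks, and during intersection whole blocks are skipped by comparing the other list's current candidate against each block's last (largest) entry. The block size, layout, and function names here are illustrative assumptions, not Marginalia's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: a posting list is a sorted array of doc ids split into
 * fixed-size blocks. Block k covers ids[k*BLOCK .. (k+1)*BLOCK - 1]; its last
 * entry is the block maximum, which doubles as the skip key. */
#define BLOCK 4 /* tiny for illustration; the article discusses ~128 KB blocks */

/* Advance pos in the sorted list (len entries) until list[pos] >= key,
 * first skipping whole blocks by their maximum, then scanning inside one block. */
static size_t seek_ge(const uint64_t *list, size_t len, size_t pos, uint64_t key)
{
    size_t block = pos / BLOCK;
    /* Skip over complete blocks whose last element is still below key. */
    while ((block + 1) * BLOCK <= len && list[(block + 1) * BLOCK - 1] < key)
        block++;
    if (pos < block * BLOCK)
        pos = block * BLOCK;
    /* Linear scan within the block (binary search would also work). */
    while (pos < len && list[pos] < key)
        pos++;
    return pos;
}

/* Intersect two sorted doc-id lists, writing matches to out; returns count. */
static size_t intersect(const uint64_t *a, size_t na,
                        const uint64_t *b, size_t nb, uint64_t *out)
{
    size_t ia = 0, ib = 0, n = 0;
    while (ia < na && ib < nb) {
        if (a[ia] == b[ib]) {
            out[n++] = a[ia];
            ia++; ib++;
        } else if (a[ia] < b[ib]) {
            ia = seek_ge(a, na, ia, b[ib]);
        } else {
            ib = seek_ge(b, nb, ib, a[ia]);
        }
    }
    return n;
}

int main(void)
{
    uint64_t a[] = {1, 3, 7, 9, 12, 15, 21, 30, 41, 42, 55, 60};
    uint64_t b[] = {3, 12, 30, 42, 60, 61};
    uint64_t out[6];
    size_t n = intersect(a, sizeof a / sizeof *a, b, sizeof b / sizeof *b, out);
    for (size_t i = 0; i < n; i++)
        printf("%llu ", (unsigned long long)out[i]);  /* prints 3 12 30 42 60 */
    printf("\n");
    return 0;
}
```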
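
The block-size observation can be checked with a small probe like the one below, which times random reads of increasing size against a file opened with O_DIRECT so the page cache does not mask device latency. This is an illustrative harness, not the article's benchmark; it assumes Linux and a test file larger than the largest block size tried (1 MiB here).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Issue `reads` random reads of `block_size` bytes and return the mean
 * seconds per read. Offsets and buffer are 4 KB-aligned as O_DIRECT requires. */
static double probe(int fd, off_t file_size, size_t block_size, int reads)
{
    void *buf;
    if (posix_memalign(&buf, 4096, block_size) != 0)
        return -1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reads; i++) {
        off_t max_start = (file_size - (off_t)block_size) / 4096;
        off_t off = (off_t)(rand() % max_start) * 4096;
        if (pread(fd, buf, block_size, off) < 0)
            perror("pread");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / reads;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file on the drive to test>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }
    off_t size = lseek(fd, 0, SEEK_END);

    /* Double the read size from 4 KB to 1 MB and report mean latency. */
    for (size_t bs = 4096; bs <= 1 << 20; bs *= 2)
        printf("%8zu bytes/read: %.1f us\n", bs, probe(fd, size, bs, 1000) * 1e6);

    close(fd);
    return 0;
}
```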
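
The io_uring point can likewise be illustrated with a minimal liburing sketch: several block reads are queued, submitted in a single syscall, and reaped as they complete, which is how concurrent reads keep an NVMe drive's queues busy. The queue depth, block size, and offsets are demonstration values, not the engine's configuration; build with -luring on Linux.

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define QUEUE_DEPTH 8              /* illustrative number of in-flight reads */
#define BLOCK_SIZE  (128 * 1024)   /* matches the 128 KB block size discussed */

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <index file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct io_uring ring;
    if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) {
        fprintf(stderr, "io_uring_queue_init failed\n");
        return 1;
    }

    char *bufs[QUEUE_DEPTH];

    /* Queue QUEUE_DEPTH reads at different offsets without blocking. */
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        bufs[i] = malloc(BLOCK_SIZE);
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i], BLOCK_SIZE, (off_t)i * BLOCK_SIZE);
        io_uring_sqe_set_data(sqe, (void *)(long)i);  /* tag with block index */
    }
    io_uring_submit(&ring);  /* one syscall submits all queued reads */

    /* Reap completions; they may arrive in any order. */
    for (int done = 0; done < QUEUE_DEPTH; done++) {
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        long idx = (long)io_uring_cqe_get_data(cqe);
        printf("block %ld: %d bytes read\n", idx, cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    for (int i = 0; i < QUEUE_DEPTH; i++)
        free(bufs[i]);
    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```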