Faster Index I/O with NVMe SSDs
- #NVMe-SSD
- #performance-optimization
- #search-engine
- Parts of the Marginalia Search index have been rewritten around new data structures optimized for modern hardware, yielding better performance.
- The index has grown from 350 million to 800 million documents due to relaxed filtering conditions and the incorporation of a new advertisement detection algorithm.
- Future plans include indexing results in additional languages, which is expected to further increase the index size.
- The new design uses deterministic block-based skip lists instead of B-trees, improving efficiency for sorted list intersection tasks.
- Performance benchmarks show significant gains from larger block sizes; 128 KB blocks offer a good balance between speed and efficiency.
- NVMe SSDs exhibit unique performance characteristics, where read times are largely independent of block size up to very large sizes.
- Optimizations such as data locality improvements and the use of io_uring for concurrent reads have been implemented to reduce latency and increase throughput.
- The search engine's performance is now more consistent, with a focus on managing I/O contention and queue depths to maintain low latency.
- Further optimizations may include better compression algorithms for positions data and revisiting bloom filter intersection techniques.
- The new index design is configurable to run efficiently on a variety of hardware, from NVMe SSDs to SATA SSDs.
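The deterministic block-based skip lists mentioned above can be sketched roughly as follows. This is an illustrative Python model, not Marginalia's implementation (which is Java and operates on on-disk 128 KB blocks): a sorted posting list is split into fixed-size blocks, each block's last element serves as a deterministic skip pointer, and intersection skips whole blocks whose maximum is below the current candidate value. The names `make_blocks` and `intersect` are invented for this sketch.

```python
from bisect import bisect_left

BLOCK_SIZE = 4  # tiny for illustration; the post discusses 128 KB blocks on disk


def make_blocks(sorted_ids, block_size=BLOCK_SIZE):
    """Split a sorted posting list into fixed-size blocks.

    Each block's last element doubles as a deterministic skip pointer:
    its position is fixed by the block layout, no probabilistic towers
    as in a classic skip list.
    """
    return [sorted_ids[i:i + block_size]
            for i in range(0, len(sorted_ids), block_size)]


def intersect(blocks_a, blocks_b):
    """Intersect two block-structured sorted lists.

    Whole blocks of b are skipped when their max element is below the
    current value from a, so most blocks are never scanned at all.
    """
    result = []
    ib = 0  # current block index into blocks_b
    for block in blocks_a:
        for v in block:
            # Skip whole blocks whose max (last element) is below v.
            while ib < len(blocks_b) and blocks_b[ib][-1] < v:
                ib += 1
            if ib == len(blocks_b):
                return result  # b is exhausted
            # Binary search only within the one candidate block.
            blk = blocks_b[ib]
            j = bisect_left(blk, v)
            if j < len(blk) and blk[j] == v:
                result.append(v)
    return result
```

The payoff over a B-tree is that block boundaries are deterministic, so the reader knows exactly which byte range to fetch next without chasing pointers through interior nodes.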
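The io_uring point can be illustrated too, with a caveat: Python has no stdlib io_uring binding, so this sketch approximates the same idea, keeping many independent reads in flight against one file, with a thread pool issuing positional `os.pread` calls. The function name `read_blocks` and the `queue_depth` parameter are assumptions of this sketch, not Marginalia's API.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 128 * 1024  # the 128 KB block size discussed above


def read_blocks(path, offsets, queue_depth=32):
    """Read many fixed-size blocks concurrently from one file.

    The real design uses io_uring to keep the NVMe queue full; here a
    thread pool of os.pread calls plays the same role. os.pread takes
    an explicit offset, so no shared file position is mutated, and the
    GIL is released during the syscall, so reads genuinely overlap.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            futures = [pool.submit(os.pread, fd, BLOCK, off)
                       for off in offsets]
            # Results come back in request order even though the reads
            # complete out of order on the device.
            return [f.result() for f in futures]
    finally:
        os.close(fd)
```

The design intuition matches the observation about NVMe above: since a single read's latency is largely independent of block size, throughput comes from queue depth, i.e. how many reads are outstanding at once, rather than from making individual reads bigger.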
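On the bloom-filter intersection idea: the post only says the technique may be revisited, so the following is a generic sketch of the concept rather than anything Marginalia ships. A small Bloom filter built over one list cheaply rejects most elements of the other before any exact check; Bloom filters have no false negatives, so the final result is still exact. The filter size and hash multipliers here are arbitrary illustrative choices.

```python
BITS = 1 << 16                       # illustrative filter size
HASHES = (0x9E3779B1, 0x85EBCA77)    # arbitrary odd multipliers


def bloom_build(values):
    """Build a tiny Bloom filter as a single int bitmask."""
    mask = 0
    for v in values:
        for mult in HASHES:
            mask |= 1 << ((v * mult) % BITS)
    return mask


def bloom_might_contain(mask, v):
    """True if v may be in the set; False means definitely absent."""
    return all((mask >> ((v * mult) % BITS)) & 1 for mult in HASHES)


def intersect_with_prefilter(a, b):
    """Intersect two lists of doc ids, using a Bloom filter over b to
    reject most elements of a before the exact membership check."""
    bf = bloom_build(b)
    b_set = set(b)
    return [v for v in a
            if bloom_might_contain(bf, v)  # cheap, mostly-negative test
            and v in b_set]                # exact check for survivors
```

The appeal for an index like this is that the filter is small enough to sit in memory, so many candidates can be discarded without touching the posting blocks on disk at all.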