Haydex: From Zero to 178,600M rows a second in 30 days
18 hours ago
- #performance-optimization
- #database-engineering
- #distributed-systems
- Haydex V1 was developed to solve slow needle-in-the-haystack queries by using probabilistic filters.
- Key optimizations included shifting from block-scoped to field-scoped filters, reducing I/O overhead significantly.
- Profiler-driven optimizations identified and fixed memory and CPU bottlenecks, leading to massive performance gains.
- A distributed map-reduce architecture was implemented to handle large-scale indexing, reducing latency from minutes to seconds.
- Compounding optimizations, including lazy loading and efficient pruning, resulted in peak throughput of 673 billion rows per second.
- Lessons learned: I/O is critical, profiler insights are invaluable, performance optimization is iterative, and speed comes at a cost.
- Future improvements include tiered indexing, a janitor service for cleanup, and further query optimizations.