Haydex: From Zero to 178,600M rows a second in 30 days

18 hours ago

Haydex V1 was developed to speed up slow needle-in-the-haystack queries by using probabilistic filters to skip data that cannot contain a match.
Key optimizations included shifting from block-scoped to field-scoped filters, reducing I/O overhead significantly.
Profiler-driven optimizations identified and fixed memory and CPU bottlenecks, leading to massive performance gains.
A distributed map-reduce architecture was implemented to handle large-scale indexing, reducing latency from minutes to seconds.
Compounding optimizations, including lazy loading and efficient pruning, resulted in peak throughput of 673 billion rows per second.
Lessons learned: I/O is critical, profiler insights are invaluable, performance optimization is iterative, and speed comes at a cost.
Future improvements include tiered indexing, a janitor service for cleanup, and further query optimizations.
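
To make the filter idea in the points above concrete, here is a minimal sketch of probabilistic pruning for a needle-in-the-haystack query. It is not Haydex's implementation: it assumes a Bloom filter as the probabilistic filter, and reads "field-scoped" as one small filter per (field, block) pair, so a query on one field only consults that field's filters. The names `BloomFilter`, `build_field_filters`, and `search` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                value.encode("utf-8"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def build_field_filters(blocks):
    """Build {field: {block_id: BloomFilter}} from a list of row blocks.

    Assumption: each block is a list of dict rows.
    """
    filters = {}
    for block_id, rows in enumerate(blocks):
        for row in rows:
            for field, value in row.items():
                per_block = filters.setdefault(field, {})
                per_block.setdefault(block_id, BloomFilter()).add(str(value))
    return filters


def search(blocks, filters, field, needle):
    """Return (matching rows, number of blocks actually scanned)."""
    matches = []
    scanned = 0
    for block_id, bloom in filters.get(field, {}).items():
        if not bloom.might_contain(needle):
            continue  # Pruned: this block cannot contain the needle.
        scanned += 1
        matches.extend(
            row
            for row in blocks[block_id]
            if field in row and str(row[field]) == needle
        )
    return matches, scanned


# Hypothetical data: four blocks of log-like rows.
blocks = [
    [{"trace_id": f"t{b}-{i}", "status": "200"} for i in range(100)]
    for b in range(4)
]
filters = build_field_filters(blocks)
rows, scanned = search(blocks, filters, "trace_id", "t2-57")
print(len(rows), "match;", scanned, "of", len(blocks), "blocks scanned")
```

Because a Bloom filter never produces false negatives, pruning never drops a real match; false positives only cost an unnecessary block scan. Keying the filters by field first is the design choice that keeps filter I/O proportional to the fields a query touches rather than to everything stored for a block.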
