Hasty Briefsbeta

Haydex: From Zero to 178,600M rows a second in 30 days

18 hours ago
  • #performance-optimization
  • #database-engineering
  • #distributed-systems
  • Haydex V1 was developed to solve slow needle-in-the-haystack queries by using probabilistic filters.
  • Key optimizations included shifting from block-scoped to field-scoped filters, reducing I/O overhead significantly.
  • Profiler-driven optimizations identified and fixed memory and CPU bottlenecks, leading to massive performance gains.
  • A distributed map-reduce architecture was implemented to handle large-scale indexing, reducing latency from minutes to seconds.
  • Compounding optimizations, including lazy loading and efficient pruning, resulted in peak throughput of 673 billion rows per second.
  • Lessons learned: I/O is critical, profiler insights are invaluable, performance optimization is iterative, and speed comes at a cost.
  • Future improvements include tiered indexing, a janitor service for cleanup, and further query optimizations.