Elasticsearch's BBQ vs. TurboQuant: 10–40× faster on CPU and lower ranking noise
11 hours ago
- #quantization
- #vector-search
- #performance
- Elasticsearch's Optimized Scalar Quantization (OSQ), the technique behind BBQ (Better Binary Quantization), outperforms TurboQuant in CPU vector search throughput, ranking accuracy, and storage efficiency.
- Scalar quantization compresses embedding vectors to small integers to reduce storage and speed up scoring.
- OSQ quantizes onto a uniform grid and improves accuracy with centering (subtracting a centroid before quantizing) and an anisotropic loss for choosing the quantization interval.
- TurboQuant applies a Hadamard rotation and quantizes each coordinate to non-uniform centroids, targeting near-optimal mean squared error (MSE) at the cost of more expensive scoring.
- In these tests, OSQ's symmetric kernels (query and document vectors both quantized) ran 10–40× faster than TurboQuant's, especially on an Apple M2 Max.
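- OSQ's block-diagonal preconditioner delivers the same benefit as a Hadamard rotation (spreading each vector's energy evenly across coordinates before quantization) without padding the dimension up to a power of two.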
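- On dot-product accuracy, OSQ is stronger when the angle between vectors is small (the near-neighbor regime that decides top-k ranking) and, thanks to centering, when the data is shifted away from the origin.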
- TurboQuant's throughput is limited by data-dependent gather operations (centroid table lookups), whereas OSQ scores with plain integer arithmetic.
- For CPU-based search, OSQ is superior in throughput, ranking accuracy, and storage efficiency.
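
The points about uniform grids, centering, and integer scoring fit together as follows. This is a minimal sketch, not Elastic's implementation: it assumes a per-vector min/max interval (OSQ instead optimizes the interval with an anisotropic loss, which is not shown), and the names `encode` and `approx_dot` are hypothetical. Each vector is centered on a shared centroid, the residual is quantized to 4-bit codes, and a few scalars are stored so that the query-time estimate of `x . y` needs only an integer dot product of the codes plus cheap corrections.

```python
def quantize_uniform(residual, bits):
    """Map each coordinate onto 2**bits evenly spaced levels over [lo, hi]."""
    lo, hi = min(residual), max(residual)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in residual]
    return codes, lo, step

def encode(x, centroid, bits=4):
    """Center x on the centroid, quantize the residual, keep scalar corrections."""
    residual = [xi - ci for xi, ci in zip(x, centroid)]
    codes, lo, step = quantize_uniform(residual, bits)
    return {
        "codes": codes,
        "lo": lo,
        "step": step,
        "code_sum": sum(codes),
        # c . (x - c), computed exactly at index time
        "c_dot_r": sum(ci * ri for ci, ri in zip(centroid, residual)),
    }

def approx_dot(ex, ey, c_dot_c):
    """Symmetric estimate of x . y when both vectors were encoded with encode()."""
    dim = len(ex["codes"])
    # The only per-coordinate work is an integer dot product of the codes.
    int_dot = sum(a * b for a, b in zip(ex["codes"], ey["codes"]))
    # Expand sum_i (lo_x + step_x*qx_i) * (lo_y + step_y*qy_i).
    rx_dot_ry = (dim * ex["lo"] * ey["lo"]
                 + ex["lo"] * ey["step"] * ey["code_sum"]
                 + ey["lo"] * ex["step"] * ex["code_sum"]
                 + ex["step"] * ey["step"] * int_dot)
    # x . y = c.c + c.rx + c.ry + rx.ry
    return c_dot_c + ex["c_dot_r"] + ey["c_dot_r"] + rx_dot_ry
```

Centering matters for shifted data: without it, a large common offset would stretch every vector's interval and waste most of the 16 levels on the shared shift rather than on what distinguishes one vector from another.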
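
The gather bottleneck attributed to TurboQuant comes from scoring against a non-uniform codebook. The sketch below is an illustration, not TurboQuant's code. It assumes a 2-bit codebook whose values approximate the classic Lloyd-Max levels for a unit Gaussian (TurboQuant's actual centroids are not given here), and every function name is hypothetical. Because the levels are not evenly spaced, a symmetric score cannot be rewritten as an integer dot product of the codes; each coordinate needs a table lookup whose address depends on the stored codes.

```python
import bisect

# Illustrative 2-bit codebook: approximate Lloyd-Max levels for a unit Gaussian.
CODEBOOK = [-1.510, -0.4528, 0.4528, 1.510]

def nearest_code(value, centroids):
    """Index of the closest centroid (centroids must be sorted ascending)."""
    i = bisect.bisect_left(centroids, value)
    if i == 0:
        return 0
    if i == len(centroids):
        return len(centroids) - 1
    # Pick whichever neighbor is closer; ties go to the lower centroid.
    return i if centroids[i] - value < value - centroids[i - 1] else i - 1

def encode_codebook(x, centroids=CODEBOOK):
    return [nearest_code(v, centroids) for v in x]

def codebook_dot(codes_x, codes_y, centroids=CODEBOOK):
    """Symmetric score: sum of centroid products looked up by code pair."""
    # Built once per codebook: table[a][b] = centroids[a] * centroids[b].
    table = [[a * b for b in centroids] for a in centroids]
    # Each term is an indexed load whose address depends on the stored codes.
    return sum(table[cx][cy] for cx, cy in zip(codes_x, codes_y))
```

In a SIMD kernel, a uniform grid reduces to packed integer multiply-adds, while this codebook form needs an indexed lookup per coordinate, which is the kind of data-dependent gather the summary identifies as the throughput limit.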
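
The block-diagonal preconditioner point can be illustrated with one plausible construction, assumed here: independent random orthogonal blocks placed along the diagonal, with a smaller final block when the dimension is not a multiple of the block size. Elastic's actual preconditioner may be built differently, and the block size of 32 and all function names are hypothetical. Like any orthogonal transform, it preserves dot products exactly and spreads a spiky vector across coordinates, but its output length always equals the input length. A fast Hadamard transform, by contrast, needs a power-of-two length, so a 768-dimensional embedding would be padded to 1024.

```python
import math
import random

def random_orthogonal(n, rng):
    """n x n orthogonal matrix (as rows) via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def block_preconditioner(dim, block=32, seed=0):
    """Independent orthogonal blocks along the diagonal; the last block may be smaller."""
    rng = random.Random(seed)
    blocks, start = [], 0
    while start < dim:
        size = min(block, dim - start)
        blocks.append((start, random_orthogonal(size, rng)))
        start += size
    return blocks

def precondition(blocks, x):
    """Rotate each slice of x by its own block; output has the same length as x."""
    out = []
    for start, q in blocks:
        seg = x[start:start + len(q)]
        out.extend(sum(a * b for a, b in zip(row, seg)) for row in q)
    return out

def hadamard_padded_dim(dim):
    """Length a Sylvester-style Hadamard transform would need (next power of two)."""
    return 1 << (dim - 1).bit_length()
```

For a 100-dimensional input with 32-wide blocks, this yields blocks of sizes 32, 32, 32, and 4 and keeps all 100 coordinates, whereas `hadamard_padded_dim(100)` is 128.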