Asymmetric Quantization: Near-Lossless Retrieval with 97% Storage Reduction
3 days ago
- #storage optimization
- #asymmetric quantization
- #late interaction retrieval
- Asymmetric quantization enables near-lossless late interaction retrieval: it cuts document storage by 97% (from 393 KiB to 12.28 KiB per document) at a cost of only a 0.61 drop in NDCG@10.
- Late interaction models like Wholembed v3 improve retrieval precision but raise storage costs because they produce multiple vectors per document. Asymmetric quantization stores document vectors as 1-bit signs while keeping query vectors at higher precision (e.g., int8), so the stored index shrinks while the query side keeps its precision.
- The scoring trick computes int8 × binary scores with an efficient identity-based formulation, so scoring runs on the compact 1-bit document vectors and retrieval gets faster. Production benchmarks show a 3.82x speedup vs. fp32 along with better storage economics, making late interaction practical at scale.
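
To make the points above concrete, here is a minimal pure-Python sketch of asymmetric scoring. It is an illustration under stated assumptions, not the production implementation: the post does not spell out its identity, so the sketch uses the standard one for sign vectors, dot(q, s) = 2 * (sum of q_i where the bit is set) - sum(q), with s_i = +1 for a set bit and -1 otherwise. All function names (`pack_signs`, `dot_int8_binary`, `maxsim`, `binary_size_kib`) are hypothetical, and per-token int8 scale factors are left out for brevity.

```python
def pack_signs(vec):
    """Pack sign bits of a float vector into one int (bit i = 1 if vec[i] >= 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits


def dot_int8_binary(q, q_sum, doc_bits):
    """Dot product of int8 query q with the +1/-1 vector encoded by doc_bits.

    Assumed identity: dot(q, s) = 2 * sum(q_i for set bits) - sum(q).
    q_sum = sum(q) is computed once per query token and reused for every document.
    """
    selected = 0
    i = 0
    while doc_bits:
        if doc_bits & 1:
            selected += q[i]
        doc_bits >>= 1
        i += 1
    return 2 * selected - q_sum


def maxsim(query_vecs, doc_bit_vecs):
    """Late interaction score: best document token per query token, summed."""
    total = 0
    for q in query_vecs:
        q_sum = sum(q)
        total += max(dot_int8_binary(q, q_sum, d) for d in doc_bit_vecs)
    return total


def binary_size_kib(fp32_size_kib):
    """Size after replacing each 32-bit float with a single sign bit."""
    return fp32_size_kib / 32


# Tiny example: one int8 query token, two document tokens stored as sign bits.
query = [[3, -5, 7, 1]]
doc = [pack_signs([0.2, -1.0, -0.3, 4.0]), pack_signs([-0.1, -2.0, 0.5, -0.7])]
score = maxsim(query, doc)
size = binary_size_kib(393)  # fp32 baseline taken from the figures above
```

The storage helper shows why the numbers line up: assuming the 393 KiB baseline is fp32, one bit per dimension is a 32x reduction, which gives about 12.28 KiB, or roughly 97% less. A real implementation would presumably operate on packed bytes with SIMD instructions rather than a Python loop, but the arithmetic is the same.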