Hamming Distance for Hybrid Search in SQLite
10 days ago
- #Hybrid Search
- #Semantic Search
- #SQLite
- Implementation of semantic search in SQLite using binary embeddings and Hamming distance for hybrid search.
- Binary embeddings reduce storage by quantizing each dimension to a single bit, changing similarity metric to Hamming distance.
- Hamming distance counts differing bit positions between binary vectors, computed efficiently using XOR and popcount operations.
- SQLite extension created to register a custom Hamming distance function, handling BLOB inputs and returning integer results.
- Performance testing showed 35ms for 1 million rows, including sorting, with 28ms for computation alone.
- Limitations include O(n) complexity without indexing and the need for virtual table support for true top-k selection.
- Hybrid search combines FTS5 BM25 ranking with semantic search using Reciprocal Rank Fusion (RRF) for merging relevance signals.
- Example queries demonstrated the benefits of hybrid search in disambiguating and combining keyword and semantic understanding.
- Suitable for small datasets or infrequent queries where O(n) latency is acceptable and external dependencies are to be avoided.