The Evolution of 'More Like This'
7 hours ago
- #embeddings
- #search
- #MLT
- More Like This (MLT) finds documents similar to one the user has already selected, which is useful when reading articles, browsing products, or investigating support tickets.
- Traditional MLT was lexical: it matched a document's most important words using weighting schemes such as TF-IDF or BM25, which works well for exact matches such as error codes, SKUs, or legal wording.
- Embeddings let MLT shift to semantic search: it compares vector representations of documents, so it captures meaning even when the phrasing differs and can surface conceptually similar or cross-lingual matches.
- Hybrid search combines the lexical and vector approaches, using lexical matching for precise terms and vector similarity for semantic relationships, with filters and reranking refining the results.
- Modern implementations build MLT into the search engine itself (Manticore is one example), so a KNN/ANN search can be started directly from a document ID, which reduces complexity and improves performance in production systems.
- MLT evolved from the lexical methods of the 2000s, through 2010s embeddings such as Word2Vec, to recent work with ANN libraries and RAG, where it supports context expansion and personalized retrieval.
- Key considerations for production include exact match requirements, embedding model management, access controls, hybrid search tuning, reranking, and monitoring search quality metrics.
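
To make the lexical, vector, and hybrid variants summarized above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Manticore or any other engine implements MLT. The corpus, the three-dimensional "embeddings", and the function names (`tfidf_vectors`, `rank_similar`, `rrf`, `more_like_this`) are all made up for the example. The fusion step uses reciprocal rank fusion with `k=60`, which is one common way to merge rankings and is an assumption here, since the summary does not say which fusion method is used.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): doc 2 shares an error code with doc 1,
# doc 3 describes a similar problem in different words.
DOCS = {
    1: "backup job fails with error E1042",
    2: "printer shows error E1042 after update",
    3: "nightly copy to storage stops unexpectedly",
    4: "how to reset a forgotten password",
}

# Hand-written stand-ins for embeddings (assumption). In a real system
# these would come from an embedding model and live in a vector index.
VECTORS = {
    1: [0.9, 0.1, 0.0],
    2: [0.4, 0.8, 0.1],
    3: [0.8, 0.2, 0.1],
    4: [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tfidf_vectors(docs):
    """Build a dense TF-IDF vector per document over a shared vocabulary."""
    tokens = {i: text.lower().split() for i, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    vocab = sorted(df)
    n = len(docs)
    out = {}
    for i, toks in tokens.items():
        tf = Counter(toks)
        out[i] = [tf[t] * (math.log(n / df[t]) + 1.0) for t in vocab]
    return out

def rank_similar(seed_id, vectors):
    """Rank every other document by cosine similarity to the seed (exact KNN)."""
    seed = vectors[seed_id]
    scored = [(cosine(seed, v), i) for i, v in vectors.items() if i != seed_id]
    # Highest score first; ties broken by document id for determinism.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across the input rankings."""
    scores = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def more_like_this(seed_id):
    """Return (lexical, semantic, hybrid) rankings for the seed document."""
    lexical = rank_similar(seed_id, tfidf_vectors(DOCS))
    semantic = rank_similar(seed_id, VECTORS)
    return lexical, semantic, rrf([lexical, semantic])

lexical, semantic, hybrid = more_like_this(1)
print("lexical: ", lexical)
print("semantic:", semantic)
print("hybrid:  ", hybrid)
```

For seed document 1, the lexical ranking puts document 2 first because it shares the exact token `E1042`, while the vector ranking puts the paraphrased document 3 first. The fused ranking keeps both of them above the unrelated password document, which is the behavior hybrid search is meant to deliver. A production setup would replace the brute-force loop with an ANN index and would typically add filters and a reranking stage.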