The Theoretical Limitations of Embedding-Based Retrieval
12 days ago
- #Learning Theory
- #Information Retrieval
- #Vector Embeddings
- Vector embeddings are increasingly used for diverse retrieval tasks, including reasoning, instruction-following, and coding.
- Prior work has largely assumed that the theoretical limitations of embeddings arise only for unrealistic queries, and that the remaining issues can be solved with better training data and larger models.
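- This study shows that these limitations arise even for very simple queries. Drawing on known results in learning theory, it shows that the number of top-k document subsets that can be returned by some query is limited by the embedding dimension.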
- Experiments confirm that this limitation holds even when restricted to k=2 and when free parameterized embeddings are optimized directly on the test set.
- A new dataset, LIMIT, is introduced to stress-test models; even state-of-the-art models fail on it despite the simplicity of the task.
- The findings highlight a fundamental limitation of single-vector embedding models and call for new research into methods that can overcome it.
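The free-embedding experiment summarized above can be illustrated with a small sketch. It assumes a toy setup in which every unordered pair of documents is the relevant set for one query (the k=2 case), query and document vectors are free parameters with no encoder, and a softmax cross-entropy loss is minimized with plain NumPy gradient descent. The function names (`make_all_pairs_qrels`, `top_k_accuracy`, `optimize_free_embeddings`) and the choice of loss and optimizer are hypothetical choices for this illustration, not the paper's exact setup.

```python
import itertools

import numpy as np


def make_all_pairs_qrels(n_docs):
    """Hypothetical helper: one query per unordered document pair (the k=2 setting).

    Each query's relevant set is exactly the two documents in its pair.
    """
    return [set(pair) for pair in itertools.combinations(range(n_docs), 2)]


def top_k_accuracy(Q, D, qrels, k=2):
    """Fraction of queries whose top-k documents by dot product equal the relevant set."""
    scores = Q @ D.T  # (n_queries, n_docs) similarity matrix
    hits = 0
    for i, relevant in enumerate(qrels):
        top_k = set(np.argsort(-scores[i])[:k].tolist())
        hits += top_k == relevant
    return hits / len(qrels)


def optimize_free_embeddings(qrels, n_docs, dim, steps=2000, lr=0.1, temp=0.1, seed=0):
    """Directly optimize query and document vectors for the given qrels.

    Assumed loss for this sketch: mean softmax cross-entropy over all documents,
    with the target mass spread uniformly over each query's relevant documents.
    """
    rng = np.random.default_rng(seed)
    Q = rng.normal(size=(len(qrels), dim))
    D = rng.normal(size=(n_docs, dim))
    # Target distribution: uniform over the relevant documents of each query.
    T = np.zeros((len(qrels), n_docs))
    for i, relevant in enumerate(qrels):
        T[i, list(relevant)] = 1.0 / len(relevant)
    for _ in range(steps):
        logits = (Q @ D.T) / temp
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the raw scores Q @ D.T.
        G = (P - T) / (temp * len(qrels))
        grad_Q = G @ D
        grad_D = G.T @ Q
        Q -= lr * grad_Q
        D -= lr * grad_D
    return Q, D


if __name__ == "__main__":
    n_docs = 6
    qrels = make_all_pairs_qrels(n_docs)  # 15 queries, one per document pair
    for dim in (1, 2, 3, 6):
        Q, D = optimize_free_embeddings(qrels, n_docs, dim)
        print(f"dim={dim}: top-2 accuracy = {top_k_accuracy(Q, D, qrels):.2f}")
```

With `dim=1`, every query's ranking is either the document order or its reverse (up to ties), so only a few of the pairs can ever be returned as a top-2 set, no matter how long the optimization runs. Sweeping `dim` upward in this toy setup is one way to probe how the required dimension grows with the number of documents.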