Hasty Briefs (beta)

The Theoretical Limitations of Embedding-Based Retrieval

12 days ago
  • #Learning Theory
  • #Information Retrieval
  • #Vector Embeddings
  • Vector embeddings are increasingly used for diverse retrieval tasks, including reasoning, instruction-following, and coding.
  • Prior work has pointed out theoretical limitations of embeddings, but it is commonly assumed that these arise only from unrealistic queries and that the rest can be overcome with better training data and larger models.
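  • This study shows that the limitations arise in realistic settings with very simple queries. It connects them to known learning-theory results: the number of distinct top-k document subsets that any query can return is bounded by the embedding dimension.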
  • Experiments confirm that the limitation persists even when k is restricted to 2 and the embeddings are free parameters optimized directly on the test set.
  • A new dataset, LIMIT, is introduced to stress-test models, revealing failures in state-of-the-art models on simple tasks.
  • The findings highlight fundamental limitations of single-vector embedding models, calling for new research to overcome these constraints.
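
The dimension constraint in the bullets above can be illustrated with a small experiment. The following is a minimal sketch, not the study's own procedure: it draws random document embeddings, scores them by dot product against many random queries, and counts how many distinct top-2 subsets ever appear. The helper names (`reachable_top_k`, `random_vectors`), the choice of 8 documents, and the use of random queries are all assumptions made for illustration. Random sampling can miss subsets that are reachable, so the counts are lower bounds on what a given dimension allows. The exception is dimension 1, where only the sign of the query matters and at most two subsets are possible.

```python
import math
import random


def reachable_top_k(docs, queries, k):
    """Return the set of distinct top-k document subsets hit by any query.

    Relevance is the dot product between a query and a document embedding.
    """
    seen = set()
    for q in queries:
        scores = [sum(qi * di for qi, di in zip(q, d)) for d in docs]
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        seen.add(frozenset(ranked[:k]))
    return seen


def random_vectors(count, dim, rng):
    """Draw `count` Gaussian vectors of dimension `dim` (hypothetical helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)  # fixed seed so repeated runs agree
n_docs, k = 8, 2
total = math.comb(n_docs, k)  # 28 possible top-2 subsets of 8 documents

coverage = {}
for dim in (1, 2, 4, 8):
    docs = random_vectors(n_docs, dim, rng)
    queries = random_vectors(5000, dim, rng)
    coverage[dim] = len(reachable_top_k(docs, queries, k))
    print(f"d={dim}: reached {coverage[dim]} of {total} top-{k} subsets")
```

In one dimension a query can only ask for the two largest or the two smallest document values, so most of the 28 pairs can never be returned together no matter how the model is trained. Raising the dimension makes more pairs reachable, which matches the link between embedding dimension and retrievable top-k subsets described above.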