Hasty Briefs

Show HN: We cut RAG latency ~2× by switching embedding model

a day ago
  • #Embeddings
  • #RAG
  • #Latency
  • MyClone migrated from OpenAI's text-embedding-3-small (1536 dimensions) to Voyage's voyage-3.5-lite (512 dimensions), reporting 3× storage savings, 2× faster retrieval, and a 15-20% reduction in voice latency (a sketch of the swap follows this list).
  • Voyage-3.5-lite uses Matryoshka Representation Learning (MRL), so retrieval quality holds up despite the lower dimensionality (a truncation sketch follows the list).
  • Put differently, the switch cut the storage footprint by ~66% and retrieval latency by ~50%.
  • End-to-end voice latency improved by 15-20%, and first-token latency dropped by 15%.
  • Voyage-3.5-lite also offers flexibility in output dimension and quantization, leaving room for further optimization (see the back-of-the-envelope arithmetic below).
  • The migration improved user experience, reduced infrastructure costs, and provided headroom for richer features.
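
A minimal sketch of what the swap might look like in code, assuming the public OpenAI and Voyage AI Python clients; the sample text, client setup, and the output_dimension argument are illustrative assumptions, not code from the post:

```python
# Hypothetical before/after of the embedding swap. Client setup, the sample text,
# and the output_dimension argument are assumptions based on the public OpenAI
# and Voyage AI Python clients, not code from the post.
from openai import OpenAI
import voyageai

texts = ["How do I reset my password?"]

# Before: OpenAI text-embedding-3-small, 1536-dimensional vectors.
openai_client = OpenAI()
resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
old_vec = resp.data[0].embedding   # 1536 floats

# After: voyage-3.5-lite, truncated to 512 dimensions (MRL makes the prefix usable).
voyage_client = voyageai.Client()
result = voyage_client.embed(
    texts,
    model="voyage-3.5-lite",
    input_type="query",       # use "document" when indexing the corpus
    output_dimension=512,
)
new_vec = result.embeddings[0]     # 512 floats
```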
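
The MRL idea in a toy numpy sketch: a Matryoshka-trained embedding is laid out so its leading components already form a usable shorter vector once re-normalized, which is why truncating to 512 dimensions preserves most retrieval quality. The random vectors below are placeholders, not real embeddings.

```python
import numpy as np

def truncate_and_normalize(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` components and rescale to unit length."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

full_doc = np.random.randn(1024)                  # stand-in for a full-size MRL embedding
full_query = np.random.randn(1024)

doc_512 = truncate_and_normalize(full_doc, 512)   # prefix used for indexing
query_512 = truncate_and_normalize(full_query, 512)

score = float(np.dot(doc_512, query_512))         # cosine similarity on unit vectors
```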
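
A back-of-the-envelope check of the storage and quantization claims, assuming float32 vectors and a corpus size of 100k chunks chosen purely for illustration:

```python
DOCS = 100_000        # illustrative corpus size, not from the post
FLOAT32 = 4           # bytes per vector component
INT8 = 1

old_bytes = DOCS * 1536 * FLOAT32   # ~614 MB at 1536 dims
new_bytes = DOCS * 512 * FLOAT32    # ~205 MB at 512 dims

print(old_bytes / new_bytes)        # 3.0 -> "3x storage savings", i.e. ~66% smaller
print(DOCS * 512 * INT8 / 1e6)      # ~51 MB if vectors are further quantized to int8
```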