Show HN: We cut RAG latency ~2× by switching embedding model
- #Embeddings
- #RAG
- #Latency
- MyClone migrated its RAG pipeline from OpenAI's text-embedding-3-small (1536 dimensions) to Voyage-3.5-lite (512 dimensions), achieving 3× storage savings, 2× faster retrieval, and a 15-20% reduction in voice latency.
- Voyage-3.5-lite is trained with Matryoshka Representation Learning (MRL), which concentrates the most informative features in the leading dimensions, so embeddings can be truncated to 512d with little loss in retrieval quality.
- In practice, the switch cut the storage footprint by ~66% and halved retrieval latency.
- End-to-end voice latency improved by 15-20%, and first-token latency dropped by 15%.
- Voyage-3.5-lite supports multiple output dimensions and quantization options, leaving headroom for further optimization.
- The migration improved user experience, reduced infrastructure costs, and provided headroom for richer features.
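The truncate-and-renormalize mechanics behind MRL can be sketched in a few lines. This is a synthetic illustration with random stand-in vectors (no real model output), intended only to show how a 1536d embedding is cut to its 512d prefix while nearest-neighbor ranking is preserved; the function names and noise parameters are invented for the example.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # Both inputs are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def truncate_mrl(v, dims):
    # MRL-trained models pack the most important information into the
    # leading dimensions, so a prefix slice remains a usable embedding
    # after renormalization.
    return normalize(v[:dims])

random.seed(0)
# Stand-in for real embeddings: doc_a is deliberately close to the query,
# doc_b is unrelated.
query = normalize([random.gauss(0, 1) for _ in range(1536)])
doc_a = normalize([q + random.gauss(0, 0.3) for q in query])
doc_b = normalize([random.gauss(0, 1) for _ in range(1536)])

for dims in (1536, 512):
    q, a, b = (truncate_mrl(v, dims) for v in (query, doc_a, doc_b))
    print(dims, round(cosine(q, a), 3), round(cosine(q, b), 3))
```

At both 1536 and 512 dimensions the related document scores well above the unrelated one; with a real MRL-trained model the same prefix-slicing applies to the model's actual output.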
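The storage claim is straightforward back-of-envelope arithmetic, and the quantization headroom can be estimated the same way. A minimal sketch, counting raw vector bytes only (index overhead ignored); the corpus size is a hypothetical placeholder:

```python
FLOAT32_BYTES = 4
INT8_BYTES = 1
NUM_VECTORS = 1_000_000  # hypothetical corpus size for illustration

def vector_store_bytes(dims, bytes_per_value, n=NUM_VECTORS):
    """Raw storage for n dense vectors, ignoring index structures."""
    return dims * bytes_per_value * n

old = vector_store_bytes(1536, FLOAT32_BYTES)  # text-embedding-3-small
new = vector_store_bytes(512, FLOAT32_BYTES)   # Voyage-3.5-lite at 512d
print(f"512d float32 vs 1536d float32: {1 - new / old:.0%} smaller")   # 67% smaller

# Further headroom if the 512d vectors are later int8-quantized:
int8 = vector_store_bytes(512, INT8_BYTES)
print(f"512d int8 vs 1536d float32: {1 - int8 / old:.1%} smaller")     # 91.7% smaller
```

The first figure matches the ~66% reduction reported in the post; the second shows why the dimension/quantization flexibility leaves room for future savings.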