Production RAG: what I learned from processing 5M+ documents
- #Machine Learning
- #RAG
- #AI Development
- Started with LangChain and LlamaIndex for RAG; they made prototyping fast, but production results were subpar.
- Key improvements included Query Generation for broader context coverage, Reranking for better chunk relevance, and an optimized Chunking Strategy.
- Enhanced LLM responses by including Metadata such as title and author, and implemented Query Routing to handle non-RAG questions directly.
- The vector database moved from Azure to Pinecone to Turbopuffer, alongside custom solutions for document extraction and chunking.
- Open-sourced the project as agentset-ai/agentset under the MIT license, sharing the learnings and solutions.
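The Query Generation step mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the project's actual code: `generate_variants` stands in for an LLM call that rewrites the question several ways, and `retrieve` stands in for a vector-store similarity search; results from all variants are merged and de-duplicated to broaden context coverage.

```python
def generate_variants(query: str) -> list[str]:
    # Placeholder for an LLM prompt such as "Rewrite this question 3 ways".
    return [query, f"background on {query}", f"details about {query}"]

def retrieve(query: str, k: int = 3) -> list[str]:
    # Placeholder for a vector-database similarity search over real embeddings.
    corpus = [
        "chunk about pricing tiers",
        "chunk about chunking strategy",
        "chunk about reranker models",
    ]
    return [c for c in corpus if any(w in c for w in query.lower().split())][:k]

def multi_query_retrieve(query: str) -> list[str]:
    seen, merged = set(), []
    for variant in generate_variants(query):
        for chunk in retrieve(variant):
            if chunk not in seen:  # de-duplicate across query variants
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

The de-duplication matters because overlapping variants tend to pull back the same top chunks.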
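Reranking, as described above, re-scores first-stage candidates with a stronger relevance model and keeps only the top few. In the sketch below, `score` is a stand-in for a real cross-encoder reranker; a term-overlap heuristic is used so the example runs anywhere. These names are illustrative, not from the project.

```python
def score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder relevance score: fraction of query terms
    # that also appear in the chunk.
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def rerank(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Sort candidates by relevance, keep only the best top_k for the prompt.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

candidates = [
    "unrelated text about billing",
    "how chunk overlap affects retrieval",
    "retrieval quality and chunk size",
]
print(rerank("chunk retrieval quality", candidates))
# → ['retrieval quality and chunk size', 'how chunk overlap affects retrieval']
```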
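A common form of the chunking strategy referenced above is a sliding window with overlap, so context is not cut mid-thought at chunk boundaries. A minimal sketch, assuming word-based sizing (a production chunker would typically work on tokens or sentences):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into windows of `size` words, each sharing `overlap` words
    # with the previous window.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):
            break
    return chunks
```

The overlap trades some index size for better recall on queries that land near a boundary.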
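Including metadata with each chunk, as the summary notes, can be as simple as prepending a header before the chunk enters the LLM prompt. A hypothetical sketch (field names are assumptions, not the project's schema):

```python
def with_metadata(chunk: str, meta: dict) -> str:
    # Prepend title/author so the model can attribute and weigh the source.
    header = (
        f"[title: {meta.get('title', 'unknown')} | "
        f"author: {meta.get('author', 'unknown')}]"
    )
    return f"{header}\n{chunk}"
```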
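Query Routing, the last technique mentioned, decides whether a question needs retrieval at all. A real router might be an LLM classifier; the keyword heuristic below is only a stand-in to show the shape of the control flow:

```python
SMALL_TALK = {"hi", "hello", "thanks", "thank you", "bye"}

def route(query: str) -> str:
    # Return "direct" for greetings/small talk (answer without touching the
    # vector store) and "rag" for everything else.
    q = query.lower().strip().rstrip("!?.")
    if q in SMALL_TALK:
        return "direct"
    return "rag"
```

Routing non-RAG questions past retrieval saves latency and avoids stuffing irrelevant chunks into the prompt.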