Production RAG: what I learned from processing 5M+ documents
- #Machine Learning
- #RAG
- #AI Development
- Started with LangChain and LlamaIndex for RAG; they made prototyping fast, but production results were subpar.
- Key improvements included Query Generation for broader context coverage, Reranking for better chunk relevance, and an optimized Chunking Strategy.
- Enhanced LLM responses by including Metadata such as title and author, and implemented Query Routing to handle non-RAG questions directly.
- The vector database moved from Azure to Pinecone to Turbopuffer, alongside custom solutions for document extraction and chunking.
- Open-sourced the project as agentset-ai/agentset under the MIT license, sharing the learnings and solutions.
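The Query Generation step mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the project's actual code: `generate_variants` stands in for an LLM call that rewrites the question several ways, and `retrieve` stands in for a vector-store similarity search; results from all variants are merged and de-duplicated to broaden context coverage.

```python
def generate_variants(query: str) -> list[str]:
    # Placeholder for an LLM prompt such as "Rewrite this question 3 ways".
    return [query, f"background on {query}", f"details about {query}"]

def retrieve(query: str, k: int = 3) -> list[str]:
    # Placeholder for a vector-database similarity search over real embeddings.
    corpus = [
        "chunk about pricing tiers",
        "chunk about chunking strategy",
        "chunk about reranker models",
    ]
    return [c for c in corpus if any(w in c for w in query.lower().split())][:k]

def multi_query_retrieve(query: str) -> list[str]:
    seen, merged = set(), []
    for variant in generate_variants(query):
        for chunk in retrieve(variant):
            if chunk not in seen:  # de-duplicate across query variants
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

The de-duplication matters because overlapping variants tend to pull back the same top chunks.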
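Reranking, as described above, re-scores first-stage candidates with a stronger relevance model and keeps only the top few. In the sketch below, `score` is a stand-in for a real cross-encoder reranker; a term-overlap heuristic is used so the example runs anywhere. These names are illustrative, not from the project.

```python
def score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder relevance score: fraction of query terms
    # that also appear in the chunk.
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def rerank(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Sort candidates by relevance, keep only the best top_k for the prompt.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]

candidates = [
    "unrelated text about billing",
    "how chunk overlap affects retrieval",
    "retrieval quality and chunk size",
]
print(rerank("chunk retrieval quality", candidates))
# → ['retrieval quality and chunk size', 'how chunk overlap affects retrieval']
```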
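A common form of the chunking strategy referenced above is a sliding window with overlap, so context is not cut mid-thought at chunk boundaries. A minimal sketch, assuming word-based sizing (a production chunker would typically work on tokens or sentences):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into windows of `size` words, each sharing `overlap` words
    # with the previous window.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):
            break
    return chunks
```

The overlap trades some index size for better recall on queries that land near a boundary.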
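Including metadata with each chunk, as the summary notes, can be as simple as prepending a header before the chunk enters the LLM prompt. A hypothetical sketch (field names are assumptions, not the project's schema):

```python
def with_metadata(chunk: str, meta: dict) -> str:
    # Prepend title/author so the model can attribute and weigh the source.
    header = (
        f"[title: {meta.get('title', 'unknown')} | "
        f"author: {meta.get('author', 'unknown')}]"
    )
    return f"{header}\n{chunk}"
```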
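Query Routing, the last technique mentioned, decides whether a question needs retrieval at all. A real router might be an LLM classifier; the keyword heuristic below is only a stand-in to show the shape of the control flow:

```python
SMALL_TALK = {"hi", "hello", "thanks", "thank you", "bye"}

def route(query: str) -> str:
    # Return "direct" for greetings/small talk (answer without touching the
    # vector store) and "rag" for everything else.
    q = query.lower().strip().rstrip("!?.")
    if q in SMALL_TALK:
        return "direct"
    return "rag"
```

Routing non-RAG questions past retrieval saves latency and avoids stuffing irrelevant chunks into the prompt.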