
Production RAG: what I learned from processing 5M+ documents

7 hours ago
  • #Machine Learning
  • #RAG
  • #AI Development
  • Started with LangChain and LlamaIndex for RAG; the prototype came together quickly, but results in production were subpar.
  • Key improvements included query generation for broader context coverage, reranking for better chunk relevance, and an optimized chunking strategy (see the sketches after this list).
  • Enhanced LLM responses by including metadata such as title and author, and added query routing to handle non-RAG questions (also sketched below).
  • The vector database moved from Azure to Pinecone to Turbopuffer, with custom solutions built for document extraction and chunking.
  • The project is open-sourced as agentset-ai/agentset under the MIT license, along with the learnings and solutions.
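
The summary doesn't include code, but query generation typically means expanding one user question into several search queries before retrieval. Below is a minimal sketch assuming an OpenAI-style chat client; the model name, prompt, and function names are illustrative, not taken from the post:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-capable LLM works

def generate_queries(question: str, n: int = 3) -> list[str]:
    """Expand one user question into several search queries for broader context coverage."""
    prompt = (
        f"Rewrite the following question as {n} different search queries, "
        f"one per line, covering different phrasings and sub-topics:\n\n{question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    queries = [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]
    return [question] + queries  # keep the original question in the mix
```

Each generated query is then run against the vector database and the results are merged before reranking.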
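Reranking rescores the retrieved chunks against the query and keeps only the most relevant ones. The post doesn't say which reranker was used; this sketch assumes a cross-encoder from sentence-transformers as one common option:

```python
from sentence_transformers import CrossEncoder

# Illustrative reranker model; the post does not name a specific model or service.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Score each retrieved chunk against the query and keep the highest-scoring ones."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```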
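For the chunking strategy the post gives no specifics; a baseline fixed-size chunker with overlap looks like this, with the sizes as placeholders rather than the author's values:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character windows so context isn't cut mid-thought."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap  # overlap keeps neighbouring chunks connected
    return chunks
```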
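Metadata inclusion and query routing can be sketched as follows; the title/author formatting and the heuristic router are assumptions for illustration, not the post's actual implementation (an LLM classifier is the more likely routing choice):

```python
def build_context(chunks: list[dict]) -> str:
    """Prepend document metadata (title, author) to each chunk before it reaches the LLM."""
    return "\n\n".join(
        f"[{c.get('title', 'untitled')} by {c.get('author', 'unknown')}]\n{c['text']}"
        for c in chunks
    )

def route_query(question: str) -> str:
    """Decide whether a question needs retrieval or can be answered directly."""
    # Illustrative keyword heuristic; a small LLM classifier is a more robust option.
    small_talk = {"hi", "hello", "thanks", "who are you"}
    if question.lower().strip("?!. ") in small_talk:
        return "direct"  # skip retrieval entirely
    return "rag"         # run the full retrieve -> rerank -> generate pipeline
```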