Hasty Briefs (beta)

From zero to a RAG system: successes and failures

2 days ago
  • #LLM
  • #RAG
  • #ChromaDB
  • The author was tasked with creating an internal RAG system for engineers, requiring fast responses and access to a decade's worth of projects, including OrcaFlex files.
  • Initial challenges included selecting the right technology stack: Ollama for local LLM inference, nomic-embed-text for embeddings, LlamaIndex for RAG orchestration, and Python for development.
  • Document chaos was a major issue, with 1 TB of mixed files. A filtering system was implemented to exclude non-text files, reducing the file count by 54%.
  • Indexing 451 GB of documents kept failing until the author switched to ChromaDB, whose batch processing and checkpointing resolved the memory and index-corruption issues.
  • GPU limitations were addressed by renting a virtual machine with an NVIDIA RTX 4000 SFF Ada, which completed the full indexing run in a matter of weeks.
  • The final architecture included Flask for the API, Streamlit for the frontend, and Azure Blob Storage for document serving, with ChromaDB for vector storage.
  • Lessons learned: manage memory with batch processing, implement error tolerance for problematic files, use checkpoints, and add monitoring.
  • The system is now fast, reliable, and useful, though improvements like OrcaFlex integration were not feasible due to resource constraints.
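The filtering step that cut the file count by 54% can be sketched as a simple extension allowlist. This is a minimal sketch: the extension set and the `is_indexable` helper are illustrative assumptions, not the author's actual rules.

```python
from pathlib import Path

# Hypothetical allowlist of text-bearing extensions; the article does not
# spell out the author's actual filter, so this set is an assumption.
TEXT_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".csv"}

def is_indexable(path: Path) -> bool:
    """Keep only files whose extension suggests extractable text."""
    return path.suffix.lower() in TEXT_EXTENSIONS

def filter_corpus(root: Path) -> list[Path]:
    """Walk the document tree and drop non-text files (images, binaries,
    simulation outputs, etc.) before they ever reach the indexer."""
    return [p for p in root.rglob("*") if p.is_file() and is_indexable(p)]
```

Filtering before indexing matters here because embedding is the expensive step: every excluded binary is embedding time saved.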
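The batch-plus-checkpoint pattern that made the multi-week indexing run survivable might look like the stdlib-only sketch below. `embed_and_store` stands in for the real embedding call plus ChromaDB write, and the checkpoint file name and batch size are assumptions.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # assumed name, not from the article
BATCH_SIZE = 3                        # assumed; real runs would use far more

def load_done() -> set[str]:
    """Resume point: IDs of batches already committed in earlier runs."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done: set[str], batch_id: str) -> None:
    """Persist progress after each batch so a crash loses at most one batch."""
    done.add(batch_id)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def index_corpus(files: list[str], embed_and_store) -> list[str]:
    """Process files in fixed-size batches, skipping batches recorded in the
    checkpoint, and tolerating per-file failures instead of aborting."""
    done = load_done()
    processed = []
    for start in range(0, len(files), BATCH_SIZE):
        batch = files[start:start + BATCH_SIZE]
        batch_id = f"batch-{start}"
        if batch_id in done:
            continue  # already indexed in a previous run
        for f in batch:
            try:
                embed_and_store(f)  # real system: embed + ChromaDB add
                processed.append(f)
            except Exception as exc:
                print(f"skipping {f}: {exc}")  # error tolerance for bad files
        mark_done(done, batch_id)
    return processed
```

Keeping batches bounded caps peak memory, and writing the checkpoint only after a batch commits means a restart re-does at most one batch instead of the whole corpus.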
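The API layer of the final architecture could be as small as the Flask sketch below. The route name, payload shape, and `retrieve` stub are assumptions for illustration; only the Flask/ChromaDB/Streamlit split comes from the article.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def retrieve(question: str) -> list[str]:
    """Placeholder for retrieval; the real system queries ChromaDB and feeds
    the hits to the local LLM via LlamaIndex (per the article)."""
    return [f"stub passage for: {question}"]

@app.post("/query")
def query():
    payload = request.get_json(force=True)
    hits = retrieve(payload.get("question", ""))
    # The real system would also return the generated answer and Azure Blob
    # Storage links to source documents (assumption about response shape).
    return jsonify({"passages": hits})
```

The Streamlit frontend would then POST the user's question to this endpoint and render the returned passages and document links.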