From zero to a RAG system: successes and failures
2 days ago
- #LLM
- #RAG
- #ChromaDB
- The author was tasked with creating an internal RAG system for engineers, requiring fast responses and access to a decade's worth of projects, including OrcaFlex files.
- The first challenge was selecting a technology stack; the author settled on Ollama for running the LLM locally, nomic-embed-text for embeddings, LlamaIndex for RAG orchestration, and Python for development.
- Document chaos was a major issue, with 1 TB of mixed files. A filtering system was implemented to exclude non-text files, reducing the file count by 54%.
- Indexing 451 GB of documents was problematic until the author switched to ChromaDB, whose batch processing and checkpointing resolved the memory and corruption issues.
- GPU limitations were addressed by renting a virtual machine with an NVIDIA RTX 4000 SFF Ada, which brought the indexing run to completion within weeks.
- The final architecture included Flask for the API, Streamlit for the frontend, and Azure Blob Storage for document serving, with ChromaDB for vector storage.
- Lessons learned: manage memory with batch processing, implement error tolerance for problematic files, use checkpoints, and add monitoring.
- The system is now fast, reliable, and useful, though improvements like OrcaFlex integration were not feasible due to resource constraints.
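The filtering step that cut the file count by 54% could look roughly like the sketch below. The extension whitelist and the `filter_indexable` helper are assumptions for illustration; the post does not describe the author's actual filter rules.

```python
from pathlib import Path

# Hypothetical whitelist of text-like extensions worth indexing;
# the author's real filter criteria are not given in the post.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".pdf", ".docx", ".html"}

def filter_indexable(paths):
    """Keep only files whose extension suggests extractable text."""
    return [p for p in paths if Path(p).suffix.lower() in TEXT_EXTENSIONS]

files = ["report.PDF", "model.sim", "notes.md", "video.mp4"]
print(filter_indexable(files))  # ['report.PDF', 'notes.md']
```

Filtering by extension is cheap enough to run over a 1 TB tree before any expensive text extraction or embedding starts.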
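The batch-processing, checkpointing, and error-tolerance lessons combine into one loop. This is a minimal sketch, not the author's code: the JSON checkpoint format and the `index_one` callback are assumptions, and in the real system the callback would embed a document and add it to ChromaDB.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def load_done():
    """Resume from a previous run if a checkpoint file exists."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def index_corpus(files, index_one, batch_size=100):
    """Index files in batches, skipping failures and checkpointing progress."""
    done, failed = load_done(), []
    todo = [f for f in files if f not in done]
    for i in range(0, len(todo), batch_size):
        for f in todo[i:i + batch_size]:
            try:
                index_one(f)          # e.g. extract text, embed, add to the vector store
                done.add(f)
            except Exception:
                failed.append(f)      # tolerate problematic files, keep going
        # Persist progress after every batch so a crash never loses finished work
        CHECKPOINT.write_text(json.dumps(sorted(done)))
    return done, failed
```

Bounding each batch keeps memory flat regardless of corpus size, and the per-file `try/except` means one corrupt document cannot abort a multi-week indexing run.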
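At query time the architecture above boils down to: embed the question, find the nearest document chunks in the vector store, and hand them to the LLM. A toy nearest-neighbour step over plain Python lists, standing in for ChromaDB's similarity query, illustrates the idea (the document names and 2-D vectors are invented for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, store, k=2):
    """store: list of (doc_id, embedding). Return the k most similar doc ids."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = [("mooring_report", [1.0, 0.0]),
         ("fatigue_memo", [0.6, 0.8]),
         ("invoice", [0.0, 1.0])]
print(top_k([0.9, 0.1], store))  # ['mooring_report', 'fatigue_memo']
```

In the real system the embeddings come from nomic-embed-text and the search runs inside ChromaDB's index, but the ranking principle is the same.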