From zero to a RAG system: successes and failures
2 days ago
- #LLM
- #RAG
- #ChromaDB
- The author was tasked with creating an internal RAG system for engineers, requiring fast responses and access to a decade's worth of projects, including OrcaFlex files.
- The first challenge was selecting a technology stack; the author settled on Ollama for running the LLM locally, nomic-embed-text for embeddings, LlamaIndex for RAG orchestration, and Python for development.
- Document chaos was a major issue, with 1 TB of mixed files. A filtering system was implemented to exclude non-text files, reducing the file count by 54%.
- Indexing 451 GB of documents was problematic until the author switched to ChromaDB, whose batch processing and checkpointing resolved the memory and corruption issues.
- GPU limitations were addressed by renting a virtual machine with an NVIDIA RTX 4000 SFF Ada, which brought the indexing run to completion within weeks.
- The final architecture included Flask for the API, Streamlit for the frontend, and Azure Blob Storage for document serving, with ChromaDB for vector storage.
- Lessons learned: manage memory with batch processing, implement error tolerance for problematic files, use checkpoints, and add monitoring.
- The system is now fast, reliable, and useful, though improvements like OrcaFlex integration were not feasible due to resource constraints.
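The filtering step that cut the file count by 54% could look roughly like the sketch below. The extension whitelist and the `filter_indexable` helper are assumptions for illustration; the post does not describe the author's actual filter rules.

```python
from pathlib import Path

# Hypothetical whitelist of text-like extensions worth indexing;
# the author's real filter criteria are not given in the post.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".pdf", ".docx", ".html"}

def filter_indexable(paths):
    """Keep only files whose extension suggests extractable text."""
    return [p for p in paths if Path(p).suffix.lower() in TEXT_EXTENSIONS]

files = ["report.PDF", "model.sim", "notes.md", "video.mp4"]
print(filter_indexable(files))  # ['report.PDF', 'notes.md']
```

Filtering by extension is cheap enough to run over a 1 TB tree before any expensive text extraction or embedding starts.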
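The batch-processing, checkpointing, and error-tolerance lessons combine into one loop. This is a minimal sketch, not the author's code: the JSON checkpoint format and the `index_one` callback are assumptions, and in the real system the callback would embed a document and add it to ChromaDB.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def load_done():
    """Resume from a previous run if a checkpoint file exists."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def index_corpus(files, index_one, batch_size=100):
    """Index files in batches, skipping failures and checkpointing progress."""
    done, failed = load_done(), []
    todo = [f for f in files if f not in done]
    for i in range(0, len(todo), batch_size):
        for f in todo[i:i + batch_size]:
            try:
                index_one(f)          # e.g. extract text, embed, add to the vector store
                done.add(f)
            except Exception:
                failed.append(f)      # tolerate problematic files, keep going
        # Persist progress after every batch so a crash never loses finished work
        CHECKPOINT.write_text(json.dumps(sorted(done)))
    return done, failed
```

Bounding each batch keeps memory flat regardless of corpus size, and the per-file `try/except` means one corrupt document cannot abort a multi-week indexing run.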
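At query time the architecture above boils down to: embed the question, find the nearest document chunks in the vector store, and hand them to the LLM. A toy nearest-neighbour step over plain Python lists, standing in for ChromaDB's similarity query, illustrates the idea (the document names and 2-D vectors are invented for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, store, k=2):
    """store: list of (doc_id, embedding). Return the k most similar doc ids."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = [("mooring_report", [1.0, 0.0]),
         ("fatigue_memo", [0.6, 0.8]),
         ("invoice", [0.0, 1.0])]
print(top_k([0.9, 0.1], store))  # ['mooring_report', 'fatigue_memo']
```

In the real system the embeddings come from nomic-embed-text and the search runs inside ChromaDB's index, but the ranking principle is the same.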