So you wanna build a local RAG?
13 days ago
- #Privacy
- #Self-Hosting
- #RAG
- Skald was designed to be self-hostable and to run without sending data to third parties, which addresses privacy concerns for organizations.
- A basic RAG setup needs a vector database, an embedding model, and an LLM; a reranker and a document parser are optional additions.
- Proprietary and open-source options are listed for each RAG component, which shows how flexibly a local setup can be assembled.
- Skald's local stack uses Postgres + pgvector as the vector database and Sentence Transformers for embeddings, and lets users configure the LLM and the reranker.
- Benchmarking showed that a fully local setup with GPT-OSS 20B performed well, with an average score of 8.63, but it struggled with non-English queries and with aggregating information across multiple documents.
- Multilingual models such as bge-m3 (embeddings) and mmarco-mMiniLMv2-L12-H384-v1 (reranking) improved performance, especially on non-English queries, though aggregating information remains a challenge.
- Skald aims to further polish the local setup and publish more benchmarks for open-source models, catering to privacy-sensitive and air-gapped infrastructure needs.
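
To make the component list above concrete, here is a minimal sketch of how the pieces of a basic RAG setup fit together. It is not Skald's implementation: `embed`, `InMemoryVectorStore`, and `build_prompt` are hypothetical names, the bag-of-words `embed` is only a toy stand-in for a real embedding model such as a Sentence Transformers model, and the in-memory list stands in for a vector database such as Postgres + pgvector. The LLM call is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = []  # list of (chunk_text, vector) pairs

    def add(self, chunk):
        self.rows.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        # Rank every stored chunk by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context):
    """Assemble the prompt that would be passed to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


store = InMemoryVectorStore()
store.add("pgvector adds vector similarity search to Postgres.")
store.add("Sentence Transformers produce dense embeddings for text.")
store.add("A reranker reorders retrieved chunks by relevance.")

question = "How does Postgres do vector search?"
top_chunks = store.search(question, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

An optional reranker would sit between `search` and `build_prompt`, rescoring the retrieved chunks against the question before the best ones are placed in the prompt.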