Hasty Briefs (beta)

So you wanna build a local RAG?

13 days ago
  • #Privacy
  • #Self-Hosting
  • #RAG
  • Skald was designed to be self-hostable and to operate without sending data to third parties, which addresses privacy concerns for organizations.
  • A basic RAG setup includes a vector database, an embedding model, and an LLM, with optional components such as a reranker and a document parser.
  • The post lists proprietary and open-source options for each RAG component, emphasizing the flexibility available when building a local setup.
  • Skald's local stack uses Postgres + pgvector for the vector database, Sentence Transformers for embeddings, and allows user-configurable LLMs and rerankers.
  • Benchmarking showed that a fully local setup with GPT-OSS 20B performed well, with an average score of 8.63, though it struggled with non-English queries and with aggregating information from multiple documents.
  • Multilingual models such as bge-m3 (embeddings) and mmarco-mMiniLMv2-L12-H384-v1 (reranking) improved performance, especially for non-English queries, though aggregating information across documents remains a challenge.
  • Skald aims to further polish the local setup and publish more benchmarks for open-source models, catering to privacy-sensitive and air-gapped infrastructure needs.
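
As a rough illustration of how the components listed above fit together (vector store, embedding model, optional reranker, LLM prompt), here is a minimal sketch in Python. It is not Skald's implementation. Everything in it is a hypothetical stand-in so that it runs with the standard library alone: `embed` is a bag-of-words counter rather than a Sentence Transformers model, `InMemoryVectorStore` replaces Postgres + pgvector, `rerank` uses plain term overlap rather than a cross-encoder, and the final prompt would be passed to whichever local LLM is configured.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # ASSUMPTION: toy bag-of-words "embedding"; a real setup would call an
    # embedding model here and get back a dense vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    # ASSUMPTION: hypothetical stand-in for a vector database.
    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.rows.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Brute-force nearest-neighbour search by cosine similarity.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    # ASSUMPTION: toy reranker that counts shared terms; a real reranker
    # would score each (query, candidate) pair with a dedicated model.
    q_terms = set(embed(query))
    ranked = sorted(candidates, key=lambda c: len(q_terms & set(embed(c))), reverse=True)
    return ranked[:top_n]


def build_prompt(query: str, context: list[str]) -> str:
    # The retrieved chunks become the context handed to the LLM.
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


store = InMemoryVectorStore()
for doc in [
    "Postgres with pgvector stores the vectors.",
    "Sentence Transformers produces the embeddings.",
    "The reranker reorders retrieved chunks.",
    "Bananas are yellow.",
]:
    store.add(doc)

query = "Which database stores the vectors?"
candidates = store.search(query, k=3)          # retrieve
context = rerank(query, candidates, top_n=2)   # optional rerank step
prompt = build_prompt(query, context)          # would be sent to the LLM
print(prompt)
```

Retrieval deliberately over-fetches (`k=3`) and the reranker trims to `top_n=2`, which mirrors the usual reason for adding a reranker: a cheap first pass for recall, then a more careful second pass for precision.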