Hasty Briefs (beta)

Show HN: Docustore – Vectorized Technical Documentations

12 days ago
  • #RAG
  • #automation
  • #documentation
  • docustore is an automated pipeline for scraping and processing software documentation into downloadable Knowledge Packs.
  • Knowledge Packs include cleaned, chunked, and embedded documentation for offline RAG applications.
  • Features automated content scraping, intelligent text chunking, vector embeddings, and persistent vector storage.
  • Pipeline runs in four stages (scrape, cache, process, package) and uses a modular, testable design.
  • Setup requires Git, Python 3.9+, and uv for dependency management.
  • Configuration lives in config.toml, which defines the documentation targets and embedding models.
  • Commands include scrape, process, and package, plus run to execute the pipeline.
  • Output is a distributable .tar.gz archive containing a ChromaDB vector store.
  • Example usage involves querying the ChromaDB collection for relevant documentation chunks.
  • Project is licensed under the MIT License.
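The config.toml bullet mentions targets and embedding models but not the file's schema, so the sketch below invents one: the `[embedding]` table, the `[[targets]]` array, and the `name`/`url`/`model` keys are hypothetical placeholders, not docustore's actual format. It only shows how a Python 3.9+ tool can load and sanity-check such a file, using the standard-library `tomllib` on 3.11+ and falling back to the third-party `tomli` package on older interpreters.

```python
# Sketch: loading a docustore-style config.toml.
# ASSUMPTION: the schema below ([embedding].model, [[targets]] with name/url)
# is hypothetical and only illustrates "targets and embedding models".
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.9/3.10: pip install tomli
    import tomli as tomllib

EXAMPLE_CONFIG = """
[embedding]
model = "sentence-transformers/all-MiniLM-L6-v2"

[[targets]]
name = "requests"
url = "https://requests.readthedocs.io/en/latest/"

[[targets]]
name = "flask"
url = "https://flask.palletsprojects.com/"
"""


def load_config(text: str) -> dict:
    """Parse TOML text and check that the (assumed) required keys exist."""
    config = tomllib.loads(text)
    if "model" not in config.get("embedding", {}):
        raise ValueError("config needs an [embedding] table with a 'model' key")
    if not config.get("targets"):
        raise ValueError("config needs at least one [[targets]] entry")
    return config


if __name__ == "__main__":
    cfg = load_config(EXAMPLE_CONFIG)
    print("embedding model:", cfg["embedding"]["model"])
    for target in cfg["targets"]:
        print(f"target {target['name']}: {target['url']}")
```

Validating the required keys at load time means a typo in the config fails immediately, before any scraping starts.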
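The summary lists scrape, process, package, and run commands and calls the design modular and testable. One common way to get that property is to make each stage a plain function and have the CLI dispatch to it, with run chaining the stages in order. The sketch below assumes that structure using `argparse` subcommands; the stage bodies are stubs that only record their names, `docustore-sketch` is a made-up program name rather than the project's real entry point, and the cache stage is left out because the summary lists no command for it.

```python
import argparse
import sys
from typing import Callable, Dict, List


# Hypothetical stage stubs: a real pipeline would scrape pages, clean and
# chunk them, embed them, and write the archive. Here each one only records
# that it ran, which is enough to exercise the dispatch logic.
def scrape(log: List[str]) -> None:
    log.append("scrape")


def process(log: List[str]) -> None:
    log.append("process")


def package(log: List[str]) -> None:
    log.append("package")


# Dict insertion order (guaranteed since Python 3.7) doubles as the run order.
STAGES: Dict[str, Callable[[List[str]], None]] = {
    "scrape": scrape,
    "process": process,
    "package": package,
}


def main(argv: List[str]) -> List[str]:
    """Parse a subcommand, run the matching stage(s), and return the run log."""
    parser = argparse.ArgumentParser(prog="docustore-sketch")
    commands = parser.add_subparsers(dest="command", required=True)
    for name in [*STAGES, "run"]:
        commands.add_parser(name)
    args = parser.parse_args(argv)

    log: List[str] = []
    selected = STAGES.values() if args.command == "run" else [STAGES[args.command]]
    for stage in selected:
        stage(log)
    return log


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

Because each stage is an ordinary function, a unit test can call `process` directly without going through the CLI at all.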
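The summary names a cache stage but does not say how it works. A minimal sketch, assuming a simple on-disk cache keyed by a SHA-256 hash of each page URL: a page is fetched only on a cache miss, so re-running the pipeline does not re-download pages that are already stored. `cache_path`, `fetch_cached`, and the injected `fetch` callable are hypothetical names; passing the fetcher in keeps the example runnable without network access.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable, filesystem-safe file name via its SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def fetch_cached(url: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return the page for `url`, calling `fetch` only when it is not cached."""
    path = cache_path(cache_dir, url)
    if path.exists():
        return path.read_text(encoding="utf-8")
    content = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> str:
        # Stand-in for an HTTP request; records each call.
        calls.append(url)
        return f"<html><body>docs for {url}</body></html>"

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        fetch_cached("https://example.com/docs", cache, fake_fetch)
        fetch_cached("https://example.com/docs", cache, fake_fetch)
    print("network calls:", len(calls))  # second call is served from disk
```

Separating the cache from scraping lets the process stage be re-run (for example, with a different embedding model) without hitting the documentation sites again.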
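The summary credits docustore with "intelligent" text chunking without describing the algorithm. For orientation only, the sketch below shows the simplest baseline that smarter chunkers build on: a fixed-size character window with overlap, so text near a chunk boundary appears in both neighbouring chunks. `chunk_text` and its default sizes are illustrative assumptions, not docustore's method or settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split `text` into windows of up to `chunk_size` characters.

    Consecutive windows share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
    # -> ['abcd', 'defg', 'ghij']
```

Production chunkers usually count tokens rather than characters and prefer to cut at headings, paragraphs, or sentences; whatever strategy is used, chunks should stay within the embedding model's input limit, since longer inputs are typically truncated.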
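The output is described as a distributable .tar.gz archive containing a ChromaDB vector store. Python's standard `tarfile` module can produce such an archive from a directory. In the sketch below, `package_store` is a hypothetical helper, and the `chroma_store` directory and the placeholder file inside it stand in for a real persisted ChromaDB store.

```python
import tarfile
import tempfile
from pathlib import Path


def package_store(store_dir: Path, archive_path: Path) -> Path:
    """Write `store_dir` (recursively) into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the store's own directory name,
        # so the archive does not leak the absolute build path.
        tar.add(store_dir, arcname=store_dir.name)
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "chroma_store"
        store.mkdir()
        # Placeholder file; a real store would be written by ChromaDB itself.
        (store / "placeholder.bin").write_bytes(b"\x00" * 16)
        archive = package_store(store, Path(tmp) / "knowledge-pack.tar.gz")
        with tarfile.open(archive, "r:gz") as tar:
            print(sorted(tar.getnames()))
```

A consumer can then unpack the archive with `tar -xzf knowledge-pack.tar.gz` and point ChromaDB at the extracted directory.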
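The example usage queries the ChromaDB collection for relevant documentation chunks. Conceptually, such a query embeds the question and returns the stored chunks whose vectors are closest to it. The brute-force cosine-similarity search below illustrates that idea with hand-made three-dimensional vectors; it is a stand-in, not ChromaDB's implementation, which uses an approximate nearest-neighbour (HNSW) index and a configurable distance metric. `top_k` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank (chunk_text, vector) pairs by similarity to `query`; keep the best k."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors: real embeddings typically have hundreds of dimensions.
    store = [
        ("Install with pip install requests.", [0.9, 0.1, 0.0]),
        ("Sessions persist cookies across requests.", [0.1, 0.9, 0.2]),
        ("Installing from source requires git.", [0.8, 0.0, 0.3]),
    ]
    query_vector = [1.0, 0.0, 0.1]  # pretend embedding of "how do I install?"
    for text, score in top_k(query_vector, store):
        print(f"{score:.3f}  {text}")
```

Against a real Knowledge Pack, the equivalent lookup would go through ChromaDB's client, roughly `chromadb.PersistentClient(path=...).get_collection(name=...).query(query_texts=[...], n_results=3)`. The path and collection name depend on the pack, and query texts must be embedded with the same model used to build it, so pass a matching `embedding_function` to `get_collection` if that model differs from ChromaDB's default.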