Show HN: Docustore – Vectorized Technical Documentations
12 days ago
- #RAG
- #automation
- #documentation
- docustore is an automated pipeline for scraping and processing software documentation into downloadable Knowledge Packs.
- Knowledge Packs include cleaned, chunked, and embedded documentation for offline RAG applications.
- Features automated content scraping, intelligent text chunking, vector embeddings, and persistent vector storage.
- Pipeline stages are scrape, cache, process, and package, each built as a modular, testable unit.
- Setup requires Git, Python 3.9+, and uv for dependency management.
- Configuration is done via config.toml, including targets and embedding models.
- Commands include scrape, process, package, and run for pipeline execution.
- Output is a distributable .tar.gz archive containing a ChromaDB vector store.
- Example usage involves querying the ChromaDB collection for relevant documentation chunks.
- The project is licensed under the MIT License.
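
To make the "text chunking" step in the feature list more concrete, here is a minimal sketch of one common approach: fixed-size word windows with overlap. It is not docustore's actual chunker. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows (hypothetical helper, not docustore's API)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary appears in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated storage.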
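
The last steps in the summary, unpacking the `.tar.gz` archive and querying the ChromaDB collection, might look roughly like the sketch below. The helper names `unpack_knowledge_pack` and `query_pack`, the archive path, the store directory, and the collection name are all hypothetical; the real layout of a Knowledge Pack is not described here. The sketch also assumes the third-party `chromadb` package is installed and that the collection's default embedding function matches the model used to build the pack.

```python
import tarfile
from pathlib import Path


def unpack_knowledge_pack(archive, dest):
    """Extract a .tar.gz Knowledge Pack into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)  # only unpack archives from a source you trust
    return dest


def query_pack(store_dir, collection_name, question, n_results=3):
    """Return the text of the chunks most similar to the question."""
    import chromadb  # third-party; imported lazily so unpacking works without it

    client = chromadb.PersistentClient(path=str(store_dir))
    collection = client.get_collection(collection_name)
    # Assumption: the default embedding function is compatible with the pack.
    # If it is not, embed the question yourself and pass query_embeddings instead.
    result = collection.query(query_texts=[question], n_results=n_results)
    return result["documents"][0]  # documents for the first (only) query


if __name__ == "__main__":
    # Hypothetical file, directory, and collection names.
    store = unpack_knowledge_pack("example-pack.tar.gz", "knowledge_pack")
    for chunk in query_pack(store, "docs", "How do I configure logging?"):
        print(chunk[:200])
```

Because the store is persisted on disk, the query runs locally with no network call to a vector database, which is what makes the pack usable for offline RAG.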