Show HN: RAG-chunk – A CLI to test RAG chunking strategies
7 days ago
- #RAG
- #Markdown
- #CLI
- CLI tool for parsing, chunking, and evaluating Markdown documents for RAG preparation.
- Available on PyPI: https://pypi.org/project/rag-chunk/.
- Features include parsing and cleaning Markdown files, three chunking strategies (fixed-size, sliding-window, paragraph), recall-based evaluation, and multiple output formats (table/JSON/CSV).
- Installation via pip: `pip install rag-chunk`, or `pip install -e .` from a local checkout for development mode.
- Analyzes Markdown files with a selectable chunking strategy and its parameters.
- Evaluation uses a test JSON file of questions and their relevant phrases to measure recall.
- Future plans include tiktoken support for precise token-based chunking, more strategies, and additional file types.
- Project structure includes src, tests, examples, and .chunks directories.
- MIT licensed.
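
To make the three strategies and the recall metric in the list above concrete, here is a minimal Python sketch. It is not rag-chunk's actual code or API: the function names are hypothetical, chunk sizes are counted in whitespace-separated words (an assumption, since the summary does not say which unit the tool uses), and recall is simplified to the fraction of relevant phrases that survive intact inside at least one chunk.

```python
import re


def fixed_size_chunks(text, size):
    """Split text into consecutive, non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sliding_window_chunks(text, size, overlap):
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


def paragraph_chunks(text):
    """Split text on blank lines, one chunk per non-empty paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def phrase_recall(chunks, relevant_phrases):
    """Fraction of relevant phrases found whole in at least one chunk
    (case-insensitive substring match; a simplification for illustration)."""
    if not relevant_phrases:
        return 0.0
    lowered = [c.lower() for c in chunks]
    found = sum(any(p.lower() in c for c in lowered) for p in relevant_phrases)
    return found / len(relevant_phrases)


if __name__ == "__main__":
    text = "one two three four five six seven"
    phrases = ["two three", "three four"]
    for name, chunks in [
        ("fixed-size", fixed_size_chunks(text, 3)),
        ("sliding-window", sliding_window_chunks(text, 4, 2)),
    ]:
        print(name, chunks, phrase_recall(chunks, phrases))
```

In this toy input the fixed-size split cuts the phrase "three four" across a chunk boundary, while the overlapping windows keep it whole, which is the kind of difference a recall-based comparison between strategies is meant to surface.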