Show HN: Built a tool to solve the nightmare of chunking tables in PDF vs. Markdown
a day ago
- #AI
- #DataPrivacy
- #RAG
- Stop using static chunk sizes for RAG pipelines.
- Introduces a lightweight, production-ready RAG ingestion toolkit that uses smart heuristics to choose chunk boundaries instead of a fixed size.
- Part of a larger, private-by-design AI platform focused on data privacy and running on your own hardware.
- Addresses the limitations of static chunking on complex documents such as PDFs, source code, and structured Markdown.
- Features layout-aware parsing using Docling to understand document structure.
- Implements smart chunking heuristics tailored to different file types.
- Production-ready and lightweight, with no complex dependencies.
- Preserves table structure by converting PDF tables to Markdown before chunking.
- Future plans include making the toolkit pip-installable.
- Open-source project welcoming ideas and contributions.
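The post doesn't show how the per-file-type heuristics are picked. Here is a minimal sketch of one plausible approach, assuming dispatch on file extension. Python files are split at top-level `def`/`class` boundaries using the standard `ast` module. Markdown is split at headings, ignoring `#` lines inside code fences. Everything else falls back to fixed-size windows with overlap, which is the static baseline the post argues against. All function names and the 800/100 character defaults are hypothetical, not the toolkit's API.

```python
import ast
import re
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical strategy functions; names and defaults are illustrative, not the toolkit's API.
FENCE = "`" * 3  # Markdown code-fence marker
HEADING = re.compile(r"^#{1,6}\s")  # ATX heading such as "## Setup"


def chunk_fixed(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Static baseline: fixed-size character windows with overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


def chunk_python(source: str) -> List[str]:
    """Split Python source at top-level def/class boundaries, keeping decorators attached."""
    tree = ast.parse(source)
    starts = [
        min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    lines = source.splitlines()
    bounds = sorted(set([0] + starts + [len(lines)]))
    pieces = ("\n".join(lines[a:b]).strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def chunk_markdown_sections(text: str) -> List[str]:
    """Split Markdown at headings, ignoring '#' lines inside fenced code blocks."""
    sections: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


# Extension -> chunking strategy; unknown types fall back to the fixed-size baseline.
STRATEGIES: Dict[str, Callable[[str], List[str]]] = {
    ".py": chunk_python,
    ".md": chunk_markdown_sections,
}


def chunk_file(path: str, text: str) -> List[str]:
    strategy = STRATEGIES.get(Path(path).suffix.lower(), chunk_fixed)
    return strategy(text)
```

Dispatching on the extension keeps each strategy small and makes the fixed-size fallback explicit. Adding a new file type means registering one more entry in `STRATEGIES`.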
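For the table-preservation step, here is a minimal sketch. It assumes the PDF has already been converted to Markdown, for example with Docling's `export_to_markdown()`. Each contiguous pipe table is treated as an atomic block, and blocks are packed greedily into chunks up to a character budget, so a data row is never separated from its header. The names `split_blocks` and `chunk_markdown` and the 1000-character default are hypothetical illustrations, not the project's actual code.

```python
import re
from typing import List

# Hypothetical helpers, not the project's actual code.
TABLE_ROW = re.compile(r"^\s*\|")  # a pipe-table row such as "| a | b |" or "|---|---|"


def split_blocks(markdown: str) -> List[str]:
    """Split Markdown into atomic blocks: a contiguous run of table rows stays
    together, and other text is split on blank lines."""
    blocks: List[str] = []
    current: List[str] = []
    in_table = False
    for line in markdown.splitlines():
        is_row = bool(TABLE_ROW.match(line))
        # Close the current block on a blank line or when crossing a table boundary.
        if current and (not line.strip() or is_row != in_table):
            blocks.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
            in_table = is_row
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack blocks into chunks of at most max_chars. A single block larger
    than the budget (e.g. a big table) is emitted whole instead of being cut mid-row."""
    chunks: List[str] = []
    current = ""
    for block in split_blocks(markdown):
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks
```

The key property is that rows are never split from their table. A retriever that returns a row without its header loses the meaning of the columns. A fuller version might also split oversized tables by rows while repeating the header in each piece. That refinement is omitted here.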