Hasty Briefs (beta)

Show HN: Built a tool to solve the nightmare of chunking tables in PDF vs. Markdown

a day ago
  • #AI
  • #DataPrivacy
  • #RAG
  • Stop using static chunk sizes for RAG pipelines.
  • Introduces a lightweight, production-ready RAG ingestion toolkit that uses heuristics, rather than a fixed size, to choose chunk boundaries.
  • Part of a larger AI platform that is private by design and runs on your own hardware.
  • Addresses the limits of static chunking on complex documents such as PDFs, source code, and structured Markdown.
  • Features layout-aware parsing using Docling to understand document structure.
  • Implements smart chunking heuristics tailored to different file types.
  • Production-ready and lightweight with no complex dependencies.
  • Preserves table structure by converting PDF tables to Markdown before chunking.
  • Future plans include making the toolkit pip-installable.
  • Open-source project welcoming ideas and contributions.
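The brief does not include the toolkit's code, so the following is only a minimal, dependency-free sketch of the ideas in the list above: pick a chunking heuristic by file type, keep Markdown tables whole, and split Python source at top-level definitions. Every name here (`chunk_file`, `chunk_markdown`, `chunk_python`, `split_table`, `pack`, `MAX_CHARS`) is hypothetical and is not the project's API. The character budget is an assumed value. For PDFs, the post describes converting tables to Markdown before chunking. A layout-aware converter such as Docling (for example `DocumentConverter().convert(path).document.export_to_markdown()`) would produce that Markdown. This step is left out so the sketch runs with the standard library only.

```python
import re
from pathlib import Path

# Hypothetical default budget; the post does not state the toolkit's real limits.
MAX_CHARS = 1200

_TABLE_ROW = re.compile(r"^\s*\|")                           # Markdown table rows start with "|"
_HEADING = re.compile(r"^#{1,6}\s")                          # ATX headings: "# ", "## ", ...
_TOP_LEVEL = re.compile(r"(@|def\s|class\s|async\s+def\s)")  # column-0 Python definitions


def pack(blocks: list[str], max_chars: int, sep: str = "\n\n") -> list[str]:
    """Greedily merge consecutive blocks; an oversized block becomes its own chunk."""
    chunks, current = [], ""
    for block in blocks:
        candidate = f"{current}{sep}{block}" if current else block
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks


def markdown_blocks(text: str) -> list[str]:
    """Split Markdown into atomic blocks: heading/paragraph runs and whole tables."""
    blocks, current, in_table = [], [], False
    for line in text.splitlines():
        is_row = bool(_TABLE_ROW.match(line))
        # A block ends at a blank line, before a heading, or where a table starts/stops.
        if current and (not line.strip() or _HEADING.match(line) or is_row != in_table):
            blocks.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
        in_table = is_row
    if current:
        blocks.append("\n".join(current))
    return blocks


def split_table(table: str, max_chars: int) -> list[str]:
    """Split an oversized table by rows, repeating header + separator on every piece."""
    lines = table.splitlines()
    header, rows = lines[:2], lines[2:]
    pieces, current = [], list(header)
    for row in rows:
        # Start a new piece once this row would exceed the budget (each piece keeps >= 1 row).
        if len(current) > len(header) and len("\n".join(current + [row])) > max_chars:
            pieces.append("\n".join(current))
            current = list(header)
        current.append(row)
    pieces.append("\n".join(current))
    return pieces


def chunk_markdown(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Chunk Markdown so tables stay whole, or are split only by rows with the header repeated."""
    blocks = []
    for block in markdown_blocks(text):
        if _TABLE_ROW.match(block) and len(block) > max_chars:
            blocks.extend(split_table(block, max_chars))
        else:
            blocks.append(block)
    return pack(blocks, max_chars)


def chunk_python(source: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split Python source at top-level def/class lines so bodies stay intact."""
    blocks, current = [], []
    for line in source.splitlines():
        # Keep a decorator attached to the def/class that follows it.
        if _TOP_LEVEL.match(line) and current and not current[-1].startswith("@"):
            blocks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return pack(blocks, max_chars, sep="\n")


def chunk_file(path: str, text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pick a chunking heuristic from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".md", ".markdown"}:
        return chunk_markdown(text, max_chars)
    if suffix == ".py":
        return chunk_python(text, max_chars)
    # Fallback for other text: pack blank-line-separated paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return pack(paragraphs, max_chars)
```

If a table exceeds the budget, the sketch splits it by rows and repeats the header and separator rows on every piece. As a result, each chunk is still a valid Markdown table that carries its column names. Any other oversized block is kept whole rather than cut mid-sentence. A production pipeline would probably add a hard cap for that case.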