GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.
4 hours ago
- #llm-integration
- #markdown-conversion
- #python-tool
- MarkItDown is a Python utility for converting various file formats to Markdown, optimized for LLM use and text analysis pipelines.
- Version 0.1.0 introduces breaking changes, including optional feature-group dependencies and updates to the convert_stream() method.
- It can be installed via pip, with optional dependencies for specific formats, or from source; Python 3.10 or later is required.
- It supports conversion from formats such as PowerPoint, Word, Excel, images, audio, HTML, ZIP archives, YouTube URLs, and EPUB files.
- Features include optional plugins for extended functionality, such as OCR support via the markitdown-ocr plugin.
- Integration options include Microsoft Document Intelligence for PDF conversion and LLM-based image descriptions.
- It can be used via the CLI, the Python API, or Docker, and the repository includes instructions for contributing, testing, and community involvement.
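
The installation and Python API points above can be sketched roughly as follows. This is a minimal illustration, not the project's own example. The helper names `pip_requirement`, `convert_path`, and `convert_bytes` are hypothetical, and the extras `pdf`, `docx`, and `pptx` are assumed to be among the optional feature groups. The `markitdown` import is deferred so the module loads even when the package is not installed.

```python
import io


def pip_requirement(extras=()):
    """Build a pip requirement string such as 'markitdown[pdf,docx]'.

    Hypothetical helper: it only formats the string, it does not check
    that the named feature groups actually exist.
    """
    cleaned = [e.strip() for e in extras if e.strip()]
    if not cleaned:
        return "markitdown"
    return "markitdown[" + ",".join(cleaned) + "]"


def convert_path(path):
    """Convert a local file to Markdown text via the Python API."""
    # Imported lazily so this module can be loaded without markitdown.
    from markitdown import MarkItDown

    # Plugins (for example an OCR plugin) are assumed to be opt-in.
    converter = MarkItDown(enable_plugins=False)
    return converter.convert(path).text_content


def convert_bytes(data):
    """Convert in-memory bytes to Markdown text.

    Assumes convert_stream() expects a binary file-like object,
    in line with the 0.1.0 change mentioned above.
    """
    from markitdown import MarkItDown

    return MarkItDown().convert_stream(io.BytesIO(data)).text_content
```

For instance, `pip_requirement(["pdf", "docx", "pptx"])` yields a string that can be passed to `pip install`, and `convert_path("report.docx")` (a placeholder file name) would return the Markdown text once the matching optional dependencies are installed.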