GitHub - datalab-to/chandra: OCR model that handles complex tables, forms, handwriting with full layout.
3 hours ago
- #Document Intelligence
- #OCR
- #Multilingual
- Chandra OCR 2 is a state-of-the-art OCR model that converts images and PDFs into structured HTML, Markdown, or JSON while preserving layout information.
- It supports over 90 languages, excels in handling handwriting, tables, math, forms, and complex layouts, and can extract images with captions.
- The model offers two inference modes: local (HuggingFace) and remote (vLLM server), with a hosted API and free playground available for testing.
- Benchmark results show Chandra 2 performs well in multilingual and general OCR tasks, topping the olmocr benchmark and improving on internal metrics.
- Installation is via pip with options for different backends, and the CLI allows processing single files or directories with various configurable options.
- Output includes Markdown, HTML, and JSON files, along with extracted images and metadata, organized in a structured directory.
- Licensing is Apache 2.0 for code, with model weights under a modified OpenRAIL-M license, free for research, personal use, and startups under certain conditions.
- Performance benchmarks compare Chandra 2 with other models like Datalab API, dots.ocr, and GPT-4o, showing competitive scores in accuracy and speed.