GitHub - datalab-to/chandra: OCR model that handles complex tables, forms, handwriting with full layout.

5 hours ago

Chandra OCR 2 is a state-of-the-art OCR model that converts images and PDFs into structured HTML, Markdown, or JSON while preserving layout information.
It supports over 90 languages, excels in handling handwriting, tables, math, forms, and complex layouts, and can extract images with captions.
The model offers two inference modes: local (Hugging Face) and remote (a vLLM server). A hosted API and a free playground are also available for testing.
According to the reported benchmark results, Chandra OCR 2 performs well on multilingual and general OCR tasks: it tops the olmOCR benchmark and improves on the project's internal metrics.
The package installs via pip, with options for the different backends, and the CLI can process either a single file or a whole directory, with various configurable options.
Output includes Markdown, HTML, and JSON files, along with extracted images and metadata, organized in a structured directory.
Licensing is Apache 2.0 for code, with model weights under a modified OpenRAIL-M license, free for research, personal use, and startups under certain conditions.
The benchmark tables compare Chandra OCR 2 against other systems, including the Datalab API, dots.ocr, and GPT-4o, and report competitive accuracy and speed.
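
The summary says the CLI writes Markdown, HTML, and JSON files plus extracted images into a structured directory, but it does not spell out the layout. As a rough sketch of how such output could be inspected after a run, the Python below walks a directory and groups files by kind. The function name index_outputs, the extension-to-kind mapping, and the one-subdirectory-per-document layout are all assumptions made for illustration; they are not taken from the Chandra repository.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to output kind.
# The real tool may use different extensions or file names.
KINDS = {
    ".md": "markdown",
    ".html": "html",
    ".json": "json",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".webp": "image",
}


def index_outputs(output_dir):
    """Group recognised files under output_dir by document and kind.

    Assumption: each processed document gets its own subdirectory.
    Files sitting directly in output_dir are grouped under ".".
    """
    root = Path(output_dir)
    index = defaultdict(lambda: defaultdict(list))
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        kind = KINDS.get(path.suffix.lower())
        if kind is None:
            continue  # skip files this sketch does not recognise
        rel = path.relative_to(root)
        doc = rel.parts[0] if len(rel.parts) > 1 else "."
        index[doc][kind].append(rel.as_posix())
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {doc: dict(kinds) for doc, kinds in index.items()}
```

Calling index_outputs("./output") on a finished run would return a nested dictionary such as {"report": {"markdown": [...], "json": [...], "image": [...]}}, which is a quick way to check that every input produced the expected files. If the actual layout differs, only the KINDS table and the rule for picking the document key need to change.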
