GitHub - datalab-to/chandra: OCR model that handles complex tables, forms, handwriting with full layout.

5 hours ago
  • #Document Intelligence
  • #OCR
  • #Multilingual
  • Chandra OCR 2 is a state-of-the-art OCR model that converts images and PDFs into structured HTML, Markdown, or JSON while preserving layout information.
  • It supports over 90 languages, excels in handling handwriting, tables, math, forms, and complex layouts, and can extract images with captions.
  • The model offers two inference modes: local (HuggingFace) and remote (vLLM server), with a hosted API and free playground available for testing.
  • Benchmark results show Chandra 2 performing well on multilingual and general OCR tasks: it tops the olmocr benchmark and improves on the project's internal metrics.
  • Installation is via pip with options for different backends, and the CLI allows processing single files or directories with various configurable options.
  • Output includes Markdown, HTML, and JSON files, along with extracted images and metadata, organized in a structured directory.
  • The code is licensed under Apache 2.0; the model weights use a modified OpenRAIL-M license that is free for research, personal use, and startups under certain conditions.
  • The benchmarks compare Chandra 2 against other systems, including the Datalab API, dots.ocr, and GPT-4o, and report competitive accuracy and speed.
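
The summary says each run writes Markdown, HTML, and JSON files plus extracted images into a structured directory. As a rough illustration of consuming such output, the sketch below indexes an output folder by document. It does not call Chandra itself, and the layout it assumes (one subdirectory per input document, with files told apart by extension) is a guess for illustration rather than the documented format. The name `index_outputs` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: output kinds can be distinguished by file extension.
OUTPUT_KINDS = {".md": "markdown", ".html": "html", ".json": "json"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def index_outputs(root):
    """Group files under `root` by document directory and output kind.

    Assumes (hypothetically) one subdirectory per processed document.
    Files sitting directly in `root` are ignored.
    """
    index = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # not inside a per-document directory
        doc = rel.parts[0]
        suffix = path.suffix.lower()
        if suffix in OUTPUT_KINDS:
            kind = OUTPUT_KINDS[suffix]
        elif suffix in IMAGE_SUFFIXES:
            kind = "image"
        else:
            kind = "other"
        index[doc][kind].append(rel.as_posix())
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {doc: dict(kinds) for doc, kinds in index.items()}
```

Calling `index_outputs("./output")` returns a dictionary keyed by document directory name, which makes it easy to check that every document produced the formats you expect before passing them downstream.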