Hasty Briefs (beta)


GitHub - zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive

5 hours ago
  • #Multimodal AI
  • #Document Understanding
  • #OCR
  • GLM-OCR is a multimodal OCR model for complex document understanding built on the GLM-V encoder–decoder architecture.
  • It achieves state-of-the-art performance, scoring 94.62 on OmniDocBench V1.5, and excels at formula and table recognition benchmarks.
  • The model is optimized for real-world scenarios, reliably handling complex tables, code-heavy documents, and official seals.
  • With only 0.9B parameters, it supports efficient inference via vLLM, SGLang, and Ollama, reducing latency and cost.
  • GLM-OCR is fully open-sourced with an easy-to-use SDK, offering one-line invocation and smooth integration into production pipelines.
  • Users can deploy it via a cloud API without a GPU or self-host locally for full control using tools like vLLM or SGLang.
  • The SDK includes a Skill mode for agent-friendly usage and provides comprehensive configuration options via CLI, Python API, or YAML files.
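To make the configuration claim concrete, a YAML file driving the SDK might look roughly like the sketch below. All field names and values here are illustrative assumptions, not the project's actual schema; consult the GLM-OCR repository for the real options.

```yaml
# Hypothetical GLM-OCR SDK configuration.
# Every key below is an assumption for illustration, not taken from the
# project's documentation.
model: zai-org/GLM-OCR      # model identifier (assumed)
backend: vllm               # inference backend, e.g. vllm | sglang | ollama
output_format: markdown     # desired output format (assumed option)
batch_size: 8               # pages processed per inference batch (assumed)
```

Such a file would presumably be passed to the CLI or Python API at invocation time (the exact command and flag names are assumptions, e.g. something like `glm-ocr --config config.yaml input.pdf`), mirroring the one-line usage the summary describes.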