GitHub - zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive
- #Multimodal AI
- #Document Understanding
- #OCR
- GLM-OCR is a multimodal OCR model for complex document understanding built on the GLM-V encoder–decoder architecture.
- It achieves a state-of-the-art score of 94.62 on OmniDocBench V1.5, with particularly strong results on sub-tasks such as formula and table recognition.
- The model is optimized for real-world scenarios, efficiently handling complex tables, code-heavy documents, and official seals.
- With only 0.9B parameters, it supports efficient inference via vLLM, SGLang, and Ollama, reducing latency and cost.
- GLM-OCR is fully open-sourced with an easy-to-use SDK, offering one-line invocation and smooth integration into production pipelines.
- Users can call it through a cloud API without needing a GPU, or self-host with tools like vLLM or SGLang for full control.
- The SDK includes a Skill mode for agent-friendly usage and provides comprehensive configuration options via CLI, Python API, or YAML files.
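A YAML configuration of the kind the last bullet describes might look like the fragment below; every key name here is a hypothetical illustration of such a file, not the SDK's documented schema.

```yaml
# Hypothetical GLM-OCR SDK config -- key names are illustrative assumptions.
model: zai-org/GLM-OCR
backend: vllm            # or: sglang, ollama, cloud-api
endpoint: http://localhost:8000/v1
output:
  format: markdown       # e.g. plain text, markdown, or structured JSON
  include_tables: true
```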
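Since self-hosted vLLM and SGLang servers typically expose an OpenAI-compatible endpoint, a request to a local GLM-OCR deployment could be sketched as below. The endpoint URL, model id, and prompt are assumptions for illustration, not confirmed details from the repo.

```python
import base64
import json

# Hypothetical local endpoint, e.g. from `vllm serve zai-org/GLM-OCR`
# (the port and model id are assumptions, not taken from the README).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_ocr_request(image_bytes: bytes, model: str = "zai-org/GLM-OCR") -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                    {"type": "text", "text": "Extract all text from this document."},
                ],
            }
        ],
    }

# Placeholder bytes stand in for a real scanned page; in practice this payload
# would be POSTed to ENDPOINT with any HTTP client.
payload = build_ocr_request(b"\x89PNG placeholder")
print(json.dumps(payload)[:80])
```

The same payload shape works against a cloud API by swapping the endpoint URL and adding an authorization header.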