GitHub - zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive
- #Multimodal AI
- #Document Understanding
- #OCR
- GLM-OCR is a multimodal OCR model for complex document understanding built on the GLM-V encoder–decoder architecture.
- It achieves a state-of-the-art score of 94.62 on OmniDocBench V1.5, with particularly strong results on sub-tasks such as formula and table recognition.
- The model is optimized for real-world scenarios, efficiently handling complex tables, code-heavy documents, and official seals.
- With only 0.9B parameters, it supports efficient inference via vLLM, SGLang, and Ollama, reducing latency and cost.
- GLM-OCR is fully open-sourced with an easy-to-use SDK, offering one-line invocation and smooth integration into production pipelines.
- Users can call it through a cloud API without needing a GPU, or self-host with tools like vLLM or SGLang for full control.
- The SDK includes a Skill mode for agent-friendly usage and provides comprehensive configuration options via CLI, Python API, or YAML files.
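A YAML configuration of the kind the last bullet describes might look like the fragment below; every key name here is a hypothetical illustration of such a file, not the SDK's documented schema.

```yaml
# Hypothetical GLM-OCR SDK config -- key names are illustrative assumptions.
model: zai-org/GLM-OCR
backend: vllm            # or: sglang, ollama, cloud-api
endpoint: http://localhost:8000/v1
output:
  format: markdown       # e.g. plain text, markdown, or structured JSON
  include_tables: true
```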
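Since self-hosted vLLM and SGLang servers typically expose an OpenAI-compatible endpoint, a request to a local GLM-OCR deployment could be sketched as below. The endpoint URL, model id, and prompt are assumptions for illustration, not confirmed details from the repo.

```python
import base64
import json

# Hypothetical local endpoint, e.g. from `vllm serve zai-org/GLM-OCR`
# (the port and model id are assumptions, not taken from the README).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_ocr_request(image_bytes: bytes, model: str = "zai-org/GLM-OCR") -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                    {"type": "text", "text": "Extract all text from this document."},
                ],
            }
        ],
    }

# Placeholder bytes stand in for a real scanned page; in practice this payload
# would be POSTed to ENDPOINT with any HTTP client.
payload = build_ocr_request(b"\x89PNG placeholder")
print(json.dumps(payload)[:80])
```

The same payload shape works against a cloud API by swapping the endpoint URL and adding an authorization header.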