DeepSeek OCR

16 hours ago

DeepSeek-OCR is released to explore visual-text compression from an LLM-centric viewpoint.
Setup requires CUDA 11.8 and PyTorch 2.6.0; installation steps cover creating a Conda environment and installing the required packages.
The model can be run through either vLLM or Transformers, with script paths and settings detailed for each.
The model supports various resolution modes: Tiny (512×512), Small (640×640), Base (1024×1024), Large (1280×1280), and Dynamic (Gundam mode).
Different prompt templates are provided for tasks like document conversion, OCR, image description, and figure parsing.
The project acknowledges the models and benchmarks that contributed to it.
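
The resolution modes listed above can be kept in a small lookup table. The following is a minimal sketch in Python, not code from the DeepSeek-OCR repository: the names `RESOLUTION_MODES` and `resolve_mode` are hypothetical, and Dynamic (Gundam) mode is mapped to `None` on the assumption that it has no single fixed size, since the summary gives none.

```python
from typing import Dict, Optional, Tuple

# Mode names and pixel sizes come from the summary above.
# The table and helper are illustrative only, not the project's API.
RESOLUTION_MODES: Dict[str, Optional[Tuple[int, int]]] = {
    "tiny": (512, 512),
    "small": (640, 640),
    "base": (1024, 1024),
    "large": (1280, 1280),
    "dynamic": None,  # Gundam mode: assumed to have no single fixed size
}


def resolve_mode(name: str) -> Optional[Tuple[int, int]]:
    """Return the (width, height) for a mode name, or None for Dynamic."""
    key = name.strip().lower()
    if key == "gundam":  # treat the nickname as an alias for Dynamic
        key = "dynamic"
    if key not in RESOLUTION_MODES:
        raise ValueError(f"unknown resolution mode: {name!r}")
    return RESOLUTION_MODES[key]


if __name__ == "__main__":
    for mode in ("Tiny", "Small", "Base", "Large", "Gundam"):
        print(mode, resolve_mode(mode))
```

A lookup like this keeps the mode names in one place, so a script can validate a user-supplied mode before loading the model.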
