DeepSeek OCR
16 hours ago
- #DeepSeek-OCR
- #LLM-centric
- #visual-text compression
- DeepSeek-OCR is released to explore visual-text compression from an LLM-centric viewpoint.
- Setup requires CUDA 11.8 and PyTorch 2.6.0; installation steps cover creating a Conda environment and installing the required packages.
- vLLM and Transformers configurations are detailed for running the model, including script paths and settings.
- The model supports various resolution modes: Tiny (512×512), Small (640×640), Base (1024×1024), Large (1280×1280), and Dynamic (Gundam mode).
- Different prompt templates are provided for tasks like document conversion, OCR, image description, and figure parsing.
- Acknowledgments are given to various models and benchmarks that contributed to the project.
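
The fixed resolution modes listed above can be written down as a small lookup table. The sketch below is illustrative only: the sizes come from the summary, but `FIXED_MODES` and `pick_mode` are hypothetical names, and the selection rule (smallest fixed mode that covers the image's longer side, otherwise the dynamic Gundam mode) is an assumption, not DeepSeek-OCR's own logic.

```python
# Square input sizes (pixels per side) for the fixed modes, as listed in the summary.
FIXED_MODES = {
    "tiny": 512,
    "small": 640,
    "base": 1024,
    "large": 1280,
}


def pick_mode(width: int, height: int) -> str:
    """Hypothetical helper: return the smallest fixed mode whose side covers
    the image's longer side, or "gundam" (the dynamic mode) if none does."""
    longest = max(width, height)
    # Walk the modes from smallest to largest side length.
    for name, side in sorted(FIXED_MODES.items(), key=lambda kv: kv[1]):
        if longest <= side:
            return name
    return "gundam"


if __name__ == "__main__":
    for size in [(500, 300), (640, 640), (1100, 900), (2000, 1000)]:
        print(size, pick_mode(*size))
```

A real pipeline would more likely choose a mode based on text density and token budget rather than raw pixel size; the helper only shows how the mode names map to resolutions.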