Hasty Briefs (beta)

DeepSeek OCR

16 hours ago
  • #DeepSeek-OCR
  • #LLM-centric
  • #visual-text compression
  • DeepSeek-OCR is released to explore visual-text compression from an LLM-centric viewpoint.
  • Setup requires CUDA 11.8 and PyTorch 2.6.0; installation steps cover creating a Conda environment and installing the necessary packages.
  • vLLM and Transformers configurations are detailed for running the model, including script paths and settings.
  • The model supports various resolution modes: Tiny (512×512), Small (640×640), Base (1024×1024), Large (1280×1280), and Dynamic (Gundam mode).
  • Different prompt templates are provided for tasks like document conversion, OCR, image description, and figure parsing.
  • Acknowledgments are given to various models and benchmarks that contributed to the project.
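
To make the resolution modes listed above concrete, here is a minimal Python sketch that records the four fixed sizes and chooses one for a given image. The helper name `pick_mode` and its selection rule (smallest fixed mode whose side covers the image's longer side, otherwise the dynamic mode) are illustrative assumptions and are not part of the DeepSeek-OCR code or its documented behavior.

```python
# Fixed-resolution modes as listed in the summary: name -> (width, height) in pixels.
# Entries are ordered from smallest to largest.
FIXED_MODES = {
    "tiny": (512, 512),
    "small": (640, 640),
    "base": (1024, 1024),
    "large": (1280, 1280),
}


def pick_mode(width: int, height: int) -> str:
    """Hypothetical helper: return the smallest fixed mode that covers the image.

    Falls back to "dynamic" (the Gundam mode in the summary) when the longer
    side of the image exceeds the largest fixed resolution.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    for name, (side, _) in FIXED_MODES.items():
        if longest <= side:
            return name
    return "dynamic"


if __name__ == "__main__":
    for size in [(500, 300), (800, 600), (2000, 1000)]:
        print(size, "->", pick_mode(*size))
```

Under this assumed rule, a small scanned receipt would map to a low-resolution mode, while a large multi-column page would be routed to the dynamic mode. The rule the project actually recommends may differ.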