Hasty Briefs


Rolling your own serverless OCR in 40 lines of code

3 months ago
  • #Serverless
  • #OCR
  • #Machine Learning
  • The author needed to make 'Bayesian Data Analysis' searchable for a statistics agent.
  • Existing OCR tools were either limited or expensive for processing thousands of pages.
  • DeepSeek's open OCR model was chosen for its good handling of mathematical notation.
  • Modal, a serverless compute platform, was used to run the OCR model on cloud GPUs.
  • The solution involved deploying a FastAPI server on Modal to process images into markdown text.
  • Batched inference was implemented to process multiple pages simultaneously, improving efficiency.
  • The OCR output was cleaned to remove the model's grounding (bounding-box) tags, keeping only the text content.
  • The entire process for a 600-page book took about 45 minutes on an A100 GPU, costing around $2.
  • The resulting searchable text allows for easy grepping, explanation by AI, and building search indexes.
  • The quality of OCR on mathematical content was noted to be surprisingly good.
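The Modal deployment described in the bullets might look roughly like the sketch below. This is an illustrative assumption, not the article's actual 40 lines: the app name, image contents, and endpoint path are made up, and the model-loading and inference code is elided.

```python
import modal

app = modal.App("deepseek-ocr")  # hypothetical app name

# Container image with the assumed dependencies for the OCR model.
image = modal.Image.debian_slim().pip_install(
    "torch", "transformers", "fastapi[standard]"
)

@app.function(gpu="A100", image=image)
@modal.asgi_app()
def web():
    from fastapi import FastAPI, UploadFile

    api = FastAPI()

    @api.post("/ocr")  # hypothetical endpoint
    async def ocr(file: UploadFile) -> dict:
        png_bytes = await file.read()
        # In a real deployment the model would be loaded once per
        # container and reused across requests; elided here.
        markdown = ...  # run the OCR model on png_bytes
        return {"markdown": markdown}

    return api
```

Running `modal deploy` on a file like this is what puts the FastAPI server on a cloud GPU; Modal spins containers up per request and down when idle, which is what keeps the cost low for a one-off job. This fragment is deployment configuration and is not executed here.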
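The batched inference mentioned above can start from something as simple as grouping page images into fixed-size batches before sending them to the GPU; the batch size of 32 and the helper name are illustrative, not from the article.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

# A 600-page book at batch size 32 yields 19 batches (18 full + 1 of 24).
pages = [f"page-{i:03}.png" for i in range(600)]
batches = list(batched(pages, 32))
```

Processing a batch per model call amortizes per-call overhead and keeps the GPU busy, which is where the efficiency gain comes from.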
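Cleaning the grounding tags can be a couple of regex substitutions. This sketch assumes tags of the form `<|ref|>text<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`, which is how DeepSeek's OCR output is commonly described; verify against the actual model output before relying on the patterns.

```python
import re

# Unwrap the recognized text and drop the bounding-box coordinates.
REF = re.compile(r"<\|ref\|>(.*?)<\|/ref\|>", re.DOTALL)
DET = re.compile(r"<\|det\|>\[\[.*?\]\]<\|/det\|>", re.DOTALL)

def clean(raw: str) -> str:
    text = REF.sub(r"\1", raw)  # keep the text inside the ref tags
    text = DET.sub("", text)    # remove coordinate blocks entirely
    return text

sample = "<|ref|>E[x] = \\mu<|/ref|><|det|>[[12, 34, 56, 78]]<|/det|>"
```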
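The throughput and cost figures in the summary are consistent with simple arithmetic; the A100 hourly rate below is an assumed ballpark, since serverless GPU pricing varies by provider.

```python
pages = 600
minutes = 45
a100_rate_per_hour = 2.50  # assumed USD/hour; actual Modal pricing may differ

pages_per_minute = pages / minutes        # ~13.3 pages per minute
cost = a100_rate_per_hour * minutes / 60  # ~$1.88, in line with "around $2"
```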