Why frontier LLMs can't read the hard documents without experts involved
8 hours ago
- #Agentic Workflows
- #Document AI
- #OCR Pricing
- In June 2026, the cost of machine document reading dropped dramatically, with Gemini Flash extraction at about $0.17 per 1,000 pages, making legacy solutions like AWS Textract and Google Document AI less competitive.
- Major cloud platforms are shifting to general models; Google Document AI now uses Gemini models, deprecating older processors, signaling the end of purpose-built document processing layers.
- Independent benchmarks show Gemini models leading in key information extraction and OCR on clean documents, but the same models struggle with handwriting, sparse tables, and chart analysis, where accuracy on hard documents tops out at 75.5% or below.
- Model labs like OpenAI and Anthropic are moving beyond extraction to selling agentic workflows for knowledge work, automating tasks like tax form processing and financial document handling and claiming high accuracy for them.
- Document format standardization accelerated, with DocLang becoming an open standard in June. Open-source models like Mistral OCR 4 and Baidu Unlimited-OCR are commoditizing the model layer, which shifts the focus to trust and verifiability.
- Specialist providers still outperform frontier models on hard tasks and cost-adjusted accuracy; Nanonets leads in some areas, showing that architecture and integration issues persist beyond model capabilities.
- The market is bifurcating: value moves up into agentic orchestration (e.g., Coupa's acquisitions) and down into raw capacity (e.g., Daida's scanning facility), while pure-play vendors like Hyperscience survive by focusing on complex use cases.
- Buyers should test on their worst documents, price the easy half honestly against direct model calls, consider who owns the agentic layer, and invest in capabilities that cover model weaknesses, such as validation and exception queues.
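
The last point, validation plus an exception queue around a cheap direct model call, can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the `Extraction` and `Router` classes, the 0.9 confidence threshold, and the required invoice fields are all hypothetical, and the only figure taken from the summary above is the roughly $0.17 per 1,000 pages for direct model extraction.

```python
from dataclasses import dataclass, field

# From the summary above: roughly $0.17 per 1,000 pages for direct model extraction.
DIRECT_MODEL_USD_PER_1K_PAGES = 0.17


@dataclass
class Extraction:
    """Hypothetical result of one model call on one document."""
    doc_id: str
    fields: dict
    confidence: float  # assumed to be a score in [0, 1]


@dataclass
class Router:
    """Accepts clean extractions and queues everything else for human review."""
    min_confidence: float = 0.9  # assumed threshold, tune on your own worst documents
    required_fields: tuple = ("invoice_number", "total")  # hypothetical schema
    accepted: list = field(default_factory=list)
    exception_queue: list = field(default_factory=list)

    def validate(self, ex: Extraction) -> list:
        # Collect every reason this extraction cannot be trusted as-is.
        problems = [
            f"missing {name}"
            for name in self.required_fields
            if not ex.fields.get(name)
        ]
        if ex.confidence < self.min_confidence:
            problems.append("low confidence")
        return problems

    def route(self, ex: Extraction) -> list:
        # Anything with a problem goes to the exception queue with its reasons.
        problems = self.validate(ex)
        if problems:
            self.exception_queue.append((ex.doc_id, problems))
        else:
            self.accepted.append(ex.doc_id)
        return problems


def direct_model_cost(pages: int, usd_per_1k: float = DIRECT_MODEL_USD_PER_1K_PAGES) -> float:
    """Estimated cost in USD of sending `pages` through a direct model call."""
    return pages / 1000 * usd_per_1k
```

Under these assumptions, `direct_model_cost(1_000_000)` comes to about 170 USD, which is the baseline to price the easy half against. The size and contents of `exception_queue` after a run on your hardest documents show how much expert review the remainder still needs.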