LLMs Corrupt Your Documents When You Delegate
- #Workflow Reliability
- #AI Delegation
- #Document Corruption
- A study introduces DELEGATE-52, a benchmark for evaluating AI systems in delegated workflows across 52 domains, and finds that current LLMs often corrupt documents.
- Even top models such as GPT 5.4 corrupt an average of 25% of document content over long workflows; the errors are sparse but severe, and they compound.
- Agentic tool use does not improve performance on DELEGATE-52, and degradation worsens with larger documents, longer interactions, or the presence of distractor files.
- The research suggests LLMs are unreliable delegates for knowledge work, underscoring the risk of trusting them with tasks such as vibe coding.
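The 25% figure combines two ideas: a share of document content lost, and errors that compound across a long workflow. The toy Python sketch below illustrates both under loudly stated assumptions. It uses a line-level diff (`difflib.SequenceMatcher`) as a stand-in for "document content preserved", which is not necessarily the study's metric. It also assumes each round of delegation independently loses the same fraction of content. The function names `preserved_fraction`, `compounded_retention` and `per_round_loss_for` are hypothetical.

```python
import difflib


def preserved_fraction(original: str, edited: str) -> float:
    """Hypothetical metric: share of original lines that survive unchanged in `edited`."""
    orig_lines = original.splitlines()
    if not orig_lines:
        return 1.0  # nothing to lose
    matcher = difflib.SequenceMatcher(a=orig_lines, b=edited.splitlines(), autojunk=False)
    # Sum the sizes of all matching line runs (the final sentinel block has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(orig_lines)


def compounded_retention(per_round_loss: float, rounds: int) -> float:
    """Assumption: each round independently keeps (1 - per_round_loss) of the content."""
    if not 0.0 <= per_round_loss <= 1.0 or rounds < 0:
        raise ValueError("per_round_loss must be in [0, 1] and rounds must be >= 0")
    return (1.0 - per_round_loss) ** rounds


def per_round_loss_for(total_loss: float, rounds: int) -> float:
    """Invert the compounding model: per-round loss that yields `total_loss` after `rounds`."""
    if not 0.0 <= total_loss <= 1.0 or rounds <= 0:
        raise ValueError("total_loss must be in [0, 1] and rounds must be > 0")
    return 1.0 - (1.0 - total_loss) ** (1.0 / rounds)


if __name__ == "__main__":
    before = "intro\nmethods\nresults\nconclusion"
    after = "intro\nmethods\nresults (rewritten)\nconclusion"
    print(f"preserved after one edit: {preserved_fraction(before, after):.2f}")
    # Hypothetical workflow length of 20 rounds, chosen only for illustration.
    print(f"per-round loss for 25% over 20 rounds: {per_round_loss_for(0.25, 20):.4f}")
```

Under this simplified model, a workflow that loses 25% of its content over a hypothetical 20 rounds needs to lose only about 1.4% per round. That shows how per-step errors that look sparse can still add up to a large aggregate loss.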