LLMs Corrupt Your Documents When You Delegate

A study introduces DELEGATE-52, a benchmark that evaluates AI systems in delegated workflows across 52 domains, and finds that current LLMs often corrupt the documents they work on.
Even top models like GPT 5.4 corrupt an average of 25% of document content in long workflows. The errors are sparse but severe, and they compound.
Agentic tool use does not improve performance on DELEGATE-52, and degradation worsens with larger documents, longer interactions, or the presence of distractor files.
The research suggests that LLMs are unreliable delegates for knowledge work and highlights the risk of trusting them with tasks such as vibe coding.

Hasty Briefs (beta)