Hasty Briefs (beta)

LLMs Corrupt Your Documents When You Delegate

7 hours ago
  • #Workflow Reliability
  • #AI Delegation
  • #Document Corruption
  • A study introduces DELEGATE-52, a benchmark for evaluating AI systems in delegated workflows across 52 domains, and finds that current LLMs often corrupt the documents they are asked to work on.
  • Even top models such as GPT 5.4 corrupt an average of 25% of document content over long workflows; the errors are sparse but severe, and they compound over time.
  • Agentic tool use does not improve performance on DELEGATE-52, and degradation worsens with larger documents, longer interactions, or the presence of distractor files.
  • The research suggests that LLMs are unreliable delegates for knowledge work, highlighting the risk of trusting them with tasks such as vibe coding.
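To see how sparse errors can still add up over a long workflow, here is a toy model. It is our own assumption and not the study's methodology. Suppose each delegated round independently corrupts a fixed fraction p of whatever original content is still intact, so after n rounds only (1 - p)^n of it survives. The function names `retained_fraction` and `rounds_until_loss` are hypothetical, and the model deliberately ignores how severe individual errors are.

```python
# Toy compounding model (an assumption, not the DELEGATE-52 method):
# every round independently corrupts a fixed fraction of the content
# that is still intact, so losses multiply across rounds.


def retained_fraction(per_round_corruption: float, rounds: int) -> float:
    """Fraction of the original content still intact after `rounds` rounds."""
    if not 0.0 <= per_round_corruption <= 1.0:
        raise ValueError("per_round_corruption must be in [0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return (1.0 - per_round_corruption) ** rounds


def rounds_until_loss(per_round_corruption: float, target_loss: float) -> int:
    """Smallest number of rounds after which at least `target_loss` of the
    original content has been corrupted under the toy model."""
    if not 0.0 < per_round_corruption < 1.0:
        raise ValueError("per_round_corruption must be in (0, 1)")
    if not 0.0 <= target_loss < 1.0:
        raise ValueError("target_loss must be in [0, 1)")
    rounds = 0
    # Keep delegating until cumulative loss reaches the target.
    while 1.0 - retained_fraction(per_round_corruption, rounds) < target_loss:
        rounds += 1
    return rounds


if __name__ == "__main__":
    p = 0.01  # hypothetical 1% corruption per round, not a figure from the study
    n = rounds_until_loss(p, 0.25)
    print(f"{n} rounds to lose 25% of the content")
    print(f"retained after {n} rounds: {retained_fraction(p, n):.3f}")
```

Under this toy model, a per-round corruption rate of just 1% crosses the 25% loss mark after 29 rounds, which is why small, rare mistakes matter so much in long delegated sessions.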