Agent Reading Test
- #web content reading
- #benchmark
- #AI coding agents
- AI coding agents often fail silently when reading web content due to truncation, CSS noise, client-side rendering, and other issues.
- The Agent Reading Test is a benchmark designed to surface these failure modes using specific test pages with embedded canary tokens.
- Test pages include challenges like large page size, inline CSS, client-side rendering, tabbed content, HTTP status codes, markdown parsing, redirects, and section headers.
- Agents perform realistic documentation tasks; after completing tasks, they report which canary tokens they encountered to generate a score.
- The benchmark scores up to 20 points based on found tokens and correct answers to qualitative questions.
- Typical agent scores range from 14 to 18 out of 20, reflecting persistent gaps in current web-fetch pipelines.
- This test complements the Agent-Friendly Documentation Spec, which evaluates documentation sites for AI agent usability.
- The benchmark shifts focus from testing documentation sites to testing the agents themselves.
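The canary-token scoring described above can be sketched roughly as follows. The token names, the per-question point split, and the function shape are assumptions for illustration, not the benchmark's actual data or API:

```python
# Hypothetical sketch of the Agent Reading Test scoring scheme:
# one point per canary token found, plus one point per correctly
# answered qualitative question, capped at 20. Token names and the
# point split are illustrative assumptions.

CANARY_TOKENS = {
    "CANARY-LARGE-PAGE",     # only reachable if the agent reads past truncation
    "CANARY-INLINE-CSS",     # buried among heavy inline styles
    "CANARY-CLIENT-RENDER",  # present only after client-side rendering
    "CANARY-TABBED",         # hidden inside a non-default tab
    "CANARY-REDIRECT",       # reachable only by following a redirect
}

def score(found_tokens: set[str], answers: dict[str, bool]) -> int:
    """Return a score out of 20 from tokens found and question answers."""
    token_points = len(found_tokens & CANARY_TOKENS)
    question_points = sum(1 for correct in answers.values() if correct)
    return min(token_points + question_points, 20)
```

For example, an agent whose fetch pipeline truncates large pages and skips client-side rendering would simply never see those tokens, and its self-report would score lower accordingly.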