Agent Reading Test
- #web content reading
- #benchmark
- #AI coding agents
- AI coding agents often fail silently when reading web content due to truncation, CSS noise, client-side rendering, and other issues.
- The Agent Reading Test is a benchmark designed to surface these failure modes using specific test pages with embedded canary tokens.
- Test pages include challenges like large page size, inline CSS, client-side rendering, tabbed content, HTTP status codes, markdown parsing, redirects, and section headers.
- Agents perform realistic documentation tasks; after completing tasks, they report which canary tokens they encountered to generate a score.
- The benchmark scores up to 20 points based on found tokens and correct answers to qualitative questions.
- Typical agent scores range from 14 to 18 out of 20, reflecting persistent gaps in current web-fetch pipelines.
- This test complements the Agent-Friendly Documentation Spec, which evaluates documentation sites for AI agent usability.
- The benchmark shifts focus from testing documentation sites to testing the agents themselves.
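The canary-token scoring described above can be sketched roughly as follows. The token names, the per-question point split, and the function shape are assumptions for illustration, not the benchmark's actual data or API:

```python
# Hypothetical sketch of the Agent Reading Test scoring scheme:
# one point per canary token found, plus one point per correctly
# answered qualitative question, capped at 20. Token names and the
# point split are illustrative assumptions.

CANARY_TOKENS = {
    "CANARY-LARGE-PAGE",     # only reachable if the agent reads past truncation
    "CANARY-INLINE-CSS",     # buried among heavy inline styles
    "CANARY-CLIENT-RENDER",  # present only after client-side rendering
    "CANARY-TABBED",         # hidden inside a non-default tab
    "CANARY-REDIRECT",       # reachable only by following a redirect
}

def score(found_tokens: set[str], answers: dict[str, bool]) -> int:
    """Return a score out of 20 from tokens found and question answers."""
    token_points = len(found_tokens & CANARY_TOKENS)
    question_points = sum(1 for correct in answers.values() if correct)
    return min(token_points + question_points, 20)
```

For example, an agent whose fetch pipeline truncates large pages and skips client-side rendering would simply never see those tokens, and its self-report would score lower accordingly.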