Hasty Briefs (beta)

Agent Reading Test

4 hours ago
  • #web content reading
  • #benchmark
  • #AI coding agents
  • AI coding agents often fail silently when reading web content due to truncation, CSS noise, client-side rendering, and other issues.
  • The Agent Reading Test is a benchmark designed to surface these failure modes using specific test pages with embedded canary tokens.
  • Test pages include challenges like large page size, inline CSS, client-side rendering, tabbed content, HTTP status codes, markdown parsing, redirects, and section headers.
  • Agents perform realistic documentation tasks; after completing them, they report which canary tokens they encountered, and a score is generated from that report.
  • The benchmark scores up to 20 points based on found tokens and correct answers to qualitative questions.
  • Typical agent scores range from 14 to 18 out of 20, reflecting current limitations in web-fetch pipelines.
  • This test complements the Agent-Friendly Documentation Spec, which evaluates documentation sites for AI agent usability.
  • The benchmark shifts focus from testing documentation sites to testing the agents themselves.
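The scoring mechanism described above can be sketched in a few lines: the agent reports the canary tokens it saw plus answers to qualitative questions, and each found token or correct answer earns a point. The token names, questions, and point split below are illustrative assumptions, not the benchmark's actual rubric.

```python
# Toy rubric for canary-token scoring (all names here are hypothetical).
EXPECTED_TOKENS = {
    "CANARY-LARGE-PAGE",      # buried deep in an oversized page
    "CANARY-CSS-NOISE",       # surrounded by inline CSS
    "CANARY-CLIENT-RENDER",   # only visible after client-side rendering
    "CANARY-TABBED",          # inside a non-default tab
    "CANARY-REDIRECT",        # reachable only through an HTTP redirect
}

ANSWER_KEY = {
    "What HTTP status did the removed page return?": "410",
    "Which section header precedes the token?": "Installation",
}

def score_agent(reported_tokens, answers):
    """Return points earned: one per found token, one per correct answer."""
    token_points = len(EXPECTED_TOKENS & set(reported_tokens))
    answer_points = sum(1 for q, a in ANSWER_KEY.items() if answers.get(q) == a)
    return token_points + answer_points

# Example: an agent that missed the client-rendered token and one question.
found = {"CANARY-LARGE-PAGE", "CANARY-CSS-NOISE",
         "CANARY-TABBED", "CANARY-REDIRECT"}
given = {"What HTTP status did the removed page return?": "410"}
print(score_agent(found, given))  # 5 of a possible 7 in this toy rubric
```

Self-reported tokens make the test cheap to run against any agent, since no instrumentation of the agent's fetch pipeline is needed; a token the agent never saw simply cannot appear in its report.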