The AGENTS.md file isn't the problem. Your lack of evals is.

3 months ago

Context files (like AGENTS.md) improve task completion by only 4%, while LLM-generated ones reduce performance by 3% and increase costs by 20%.
The real issue is not the context files themselves but the lack of evaluation (evals) to validate their effectiveness.
Context files should be treated like tests: lean, validated, and focused on high-signal instructions that correct specific agent behaviors.
Evals provide a feedback loop to measure whether context instructions improve agent performance, helping to refine and optimize context files.
Auto-generated context files perform poorly because they are not validated and carry no expertise about what the agent actually gets wrong.
The solution is not to abandon context files but to adopt a disciplined approach to context engineering, similar to testing practices.
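
The feedback loop described above can be sketched as a small A/B eval: run the same tasks with and without a context instruction and compare pass rates. This is a minimal illustration, not the article's own harness. The `Task` type, the `Agent` signature, `pass_rate`, `context_delta`, and the `stub_agent` stand-in are all hypothetical names introduced here; a real setup would call an actual coding agent and use real task checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Task:
    """One eval case: a prompt plus a check on the agent's output (hypothetical shape)."""
    prompt: str
    check: Callable[[str], bool]  # True if the output counts as a pass


# Assumed agent interface: (prompt, context text or None) -> output text.
Agent = Callable[[str, Optional[str]], str]


def pass_rate(agent: Agent, tasks: List[Task], context: Optional[str]) -> float:
    """Fraction of tasks whose check passes when the agent is given `context`."""
    if not tasks:
        raise ValueError("need at least one task to compute a pass rate")
    passed = sum(1 for task in tasks if task.check(agent(task.prompt, context)))
    return passed / len(tasks)


def context_delta(agent: Agent, tasks: List[Task], context: str) -> float:
    """Pass rate with the context minus pass rate without it."""
    return pass_rate(agent, tasks, context) - pass_rate(agent, tasks, None)


def stub_agent(prompt: str, context: Optional[str]) -> str:
    """Stand-in for a real agent: it only uses pnpm when the context says so."""
    if context is not None and "pnpm" in context:
        return "pnpm test"
    return "npm test"


tasks = [
    Task("Run the test suite", lambda out: out.startswith("pnpm")),
    Task("Run the tests again", lambda out: "pnpm" in out),
]

# One candidate instruction for the context file, evaluated in isolation.
delta = context_delta(stub_agent, tasks, "Use pnpm, not npm.")
print(f"pass-rate change from this instruction: {delta:+.2f}")
```

Under this sketch, an instruction earns its place in the context file only if its delta is clearly positive on tasks the agent otherwise fails; instructions with a zero or negative delta are candidates for removal.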
