The AGENTS.md file isn't the problem. Your lack of evals is
7 hours ago
- #Context Engineering
- #AI Agents
- #Evals
- Context files (like AGENTS.md) improve task completion by only 4%, while LLM-generated ones reduce performance by 3% and increase costs by 20%.
- The real issue is not the context files themselves but the lack of evaluation (evals) to validate their effectiveness.
- Context files should be treated like tests: lean, validated, and focused on high-signal instructions that correct specific agent behaviors.
- Evals provide a feedback loop to measure whether context instructions improve agent performance, helping to refine and optimize context files.
- Auto-generated context files perform poorly because they are never validated and carry no knowledge of what the agent actually gets wrong.
- The solution is not to abandon context files but to adopt a disciplined approach to context engineering, similar to testing practices.
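
The feedback loop described above can be sketched as a tiny eval harness: run the same tasks with and without the context file and compare pass rates. This is a minimal sketch, not the author's method. `Task`, `pass_rate`, `context_delta`, and `stub_agent` are hypothetical names, the pnpm instruction is an invented example, and `stub_agent` is a deterministic stand-in that you would replace with a call to your real agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical agent signature: (task prompt, context file text or None) -> output.
Agent = Callable[[str, Optional[str]], str]


@dataclass(frozen=True)
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable


def pass_rate(agent: Agent, tasks: Sequence[Task], context: Optional[str]) -> float:
    """Fraction of tasks whose output passes its check under the given context."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(task.check(agent(task.prompt, context)) for task in tasks)
    return passed / len(tasks)


def context_delta(agent: Agent, tasks: Sequence[Task], context: str) -> float:
    """Pass rate with the context file minus pass rate without it."""
    return pass_rate(agent, tasks, context) - pass_rate(agent, tasks, None)


def stub_agent(prompt: str, context: Optional[str]) -> str:
    # Assumption for illustration only: the agent defaults to npm and
    # switches to pnpm only when the context file says so.
    if "install" in prompt:
        return "pnpm install" if context and "pnpm" in context else "npm install"
    return "pytest -q"


TASKS = [
    Task("install-deps", "install the dependencies", lambda out: out.startswith("pnpm")),
    Task("run-tests", "run the test suite", lambda out: "pytest" in out),
]

if __name__ == "__main__":
    delta = context_delta(stub_agent, TASKS, "Use pnpm, not npm.")
    print(f"pass-rate delta from context file: {delta:+.2f}")
```

An instruction that does not move the delta on any task is a candidate for deletion, which is how a context file stays lean. With a real, non-deterministic agent you would likely want to repeat each task several times before trusting the difference.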