Expensively Quadratic: The LLM Agent Cost Curve
- #coding agents
- #LLM costs
- #cache optimization
- Coding agents incur costs from input tokens, cache writes, output tokens, and cache reads.
- Cache reads dominate costs as conversations grow longer, reaching 87% of total cost in one example.
- Median input is around 285 tokens per call and median output around 100, but both distributions vary widely.
- Under Anthropic's pricing, cache reads become a significant cost factor once the context grows past roughly 20,000 tokens.
- Fewer LLM calls reduce costs but may compromise the agent's accuracy in completing tasks.
- Some agents truncate large tool outputs so they are not re-read (and re-billed) on every subsequent call, though this may not be the right trade-off.
- Subagents and tools that call LLMs can help manage iteration outside the main context window.
- Restarting conversations can be cheaper than continuing long ones, despite feeling wasteful.
- Cost management, context management, and agent orchestration may be interconnected challenges.
- Approaches like Recursive Language Models are being considered to address these issues.
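The quadratic dynamic in the list above can be sketched with a toy model: each agent turn re-reads the entire prior context from cache, then appends its new input and output. The prices below are illustrative placeholders (roughly Sonnet-class per-million-token rates), not figures from the article; check current pricing before relying on them.

```python
# Illustrative per-million-token prices (placeholders, not official rates).
PRICE_PER_MTOK = {
    "input": 3.00,        # uncached input tokens
    "cache_write": 3.75,  # tokens written to the prompt cache
    "cache_read": 0.30,   # tokens served from the prompt cache
    "output": 15.00,
}

def run_conversation(turns, new_input=285, new_output=100):
    """Simulate an agent loop where every turn re-reads the whole
    prior context from cache, then appends new input and output
    (defaults use the median token counts quoted above)."""
    context = 0  # tokens already sitting in the cache
    cost = {k: 0.0 for k in PRICE_PER_MTOK}
    for _ in range(turns):
        cost["cache_read"] += context * PRICE_PER_MTOK["cache_read"] / 1e6
        cost["input"] += new_input * PRICE_PER_MTOK["input"] / 1e6
        cost["cache_write"] += (new_input + new_output) * PRICE_PER_MTOK["cache_write"] / 1e6
        cost["output"] += new_output * PRICE_PER_MTOK["output"] / 1e6
        context += new_input + new_output
    return cost

for n in (10, 100, 500):
    c = run_conversation(n)
    total = sum(c.values())
    print(f"{n:>3} turns: ${total:.2f} total, cache reads {c['cache_read'] / total:.0%}")
```

Because cache-read tokens per call grow linearly with turn count, their cumulative cost grows quadratically, and under these placeholder rates they swamp every other line item by a few hundred turns.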
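The restart-vs-continue trade-off can be estimated the same way: the marginal cost of the next batch of turns is dominated by re-reading whatever context you start from. The 150k-token context, 5k-token summary, and $0.30 cache-read rate below are hypothetical numbers chosen for illustration.

```python
def marginal_cost(turns, start_context, new_tokens=385, cache_read_per_mtok=0.30):
    """Cache-read cost (USD) alone for the next `turns` calls, starting
    from `start_context` tokens already in context; each call appends
    `new_tokens` more. Other cost components are omitted because they
    are the same whether or not you restart."""
    cost = 0.0
    ctx = start_context
    for _ in range(turns):
        cost += ctx * cache_read_per_mtok / 1e6
        ctx += new_tokens
    return cost

# Continuing a 150k-token conversation for 50 more turns, versus
# restarting from a (hypothetical) 5k-token summary of the work so far:
cont = marginal_cost(50, 150_000)
fresh = marginal_cost(50, 5_000)
print(f"continue: ${cont:.2f}, restart with summary: ${fresh:.2f}")
```

The restart wins whenever the summary is much smaller than the accumulated context, which is why abandoning a long conversation can be cheaper than it feels.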