Expensively Quadratic: The LLM Agent Cost Curve
- #coding agents
- #LLM costs
- #cache optimization
- Coding agents incur costs from input tokens, cache writes, output tokens, and cache reads.
- Cache reads dominate costs as conversations grow longer, reaching 87% of total cost in one example.
- Median input is around 285 tokens per call and median output around 100, but both distributions vary widely.
- Under Anthropic's pricing, cache reads become a significant cost factor once the context grows past roughly 20,000 tokens.
- Fewer LLM calls reduce costs but may compromise the agent's accuracy in completing tasks.
- Some agents truncate large tool outputs so they are not re-read (and re-billed) on every subsequent call, though this may not be the right trade-off.
- Subagents and tools that call LLMs can help manage iteration outside the main context window.
- Restarting conversations can be cheaper than continuing long ones, despite feeling wasteful.
- Cost management, context management, and agent orchestration may be interconnected challenges.
- Approaches like Recursive Language Models are being considered to address these issues.
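The quadratic dynamic in the list above can be sketched with a toy model: each agent turn re-reads the entire prior context from cache, then appends its new input and output. The prices below are illustrative placeholders (roughly Sonnet-class per-million-token rates), not figures from the article; check current pricing before relying on them.

```python
# Illustrative per-million-token prices (placeholders, not official rates).
PRICE_PER_MTOK = {
    "input": 3.00,        # uncached input tokens
    "cache_write": 3.75,  # tokens written to the prompt cache
    "cache_read": 0.30,   # tokens served from the prompt cache
    "output": 15.00,
}

def run_conversation(turns, new_input=285, new_output=100):
    """Simulate an agent loop where every turn re-reads the whole
    prior context from cache, then appends new input and output
    (defaults use the median token counts quoted above)."""
    context = 0  # tokens already sitting in the cache
    cost = {k: 0.0 for k in PRICE_PER_MTOK}
    for _ in range(turns):
        cost["cache_read"] += context * PRICE_PER_MTOK["cache_read"] / 1e6
        cost["input"] += new_input * PRICE_PER_MTOK["input"] / 1e6
        cost["cache_write"] += (new_input + new_output) * PRICE_PER_MTOK["cache_write"] / 1e6
        cost["output"] += new_output * PRICE_PER_MTOK["output"] / 1e6
        context += new_input + new_output
    return cost

for n in (10, 100, 500):
    c = run_conversation(n)
    total = sum(c.values())
    print(f"{n:>3} turns: ${total:.2f} total, cache reads {c['cache_read'] / total:.0%}")
```

Because cache-read tokens per call grow linearly with turn count, their cumulative cost grows quadratically, and under these placeholder rates they swamp every other line item by a few hundred turns.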
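The restart-vs-continue trade-off can be estimated the same way: the marginal cost of the next batch of turns is dominated by re-reading whatever context you start from. The 150k-token context, 5k-token summary, and $0.30 cache-read rate below are hypothetical numbers chosen for illustration.

```python
def marginal_cost(turns, start_context, new_tokens=385, cache_read_per_mtok=0.30):
    """Cache-read cost (USD) alone for the next `turns` calls, starting
    from `start_context` tokens already in context; each call appends
    `new_tokens` more. Other cost components are omitted because they
    are the same whether or not you restart."""
    cost = 0.0
    ctx = start_context
    for _ in range(turns):
        cost += ctx * cache_read_per_mtok / 1e6
        ctx += new_tokens
    return cost

# Continuing a 150k-token conversation for 50 more turns, versus
# restarting from a (hypothetical) 5k-token summary of the work so far:
cont = marginal_cost(50, 150_000)
fresh = marginal_cost(50, 5_000)
print(f"continue: ${cont:.2f}, restart with summary: ${fresh:.2f}")
```

The restart wins whenever the summary is much smaller than the accumulated context, which is why abandoning a long conversation can be cheaper than it feels.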