Hasty Briefs

Expensively Quadratic: The LLM Agent Cost Curve

12 days ago
  • #coding agents
  • #LLM costs
  • #cache optimization
  • Coding agents incur costs from input tokens, cache writes, output tokens, and cache reads.
  • Cache reads dominate costs as conversations grow longer, reaching 87% of total cost in one example.
  • Median input tokens are around 285, and median output tokens are about 100, but distributions vary widely.
  • Under Anthropic's pricing structure, cache reads become a significant cost factor once the cached context grows past roughly 20,000 tokens.
  • Fewer LLM calls reduce costs but may compromise the agent's accuracy in completing tasks.
  • Some agents limit large tool outputs to avoid multiple costly reads, though this may not be optimal.
  • Subagents and tools that call LLMs can help manage iteration outside the main context window.
  • Restarting conversations can be cheaper than continuing long ones, despite feeling wasteful.
  • Cost management, context management, and agent orchestration may be interconnected challenges.
  • Approaches like Recursive Language Models are being considered to address these issues.
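The cost dynamic above can be sketched with a simple model: each agent turn re-reads the entire cached context (billed at the cache-read rate), writes its fresh tokens into the cache, and pays for output tokens. Since the context grows every turn, cumulative cache-read cost grows quadratically with conversation length. The per-million-token prices below follow Anthropic's published Sonnet-class rates, and the per-turn token counts (285 input, ~2,000 tool output, 100 output) are illustrative assumptions, not figures from the article.

```python
# Hypothetical cost model for an agent loop with prompt caching.
# Prices are USD per million tokens (Sonnet-class rates, for illustration).
PRICES = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00}

def conversation_cost(turns, user_tokens=285, tool_tokens=2000,
                      output_tokens=100, prices=PRICES):
    """Cumulative cost of `turns` agent iterations, where every turn
    re-reads the whole cached context and then appends to it."""
    context = 0                       # tokens currently sitting in the cache
    cost = {k: 0.0 for k in prices}   # running cost per billing category
    for _ in range(turns):
        # The entire prior context is re-read from cache each turn.
        cost["cache_read"] += context * prices["cache_read"] / 1e6
        # Fresh prompt tokens (user input + tool results) are written to cache.
        fresh = user_tokens + tool_tokens
        cost["cache_write"] += fresh * prices["cache_write"] / 1e6
        cost["output"] += output_tokens * prices["output"] / 1e6
        # Context grows every turn, so the next read is more expensive.
        context += fresh + output_tokens
    return cost

for n in (10, 50, 200):
    c = conversation_cost(n)
    total = sum(c.values())
    print(f"{n:>3} turns: ${total:6.2f} total, "
          f"{c['cache_read'] / total:.0%} from cache reads")
```

Running this shows cache reads going from a minority of spend at 10 turns to the dominant share at 200 turns, which is the intuition behind restarting conversations instead of letting one grow without bound.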