What happens when coding agents stop feeling like dialup?
- #LLM-Infrastructure
- #AI-Coding-Agents
- #Developer-Productivity
- Coding agents like Claude Code are becoming slower and less reliable, reminiscent of dial-up internet in the late 90s.
- Anthropic and other AI companies are struggling with reliability; OpenRouter data, despite its limited sample size, shows roughly a 50x increase in AI token usage.
- Agentic coding workflows consume far more tokens than non-agentic chats, straining infrastructure much as early broadband buildouts did.
- Current frontier models operate at 30-60 tokens per second (tok/s), which can be frustratingly slow for supervised coding tasks.
- Faster models like Cerebras Code (2000 tok/s) shift the bottleneck to the user: it becomes tempting to accept outputs without real review, which leads to poor results.
- The evolution of LLMs for software engineering has progressed from GPT-3.5's hallucinated answers to GPT-4/Sonnet 3.5's reliable snippets, and now to supervised CLI agents.
- The next phase may involve unsupervised agents running multiple parallel attempts at tasks, enabled by higher tok/s speeds, though slower models disrupt workflow efficiency.
- AI demand is caught in a feedback loop: each improvement unlocks more resource-intensive usage, unlike broadband demand, which plateaued in the early 2000s.
- Semiconductor process stagnation limits efficiency gains, capping supply growth and potentially leading to less favorable pricing models for developers.
- Peak-time infrastructure strain may result in off-peak pricing plans to balance demand, though current batch processing options aren't ideal for interactive workflows.
- Developers must stay current with AI advances to capture the productivity gains; the field is far from stable, and experienced developers often underestimate its potential.
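The tok/s figures above translate directly into wall-clock wait times. A back-of-the-envelope sketch, assuming a 2,000-token code change (an illustrative size, not a figure from the post):

```python
# Back-of-the-envelope: wall-clock time to stream a model response
# at the generation speeds cited above. The 2,000-token diff size
# is an assumed example.

def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Seconds to stream `output_tokens` at `tok_per_s`."""
    return output_tokens / tok_per_s

diff_tokens = 2_000  # assumed size of a moderate code change

for speed in (30, 60, 2_000):  # tok/s figures from the post
    secs = generation_seconds(diff_tokens, speed)
    print(f"{speed:>5} tok/s -> {secs:6.1f} s")
```

At 30 tok/s the wait is over a minute per response, which is why supervised loops feel slow; at 2,000 tok/s the model finishes in about a second, and the human reviewer becomes the bottleneck.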
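The "multiple parallel attempts" idea can be sketched as fanning out N independent runs of the same task and letting the caller pick a winner. This is a minimal sketch: `run_agent_attempt` is a hypothetical stand-in for a real coding-agent invocation, stubbed here for illustration.

```python
# Sketch of running several independent attempts at one task in
# parallel. `run_agent_attempt` is hypothetical; a real version
# would call a model API with sampling so each attempt diverges.
from concurrent.futures import ThreadPoolExecutor


def run_agent_attempt(task: str, seed: int) -> str:
    # Stub standing in for an actual agent run.
    return f"attempt-{seed}: patch for {task!r}"


def parallel_attempts(task: str, n: int = 4) -> list[str]:
    """Run n attempts concurrently; the caller selects the best result."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(run_agent_attempt, task, s) for s in range(n)]
        return [f.result() for f in futures]


results = parallel_attempts("fix flaky test", n=4)
print(len(results))
```

The design point is that higher tok/s makes the cost of a wasted attempt small, so spending tokens on redundant parallel runs and discarding most of them becomes a reasonable trade.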