I traced 3,177 API calls to see what 4 AI coding tools put in the context window
6 days ago
- #context window analysis
- #token efficiency
- #AI coding tools
- The author built Context Lens to analyze how different AI coding tools use tokens in their context windows.
- Four tools (Claude Opus, Claude Sonnet, Codex, Gemini) were tested with the same bug-fixing task in an Express.js repository.
- All tools successfully fixed the bug but used vastly different token counts: Opus (23K-35K), Sonnet (43K-70K), Codex (29K-47K), Gemini (179K-350K).
- Opus was the most efficient, using git history to pinpoint the bug with minimal code reading, but it carried heavy 'tool definition' overhead (69% of its context).
- Sonnet took a thorough approach, reading test files and source code, resulting in more balanced context usage but higher token counts.
- Codex used Unix-like commands (grep, sed) for targeted code reading, making it predictable and efficient with low tool definition overhead (6%).
- Gemini had no tool definition overhead but consumed context aggressively, dumping entire files and git histories (tool results made up 96% of its context), with highly variable token usage.
- None of the tools actively managed their context budget; efficiency differences came from investigation strategies rather than deliberate optimization.
- Context Lens is open-source and provides real-time analysis of LLM API calls, helping developers understand token usage.
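The kind of per-category breakdown described above (tool definitions vs. tool results vs. conversation) can be sketched by bucketing a captured API request payload. This is a minimal illustration, not Context Lens's actual implementation: the payload shape mirrors a typical chat-style request with a `system` field, a `tools` array, and `tool_result` content blocks, and it uses a rough chars/4 token estimate where a real tracer would use the provider's tokenizer.

```python
import json

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. A real tool would use
    # the provider's tokenizer for accurate counts.
    return max(1, len(text) // 4)

def context_breakdown(payload: dict) -> dict:
    """Return the percentage of estimated tokens per context category.

    Assumes a hypothetical chat-request shape: top-level "system" string,
    "tools" list of JSON schemas, and "messages" whose content may be a
    string or a list of typed blocks (e.g. {"type": "tool_result", ...}).
    """
    buckets = {"system": 0, "tool_definitions": 0, "tool_results": 0, "messages": 0}
    buckets["system"] = estimate_tokens(payload.get("system", ""))
    for tool in payload.get("tools", []):
        # Tool schemas are serialized into the prompt, so count their JSON text.
        buckets["tool_definitions"] += estimate_tokens(json.dumps(tool))
    for msg in payload.get("messages", []):
        content = msg.get("content", "")
        if isinstance(content, list):  # structured content blocks
            for block in content:
                key = "tool_results" if block.get("type") == "tool_result" else "messages"
                buckets[key] += estimate_tokens(json.dumps(block))
        else:
            buckets["messages"] += estimate_tokens(content)
    total = sum(buckets.values())
    return {k: round(100 * v / total, 1) for k, v in buckets.items()}
```

Run against one captured request, a breakdown like `{"tool_definitions": 69.0, ...}` would correspond to the Opus-style overhead pattern noted above, while a dominant `tool_results` share would match the Gemini pattern.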