Both Codex and Claude got worse this week. Across every plan I retested

10 hours ago

The article introduces a metric, 'quality-adjusted tokens per dollar', for comparing AI service costs. It combines token volume with quality scores from benchmarks such as Arena Elo and the Artificial Analysis Intelligence Index.
Three main AI cost categories are evaluated: local hardware (a one-time cost amortized over 3 years, excluding electricity), subscriptions (with empirically measured token limits), and pay-per-token APIs (with an input/output ratio that can be adjusted for different use cases).
Recommendations are given per use case: Claude Sonnet/Opus and models like Qwen3-Coder for coding, GPT-5 Chat/Claude Opus for writing, and Qwen3.5 35B A3B for local GPU usage. The article also compares ChatGPT and Claude subscription plans in detail.
Subscription token quotas are measured by running standardized tasks through CLI tools and extrapolating the results to weekly limits. The results are reported as multipliers that compare each subscription's cost with the equivalent API pricing.
Caveats include the effect of cache rates on value, the assumption of full quota utilization, and corrections to the cache pricing calculations. The underlying JSON data is openly available for external use under Apache 2.0.
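
The arithmetic behind these comparisons can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the summary does not give the exact formulas, so the sketch assumes that quality adjustment means scaling tokens per dollar by a quality score relative to a reference score, that API prices are quoted in USD per million tokens, and that a month is 52/12 weeks. All function names and example figures are hypothetical.

```python
# Hypothetical sketch of the cost comparisons described above.
# Every formula here is an assumption; the source summary does not spell them out.

WEEKS_PER_MONTH = 52 / 12  # assumed conversion from weekly quota to monthly usage


def blended_api_price(input_price: float, output_price: float, input_ratio: float) -> float:
    """Blend input and output prices (USD per million tokens).

    input_ratio is the assumed share of input tokens, between 0 and 1.
    """
    return input_ratio * input_price + (1.0 - input_ratio) * output_price


def tokens_per_dollar(price_per_million: float) -> float:
    """Convert a USD-per-million-tokens price into tokens per dollar."""
    return 1_000_000 / price_per_million


def quality_adjusted_tokens_per_dollar(
    tokens_per_usd: float, quality: float, reference_quality: float
) -> float:
    """Scale tokens per dollar by quality relative to a reference score (assumed form)."""
    return tokens_per_usd * (quality / reference_quality)


def amortized_monthly_cost(hardware_price: float, years: int = 3) -> float:
    """Spread a one-time hardware cost evenly over `years`, excluding electricity."""
    return hardware_price / (years * 12)


def subscription_multiplier(
    weekly_tokens: float,
    monthly_price: float,
    api_price_per_million: float,
    weeks_per_month: float = WEEKS_PER_MONTH,
) -> float:
    """Ratio of the API cost of the same tokens to the subscription price.

    Assumes the weekly quota is fully used, as the caveats above note.
    """
    api_cost = weekly_tokens * weeks_per_month / 1_000_000 * api_price_per_million
    return api_cost / monthly_price


if __name__ == "__main__":
    # Made-up example figures, for illustration only.
    price = blended_api_price(input_price=3.0, output_price=15.0, input_ratio=0.75)
    print(f"blended price: ${price:.2f} per million tokens")
    print(f"multiplier: {subscription_multiplier(10_000_000, 20.0, price):.1f}x")
```

Under these assumptions, a multiplier above 1 means the subscription delivers more tokens than the same money would buy through the API. A high cache hit rate lowers the effective API price and therefore shrinks the multiplier.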
