Tokenmaxxing is dead, and the real AI cost reckoning hasn't started yet

7 hours ago

OpenAI's financial leak reveals massive costs: $34B total costs vs $13.07B revenue, with a $20.9B operating loss and $8B cash burn.
Inference costs for serving model outputs doubled from $3.8B in 2024 to $8.65B in early 2025, raising questions about sustainability.
Major AI companies like OpenAI, Google, Anthropic, and Meta are pricing inference below cost to capture market share, creating a false price floor.
Enterprises like Uber and Salesforce are experiencing budget overruns, with token usage outpacing value delivery, prompting calls for smarter routing.
Research shows the 'price reversal phenomenon': cheaper-listed models can cost more in practice due to inefficient token use or failure rates.
Agentic workflows, which burn high tokens, succeed in only 41% of tasks, meaning most spend goes toward failed or hallucinated results.
AI costs have shifted from training to inference and agent networks, with token price declines masking true expenses.
Companies are advised to diversify AI model usage and adopt routing infrastructure to optimize costs and reduce reliance on single providers.
Infrastructure startups like Featherless, Fireworks AI, Together AI, and routers like OpenRouter are enabling cost-effective, flexible AI deployments.
The future of AI efficiency lies in strategic model selection, reserving premium models for critical tasks and using open-source for high-volume, low-complexity calls.

Hasty Briefsbeta