Are OpenAI and Anthropic Losing Money on Inference?
- #Cloud Computing
- #Inference Costs
- #AI Economics
- AI inference costs are often misunderstood: processing input tokens is dramatically cheaper than generating output tokens.
- Input tokens cost roughly $0.003 per million to serve, while output tokens cost about $3.08 per million, roughly a thousand-fold difference.
- Production setups using H100 GPUs can process 46.8 billion input tokens per hour but generate only 46.7 million output tokens per hour; the cost sketch after this list works through the implied per-token figures.
- Long contexts (128k+ tokens) shift inference from memory-bound to compute-bound as attention work grows with context length, increasing costs by 2-10x; see the context-length sketch after this list.
- Consumer plans like ChatGPT Pro show a 5-6x markup over estimated serving cost, while developer plans like Claude Code Max show 11-20x markups because their usage skews toward cheap input tokens.
- API pricing carries gross margins of 80-95%, making it highly profitable; the margin sketch after this list shows how these margins follow from the markup multiples.
- Use cases like coding assistants benefit from cheap input tokens and minimal output, while video generation is expensive due to high output token requirements.
- The narrative that AI inference is unsustainably expensive may be overstated, especially for input-heavy workloads.
- Incumbent players may exaggerate costs to discourage competition and maintain high margins.
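
To make the input/output asymmetry concrete, here is a minimal cost sketch that recovers the per-million-token figures from the throughput numbers above. The $144/hour cluster rate is an assumption (for example, 72 H100s at about $2/GPU-hour), chosen because it reproduces the quoted $0.003 and $3.08 figures; actual hardware pricing will vary.

```python
# Back-of-the-envelope serving costs from the throughput figures above.
# The hourly cluster rate is an ASSUMPTION, not a quoted price.

CLUSTER_COST_PER_HOUR = 144.0     # assumption: e.g. 72 H100s at ~$2/GPU-hour

input_tokens_per_hour = 46.8e9    # from the summary above
output_tokens_per_hour = 46.7e6   # from the summary above

cost_per_m_input = CLUSTER_COST_PER_HOUR / (input_tokens_per_hour / 1e6)
cost_per_m_output = CLUSTER_COST_PER_HOUR / (output_tokens_per_hour / 1e6)

print(f"input:  ${cost_per_m_input:.4f} per million tokens")   # ~$0.0031
print(f"output: ${cost_per_m_output:.2f} per million tokens")  # ~$3.08
print(f"ratio:  {cost_per_m_output / cost_per_m_input:.0f}x")  # ~1002x
```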
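The context-length sketch below is a rough FLOP model, not the source's own calculation: per-token compute has a fixed weight-multiply term of about 2P FLOPs plus an attention term that grows linearly with the number of context tokens, so beyond some context length attention dominates and prefill becomes compute-bound. The 70B-class model shape is an assumption for illustration, and the constant factors vary with how FLOPs are counted.

```python
# Rough model of why very long contexts turn prefill compute-bound.
# Per-token forward FLOPs ~ 2*P (weight multiplies, constant per token)
# plus ~4*L*n*d for attention over n context tokens.
# Model shape below is an ASSUMED 70B-class example.

P = 70e9   # parameters (assumption)
L = 80     # layers (assumption)
d = 8192   # model width (assumption)

def flops_per_token(n_ctx: float) -> tuple[float, float]:
    """Return (weight FLOPs, attention FLOPs) for one token at context n_ctx."""
    return 2 * P, 4 * L * n_ctx * d

for n_ctx in (8_000, 32_000, 128_000):
    w, a = flops_per_token(n_ctx)
    print(f"n_ctx={n_ctx:>7,}: attention/weights = {a / w:.2f}x")

# Crossover where attention matches weight FLOPs: n = 2P / (4*L*d)
print(f"crossover ~ {2 * P / (4 * L * d):,.0f} tokens")  # ~53,000 tokens
```

Under these assumptions the attention term already exceeds the weight term at 128k context, consistent with the 2-10x long-context cost increase cited above.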
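Finally, the margin sketch: the markup multiples and the gross-margin band quoted above are two views of the same arithmetic, since a provider charging k times its serving cost earns a gross margin of 1 - 1/k. This identity is a standard accounting relationship, not a figure from the source.

```python
# Relating the markup multiples quoted above to gross margins.
# margin = (price - cost) / price = 1 - 1/markup, where markup = price/cost.

def gross_margin(markup: float) -> float:
    """Gross margin implied by charging `markup` times the serving cost."""
    return 1.0 - 1.0 / markup

for markup in (5, 6, 11, 20):
    print(f"{markup:>2}x markup -> {gross_margin(markup):.0%} gross margin")
# 5x -> 80%, 6x -> 83%, 11x -> 91%, 20x -> 95%:
# the 5-20x markup range maps onto exactly the 80-95% margin band
# cited for API pricing.
```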