Are OpenAI and Anthropic Losing Money on Inference?
- #Cloud Computing
- #Inference Costs
- #AI Economics
- AI inference costs are often misunderstood: processing input tokens is dramatically cheaper than generating output tokens.
- Input tokens cost roughly $0.003 per million to serve, while output tokens cost about $3.08 per million, roughly a thousand-fold difference.
- Production setups using H100 GPUs can process 46.8 billion input tokens per hour but generate only 46.7 million output tokens per hour; the cost sketch after this list works through the implied per-token figures.
- Long contexts (128k+ tokens) shift inference from memory-bound to compute-bound as attention work grows with context length, increasing costs by 2-10x; see the context-length sketch after this list.
- Consumer plans like ChatGPT Pro show a 5-6x markup over estimated serving cost, while developer plans like Claude Code Max show 11-20x markups because their usage skews toward cheap input tokens.
- API pricing carries gross margins of 80-95%, making it highly profitable; the margin sketch after this list shows how these margins follow from the markup multiples.
- Use cases like coding assistants benefit from cheap input tokens and minimal output, while video generation is expensive due to high output token requirements.
- The narrative that AI inference is unsustainably expensive may be overstated, especially for input-heavy workloads.
- Incumbent players may exaggerate costs to discourage competition and maintain high margins.
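
To make the input/output asymmetry concrete, here is a minimal cost sketch that recovers the per-million-token figures from the throughput numbers above. The $144/hour cluster rate is an assumption (for example, 72 H100s at about $2/GPU-hour), chosen because it reproduces the quoted $0.003 and $3.08 figures; actual hardware pricing will vary.

```python
# Back-of-the-envelope serving costs from the throughput figures above.
# The hourly cluster rate is an ASSUMPTION, not a quoted price.

CLUSTER_COST_PER_HOUR = 144.0     # assumption: e.g. 72 H100s at ~$2/GPU-hour

input_tokens_per_hour = 46.8e9    # from the summary above
output_tokens_per_hour = 46.7e6   # from the summary above

cost_per_m_input = CLUSTER_COST_PER_HOUR / (input_tokens_per_hour / 1e6)
cost_per_m_output = CLUSTER_COST_PER_HOUR / (output_tokens_per_hour / 1e6)

print(f"input:  ${cost_per_m_input:.4f} per million tokens")   # ~$0.0031
print(f"output: ${cost_per_m_output:.2f} per million tokens")  # ~$3.08
print(f"ratio:  {cost_per_m_output / cost_per_m_input:.0f}x")  # ~1002x
```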
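The context-length sketch below is a rough FLOP model, not the source's own calculation: per-token compute has a fixed weight-multiply term of about 2P FLOPs plus an attention term that grows linearly with the number of context tokens, so beyond some context length attention dominates and prefill becomes compute-bound. The 70B-class model shape is an assumption for illustration, and the constant factors vary with how FLOPs are counted.

```python
# Rough model of why very long contexts turn prefill compute-bound.
# Per-token forward FLOPs ~ 2*P (weight multiplies, constant per token)
# plus ~4*L*n*d for attention over n context tokens.
# Model shape below is an ASSUMED 70B-class example.

P = 70e9   # parameters (assumption)
L = 80     # layers (assumption)
d = 8192   # model width (assumption)

def flops_per_token(n_ctx: float) -> tuple[float, float]:
    """Return (weight FLOPs, attention FLOPs) for one token at context n_ctx."""
    return 2 * P, 4 * L * n_ctx * d

for n_ctx in (8_000, 32_000, 128_000):
    w, a = flops_per_token(n_ctx)
    print(f"n_ctx={n_ctx:>7,}: attention/weights = {a / w:.2f}x")

# Crossover where attention matches weight FLOPs: n = 2P / (4*L*d)
print(f"crossover ~ {2 * P / (4 * L * d):,.0f} tokens")  # ~53,000 tokens
```

Under these assumptions the attention term already exceeds the weight term at 128k context, consistent with the 2-10x long-context cost increase cited above.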
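Finally, the margin sketch: the markup multiples and the gross-margin band quoted above are two views of the same arithmetic, since a provider charging k times its serving cost earns a gross margin of 1 - 1/k. This identity is a standard accounting relationship, not a figure from the source.

```python
# Relating the markup multiples quoted above to gross margins.
# margin = (price - cost) / price = 1 - 1/markup, where markup = price/cost.

def gross_margin(markup: float) -> float:
    """Gross margin implied by charging `markup` times the serving cost."""
    return 1.0 - 1.0 / markup

for markup in (5, 6, 11, 20):
    print(f"{markup:>2}x markup -> {gross_margin(markup):.0%} gross margin")
# 5x -> 80%, 6x -> 83%, 11x -> 91%, 20x -> 95%:
# the 5-20x markup range maps onto exactly the 80-95% margin band
# cited for API pricing.
```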