Can I Buy Your KV Cache?
5 hours ago
- #KV Cache Optimization
- #AI Efficiency
- #Prefill Computation
- AI agents that read the same document each recompute an identical KV cache for it, so the same compute-intensive prefill runs redundantly.
- Proposal: a publisher precomputes a document's KV cache once, and agents purchase and load it to skip prefill, with token-exact results (identical to running the prefill themselves).
- Reusing a KV cache is 9-50x cheaper than recomputing it with prefill, and the savings grow with document length.
- Shipping the KV cache to agents fails: the cache is hard to compress and egress costs are high. Hosting it on the inference provider's side eliminates egress.
- Cost example: serving a hot (frequently requested) 3774-token document to 80M agents costs $1.5M with prefill versus $0.03M with KV reuse.
- KV cache reuse enables a "prefill CDN" that could earn providers a margin. Open challenges include lossless KV compression and payment layers.
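The cost example can be checked with simple arithmetic. This sketch uses only the figures stated in the summary (3774 tokens, 80M agents, $1.5M with prefill, $0.03M with reuse) and derives the per-agent cost, the implied prefill price per million tokens, and the savings factor. The constant and function names are illustrative, not from the source.

```python
# Figures from the cost example (USD); everything else is derived from them.
DOC_TOKENS = 3774
AGENTS = 80_000_000
PREFILL_TOTAL_USD = 1_500_000  # every agent runs its own prefill
REUSE_TOTAL_USD = 30_000       # agents load a shared precomputed KV cache


def per_agent(total_usd: float, agents: int) -> float:
    """Average cost of one agent reading the document."""
    return total_usd / agents


def summarize() -> dict:
    prefill = per_agent(PREFILL_TOTAL_USD, AGENTS)
    reuse = per_agent(REUSE_TOTAL_USD, AGENTS)
    return {
        "prefill_per_agent": prefill,
        "reuse_per_agent": reuse,
        # Implied price of prefill, in dollars per million document tokens
        "prefill_per_mtok": prefill / DOC_TOKENS * 1e6,
        "savings_factor": prefill / reuse,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.6g}")
```

The resulting factor of 50 sits at the top of the 9-50x range quoted in the summary.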
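Why shipping the cache is expensive can be estimated with the standard size formula for an uncompressed KV cache: 2 (one K and one V) × layers × KV heads × head dimension × bytes per element × tokens. The model dimensions here (32 layers, 8 KV heads, head dimension 128, 16-bit values) are an assumption, roughly matching an 8B-parameter grouped-query-attention model. The source does not say which model it priced.

```python
"""Back-of-the-envelope KV cache size (assumed model dimensions, not from the source)."""

# Hypothetical model shape, roughly an 8B GQA model; substitute real values.
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # fp16 / bf16


def kv_cache_bytes(tokens: int, layers: int = LAYERS, kv_heads: int = KV_HEADS,
                   head_dim: int = HEAD_DIM, bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    """Uncompressed size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def token_id_bytes(tokens: int, bytes_per_id: int = 4) -> int:
    """Size of the same document shipped as raw 32-bit token IDs."""
    return tokens * bytes_per_id


if __name__ == "__main__":
    doc_tokens = 3774  # document length from the cost example
    kv = kv_cache_bytes(doc_tokens)
    ids = token_id_bytes(doc_tokens)
    print(f"KV cache:  {kv / 1e9:.2f} GB")
    print(f"token IDs: {ids / 1e3:.1f} KB")
    print(f"blow-up:   {kv // ids}x")
```

Under these assumptions the 3774-token document's cache is about 0.49 GB, against about 15 KB for the same tokens as 32-bit IDs: a 32768x blow-up that every download would pay in egress unless the cache stays on the provider's side.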
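The publish-once, load-many flow of a provider-side prefill CDN can be sketched as a content-addressed store. This is a minimal sketch under stated assumptions: `PrefillCDN`, `publish`, `load`, and the `prefill_fn` callback are hypothetical names, the KV cache is treated as an opaque object, and the payment layer is left out. The key covers both the model identifier and the exact token sequence, because a KV cache is only valid for the same weights and the same preceding tokens, which is what token-exact reuse requires.

```python
import hashlib
import struct
from typing import Any, Callable, Dict, Sequence

# Hypothetical: an opaque, model-specific bundle of K/V tensors.
KVCache = Any
PrefillFn = Callable[[str, Sequence[int]], KVCache]


class PrefillCDN:
    """Hypothetical provider-side store of precomputed KV caches."""

    def __init__(self, prefill_fn: PrefillFn) -> None:
        self._prefill_fn = prefill_fn
        self._store: Dict[str, KVCache] = {}
        self.prefills = 0  # number of times prefill actually ran
        self.hits = 0      # number of loads served from the store

    @staticmethod
    def key(model_id: str, token_ids: Sequence[int]) -> str:
        """Content address: SHA-256 over the model id and the exact token IDs."""
        h = hashlib.sha256()
        model = model_id.encode("utf-8")
        h.update(struct.pack("<I", len(model)))  # length prefix avoids ambiguity
        h.update(model)
        for t in token_ids:
            h.update(struct.pack("<I", t))  # assumes IDs fit in 32 bits
        return h.hexdigest()

    def publish(self, model_id: str, token_ids: Sequence[int]) -> str:
        """Publisher side: run prefill once and store the result."""
        k = self.key(model_id, token_ids)
        if k not in self._store:
            self._store[k] = self._prefill_fn(model_id, token_ids)
            self.prefills += 1
        return k

    def load(self, model_id: str, token_ids: Sequence[int]) -> KVCache:
        """Agent side: reuse a stored cache, or fall back to ordinary prefill."""
        k = self.key(model_id, token_ids)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.prefills += 1  # miss: different model or different tokens
        return self._prefill_fn(model_id, token_ids)


if __name__ == "__main__":
    def fake_prefill(model_id: str, token_ids: Sequence[int]) -> KVCache:
        return ("kv", model_id, tuple(token_ids))  # stand-in for real tensors

    cdn = PrefillCDN(fake_prefill)
    doc = [101, 2023, 2003, 1037, 6254, 102]  # illustrative token IDs
    cdn.publish("model-a", doc)
    for _ in range(1000):
        cdn.load("model-a", doc)
    cdn.load("model-b", doc)  # another model cannot reuse model-a's cache
    print(f"prefills={cdn.prefills} hits={cdn.hits}")  # prefills=2 hits=1000
```

In the usage block, 1000 agent loads cost a single prefill; a request for a different model, or for an edited document, misses and falls back to prefill.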