Can I Buy Your KV Cache?


AI agents that read the same document each recompute an identical KV cache, repeating the same compute-intensive prefill.
Proposal: publishers precompute a document's KV cache once, and agents purchase and load it to skip prefill, with token-exact results.
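A minimal sketch of the publish-and-load flow, assuming a hypothetical provider-side store `KVStore` keyed by model identity and exact token IDs. A KV cache is only valid for the model weights and token sequence it was computed from, so keying on both is what lets a loaded cache stand in for prefill. Payment is left out (the brief lists it as an open problem), and `kv` is an opaque placeholder for the real tensors.

```python
import hashlib
from typing import Any, Optional


def cache_key(model_id: str, token_ids: list[int]) -> str:
    """Content-address a KV cache by model identity and exact token sequence."""
    h = hashlib.sha256()
    h.update(model_id.encode("utf-8"))
    h.update(b"\x00")  # separator between the model id and the token bytes
    for t in token_ids:
        h.update(t.to_bytes(4, "little"))  # assumes non-negative ids < 2**32
    return h.hexdigest()


class KVStore:
    """Hypothetical provider-side store of published KV caches."""

    def __init__(self) -> None:
        self._entries: dict[str, Any] = {}

    def publish(self, model_id: str, token_ids: list[int], kv: Any) -> str:
        # Publisher side: prefill once, then store the resulting cache.
        key = cache_key(model_id, token_ids)
        self._entries[key] = kv
        return key

    def load(self, model_id: str, token_ids: list[int]) -> Optional[Any]:
        # Agent side: a hit skips prefill; None means the agent must prefill itself.
        return self._entries.get(cache_key(model_id, token_ids))
```

A miss (for example, the same document tokenized for a different model) returns `None`, and the agent falls back to an ordinary prefill.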
Reusing a KV cache is 9-50x cheaper than prefilling, and the savings grow with document length.
Shipping the KV cache to agents fails because it is largely incompressible and egress is expensive; hosting it on the inference provider's side eliminates egress.
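To see why shipping the cache is costly, here is a back-of-envelope size estimate. The model shape is an assumption (an 8B-class decoder with grouped-query attention: 32 layers, 8 KV heads, head dimension 128, fp16 storage), not a figure from the brief; each token stores one key and one value vector per KV head per layer.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(num_tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Defaults describe an assumed 8B-class GQA model in fp16, not one named in the brief.
    return num_tokens * kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem)


if __name__ == "__main__":
    per_token = kv_bytes_per_token(32, 8, 128, 2)
    doc = kv_cache_bytes(3774)  # the 3,774-token document from the cost example
    print(f"{per_token / 1024:.0f} KiB per token, {doc / 1e9:.2f} GB per document")
```

Under these assumptions the document's cache is about 0.49 GB (128 KiB per token), which every agent would have to download if the cache were shipped rather than hosted.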
Cost example: serving a hot 3,774-token document to 80M agents costs $1.5M with prefill versus $0.03M with reuse.
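The cost example can be checked by backing out the per-token rates it implies. Only the four inputs below come from the brief; its dollar totals are rounded, so the derived rates are approximate.

```python
DOC_TOKENS = 3774           # document length from the brief
AGENTS = 80_000_000         # number of agents reading it
PREFILL_TOTAL_USD = 1.5e6   # total cost if every agent prefills
REUSE_TOTAL_USD = 30_000.0  # total cost if every agent loads a shared KV cache

total_tokens = DOC_TOKENS * AGENTS
million_tokens = total_tokens / 1e6

# Implied price per million document tokens under each strategy.
prefill_rate = PREFILL_TOTAL_USD / million_tokens
reuse_rate = REUSE_TOTAL_USD / million_tokens
ratio = PREFILL_TOTAL_USD / REUSE_TOTAL_USD

print(f"{total_tokens:,} tokens")
print(f"prefill ${prefill_rate:.2f}/M tokens, reuse ${reuse_rate:.3f}/M tokens, {ratio:.0f}x")
```

The implied rates come out to roughly $5 per million tokens for prefill and $0.10 per million for reuse, a 50x gap that matches the top of the 9-50x range.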
KV cache reuse enables a "prefill CDN" that could earn providers a margin; open challenges include lossless KV compression and a payment layer.
