Exploiting Local KV Cache Asymmetry for Long-Context LLMs
- #LLMs
- #KV Cache
- #Compression
- KV cache compression is crucial for efficient long-context modeling in LLMs.
- A key-value asymmetry exists: adjacent keys are locally homogeneous (highly similar to one another), while values are heterogeneous.
- Existing compression methods fail to address this asymmetry, treating keys and values uniformly.
- Proposed AsymKV framework combines key merging and lossless value compression.
- AsymKV outperforms SOTA methods, e.g., achieving 43.95 on LongBench vs. H₂O's 38.89.
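The key-merging idea above can be illustrated with a toy sketch: since adjacent keys tend to be locally homogeneous, consecutive key vectors whose cosine similarity exceeds a threshold can be collapsed into a single averaged key, while the value cache is left intact for separate lossless compression. This is an illustrative assumption-laden sketch (the function name, greedy merge rule, and threshold are mine), not the paper's actual algorithm.

```python
import numpy as np

def merge_similar_keys(keys, threshold=0.95):
    """Greedily merge consecutive key vectors whose cosine similarity
    to the current merged group's mean exceeds `threshold`.
    Returns the merged keys and the size of each merged group.
    (Illustrative sketch only, not AsymKV's exact procedure.)"""
    sums = [keys[0].copy()]   # running sums of each merged group
    counts = [1]              # number of keys in each group
    for k in keys[1:]:
        mean = sums[-1] / counts[-1]
        cos = mean @ k / (np.linalg.norm(mean) * np.linalg.norm(k) + 1e-8)
        if cos >= threshold:
            sums[-1] += k     # absorb key into the current group
            counts[-1] += 1
        else:
            sums.append(k.copy())  # start a new group
            counts.append(1)
    merged = np.stack([s / c for s, c in zip(sums, counts)])
    return merged, np.array(counts)

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
# Simulate local homogeneity: each base key repeated 4 times with small noise.
keys = np.repeat(base, 4, axis=0) + 0.01 * rng.normal(size=(16, 8))
merged, counts = merge_similar_keys(keys)
print(merged.shape[0], "merged keys from", keys.shape[0])
```

On such locally homogeneous input the key cache shrinks substantially, while the group sizes in `counts` preserve how many original positions each merged key represents, which an attention kernel would need to weight the merged entries correctly.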