Hasty Briefs

Exploiting Local KV Cache Asymmetry for Long-Context LLMs

10 months ago
  • #LLMs
  • #KV Cache
  • #Compression
  • KV cache compression is crucial for efficient long-context modeling in LLMs.
  • A key-value asymmetry exists in the cache: adjacent keys are highly similar (locally homogeneous), while values are heterogeneous (see the measurement sketch after this list).
  • Existing compression methods fail to address this asymmetry, treating keys and values uniformly.
  • The proposed AsymKV framework combines key merging with lossless value compression (a merging sketch also follows the list).
  • AsymKV outperforms SOTA methods, e.g., achieving 43.95 on LongBench vs. 38.89 for H2O.
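
To make the asymmetry claim concrete, here is a minimal sketch (not from the paper) that compares the cosine similarity of adjacent cached key vectors with that of adjacent cached value vectors for a single attention head; the tensor shapes and the random toy data are assumptions for illustration only.

```python
# Sketch: quantify "local homogeneity of keys vs. heterogeneity of values"
# by measuring cosine similarity between consecutive token states in a
# per-head KV cache slice. Shapes and data are illustrative assumptions.
import torch
import torch.nn.functional as F

def adjacent_similarity(x: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between consecutive rows of x.

    x: [seq_len, head_dim] cache slice for one attention head.
    """
    return F.cosine_similarity(x[:-1], x[1:], dim=-1).mean()

# Toy stand-ins for one head's cached keys and values.
seq_len, head_dim = 512, 64
keys = torch.randn(seq_len, head_dim)
values = torch.randn(seq_len, head_dim)

print("adjacent key similarity:  ", adjacent_similarity(keys).item())
print("adjacent value similarity:", adjacent_similarity(values).item())
# On real model caches, the reported observation is that the key score is
# noticeably higher than the value score; random data here scores near zero.
```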
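And a rough sketch of the key-side idea under the same assumptions: greedily merge runs of adjacent keys whose pairwise cosine similarity exceeds a threshold, averaging them, while leaving the values untouched. This is not the paper's actual algorithm (which pairs key merging with a separate lossless value-compression step); the threshold, grouping rule, and shapes are illustrative guesses.

```python
import torch
import torch.nn.functional as F

def merge_similar_keys(keys: torch.Tensor, values: torch.Tensor,
                       threshold: float = 0.9):
    """Greedily merge runs of adjacent keys whose cosine similarity to the
    previous key exceeds `threshold`; merged keys are averaged.

    keys, values: [seq_len, head_dim] cache slices for one head.
    Returns (merged_keys, values, group), where group[i] is the index of
    the merged key that token i maps to. Values are returned unchanged.
    """
    sims = F.cosine_similarity(keys[:-1], keys[1:], dim=-1)
    # Start a new group whenever similarity to the previous key drops
    # below the threshold; otherwise keep extending the current group.
    group = torch.zeros(keys.size(0), dtype=torch.long)
    group[1:] = (sims < threshold).long().cumsum(dim=0)

    num_groups = int(group[-1].item()) + 1
    merged = torch.zeros(num_groups, keys.size(1))
    counts = torch.zeros(num_groups, 1)
    merged.index_add_(0, group, keys)
    counts.index_add_(0, group, torch.ones(keys.size(0), 1))
    merged = merged / counts  # average the keys within each group

    return merged, values, group
```

Tokens in the same group would then share one merged key at attention time, shrinking the key half of the cache while the values remain exact.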