Hasty Briefs (beta)

REFRAG: Rethinking RAG Based Decoding

a day ago
  • #LLM
  • #RAG
  • #Efficiency
  • REFRAG is proposed as an efficient decoding framework for RAG applications.
  • It addresses the trade-off between knowledge enrichment and system efficiency in LLMs.
  • REFRAG compresses, senses, and expands to improve latency, achieving a 30.85× acceleration in time-to-first-token.
  • The framework extends the context size of LLMs by 16× without loss in perplexity.
  • Validation across diverse long-context tasks shows substantial speedup with no accuracy loss compared to LLaMA models.
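
To make the "compress, sense, expand" idea in the points above concrete, here is a rough sketch of how compression shortens the decoder's input. It rests on assumptions the summary does not state: the retrieved context is split into fixed-size chunks, each compressed chunk takes one decoder position, and a chunk chosen for expansion keeps all of its original tokens. The function name and parameters are hypothetical, and the sketch only counts positions; it does not reproduce the reported speedups.

```python
import math


def decoder_input_length(question_tokens: int,
                         context_tokens: int,
                         chunk_size: int,
                         expand_fraction: float = 0.0) -> int:
    """Count decoder positions after chunk compression (toy model).

    Assumptions (illustrative only, not taken from the summary):
      * the context is split into chunks of `chunk_size` tokens;
      * a compressed chunk occupies exactly one decoder position;
      * an expanded chunk is passed through as its original tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= expand_fraction <= 1.0:
        raise ValueError("expand_fraction must be in [0, 1]")

    num_chunks = math.ceil(context_tokens / chunk_size)
    expanded_chunks = round(num_chunks * expand_fraction)
    compressed_chunks = num_chunks - expanded_chunks

    # Expanded chunks contribute their raw tokens; cap at the real context
    # length because the final chunk may be shorter than chunk_size.
    expanded_tokens = min(context_tokens, expanded_chunks * chunk_size)

    return question_tokens + compressed_chunks + expanded_tokens


if __name__ == "__main__":
    # Hypothetical sizes: a 64-token question and 4096 tokens of retrieved text.
    for fraction in (0.0, 0.25, 1.0):
        length = decoder_input_length(64, 4096, chunk_size=16,
                                      expand_fraction=fraction)
        print(f"expand_fraction={fraction}: {length} decoder positions")
```

Under these assumptions, compressing every chunk shrinks the context portion of the input by the chunk size, while expanding every chunk recovers the uncompressed length. The expand fraction sits between those two extremes, which is where a selection step would trade accuracy against latency.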