REFRAG: Rethinking RAG-Based Decoding
- #LLM
- #RAG
- #Efficiency
- REFRAG is proposed as an efficient decoding framework for RAG applications.
- It addresses the trade-off between knowledge enrichment and system efficiency in LLMs.
- REFRAG compresses, senses, and expands the retrieved context to improve latency, achieving a 30.85× acceleration in time-to-first-token (a rough sketch of the idea follows this list).
- The framework extends the context size of LLMs by 16× without loss in perplexity.
- Validation across diverse long-context tasks shows substantial speedup over LLaMA baselines with no loss in accuracy.
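
A minimal sketch of the compress / sense / expand idea, assuming a chunked retrieved context where each chunk is mapped to a single embedding and only the chunks judged most relevant are re-expanded to full tokens. All names, sizes, and the relevance heuristic below are illustrative assumptions, not the authors' implementation (REFRAG uses a trained lightweight encoder and an RL-learned selection policy).

```python
# Illustrative sketch of compress -> sense -> expand for RAG decoding.
# ChunkCompressor, CHUNK_LEN, TOKEN_DIM, and the norm-based scoring are
# hypothetical stand-ins, not the REFRAG implementation.
import torch
import torch.nn as nn

CHUNK_LEN = 32      # tokens per retrieved-context chunk (assumed)
TOKEN_DIM = 512     # decoder embedding width (assumed)


class ChunkCompressor(nn.Module):
    """Toy lightweight encoder: maps a chunk of token embeddings to one vector."""

    def __init__(self, dim: int = TOKEN_DIM):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, chunk_tokens: torch.Tensor) -> torch.Tensor:
        # chunk_tokens: (CHUNK_LEN, TOKEN_DIM) -> (1, TOKEN_DIM)
        return self.proj(chunk_tokens.mean(dim=0, keepdim=True))


def select_chunks_to_expand(scores: torch.Tensor, budget: int) -> set:
    """Stand-in for the learned selection policy: keep the top-`budget` chunks."""
    return set(torch.topk(scores, k=budget).indices.tolist())


def build_decoder_input(context_tokens: torch.Tensor,
                        compressor: ChunkCompressor,
                        expand_budget: int) -> torch.Tensor:
    """Compress every chunk, then re-expand only the chunks the policy selects."""
    chunks = context_tokens.split(CHUNK_LEN)           # list of (CHUNK_LEN, D)
    compressed = [compressor(c) for c in chunks]       # one vector per chunk
    # Toy relevance score per chunk; a real system would use a trained policy.
    scores = torch.stack([c.norm() for c in compressed])
    expand = select_chunks_to_expand(scores, expand_budget)
    parts = [chunks[i] if i in expand else compressed[i] for i in range(len(chunks))]
    return torch.cat(parts, dim=0)                     # far shorter than the input


if __name__ == "__main__":
    ctx = torch.randn(8 * CHUNK_LEN, TOKEN_DIM)        # 8 retrieved chunks
    dec_input = build_decoder_input(ctx, ChunkCompressor(), expand_budget=2)
    print(ctx.shape, "->", dec_input.shape)            # (256, 512) -> (70, 512)
```

Because most chunks enter the decoder as a single embedding rather than CHUNK_LEN token embeddings, the attention cost over the retrieved context shrinks roughly by the chunk length, which is how a 16× effective context extension and a large time-to-first-token speedup become possible.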