FreshStack: Realistic benchmarks for evaluating retrieval on technical documents
3 days ago
- #Information Retrieval
- #RAG
- #Benchmarking
- FreshStack is a framework for building information retrieval (IR) evaluation benchmarks.
- It automates corpus collection from code and technical documentation.
- It generates nuggets, the atomic facts a good answer should contain, from community-asked questions and answers.
- It retrieves documents by fusing several retrieval techniques and hybrid architectures.
- Five datasets were built on fast-growing, niche topics to ensure challenging tasks.
- Existing retrieval models underperform oracle approaches on FreshStack datasets.
- The evaluation identifies cases where rerankers do not improve first-stage retrieval accuracy.
- Oracle context helps LLM generators produce high-quality RAG answers.
- FreshStack aims to facilitate realistic, scalable, and uncontaminated IR and RAG evaluations.
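
The summary mentions retrieving documents with a fusion of retrieval techniques but does not say which fusion method FreshStack uses. As an illustration only, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists from different retrievers. The function name, the document IDs, the two example rankings, and the constant `k=60` are all assumptions made for this example, not details taken from FreshStack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is a conventional default for RRF and
    is an assumption here, not a FreshStack setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID
    # so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical retriever and a dense retriever.
lexical_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that both retrievers rank highly (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever still remain in the fused pool.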