Hasty Briefs (beta)

FreshStack: Realistic benchmarks for evaluating retrieval on technical documents

3 days ago
  • #Information Retrieval
  • #RAG
  • #Benchmarking
  • FreshStack is a framework for building information retrieval (IR) evaluation benchmarks.
  • It automates corpus collection from code and technical documentation.
  • It generates nuggets (short, atomic facts) from community-asked questions and answers.
  • To find documents that support each nugget, it retrieves candidates with a fusion of retrieval techniques and hybrid architectures.
  • Five datasets were built on fast-growing, niche topics to ensure challenging tasks.
  • Existing retrieval models, applied out of the box, significantly underperform oracle approaches on all five FreshStack datasets.
  • The authors identify cases where rerankers do not clearly improve first-stage retrieval accuracy.
  • Oracle context helps an LLM generator produce high-quality RAG answers.
  • Aims to facilitate realistic, scalable, and uncontaminated IR and RAG evaluations.
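
The "fusion of retrieval techniques" mentioned above can be illustrated with a minimal sketch. The summary does not say which fusion method FreshStack uses, so reciprocal rank fusion (RRF) is swapped in here purely as a common example; the function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Assumption: RRF stands in for whatever fusion FreshStack actually uses.
    Each document receives 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical runs from a lexical retriever and a dense retriever.
lexical_run = ["d3", "d1", "d2"]
dense_run = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([lexical_run, dense_run])
print(fused)
```

Documents ranked highly by both retrievers rise to the top of the fused list, which is why fusion is often used to pool candidates from systems with different strengths.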