Hasty Briefs (beta)

How Lume Works: The Retrieval Primitives

2 days ago
  • #rust
  • #search-engine
  • #hybrid-search
  • Lume is a hybrid search engine built in Rust, combining lexical BM25, semantic GTR-T5 vectors, and a significance-scored entity graph.
  • Key principles include local-first operation, layered independent signals for ranking, and full auditability of retrieval steps.
  • Retrieval is based on sections (units of text from Markdown, code, or PDFs), with field-aware BM25 scoring that weights title matches higher.
  • Two-stage pruning reduces candidates via roaring-bitmap union and Gödel signatures before heavy scoring.
  • Query hygiene features include stopword filtering and a coordination factor to prefer documents matching more distinct query terms.
  • Dense semantic vectors via Shivvr (GTR-T5) handle vocabulary gaps, with incremental indexing using content hashes for efficiency.
  • The entity graph uses significance scoring to avoid hub bias, measuring co-occurrence against independence expectations.
  • Hybrid ranking blends lexical, semantic, and graph scores multiplicatively, with tunable knobs (alpha, beta) for each signal.
  • A case study illustrates how query diversification and dynamic retrieval depth fixed an agent's failure to find relevant passages.
  • Lume is open-source (BSD-3 licensed), designed for inspectable retrieval in agentic systems.
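
The query hygiene point above (stopword filtering plus a coordination factor) can be illustrated with a minimal Rust sketch. The stopword list, the tokenizer, and the function names `distinct_query_terms` and `coordination_factor` are assumptions made for illustration; the summary does not say how Lume implements either step.

```rust
use std::collections::HashSet;

/// Hypothetical stopword list; a real engine would likely use a larger one.
const STOPWORDS: &[&str] = &["a", "an", "and", "the", "of", "to", "in", "is"];

/// Lowercase the query, split on non-alphanumeric characters,
/// drop stopwords, and keep each remaining term once.
fn distinct_query_terms(query: &str) -> HashSet<String> {
    query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !STOPWORDS.contains(&t.as_str()))
        .collect()
}

/// Fraction of distinct query terms that appear in the document's term set.
/// Assumption: an empty query yields a neutral factor of 1.0.
fn coordination_factor(query_terms: &HashSet<String>, doc_terms: &HashSet<String>) -> f64 {
    if query_terms.is_empty() {
        return 1.0;
    }
    let matched = query_terms.intersection(doc_terms).count();
    matched as f64 / query_terms.len() as f64
}
```

Multiplying a base lexical score by this factor is one way to prefer a section that matches two of three distinct query terms over one that repeats a single term many times.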
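
The entity-graph point above describes scoring co-occurrence against what independence would predict. One common statistic with that shape is pointwise mutual information (PMI), sketched below. Choosing PMI is an assumption, as is the function name `cooccurrence_significance`; the summary does not name the statistic Lume uses.

```rust
/// PMI-style significance for an entity pair (an assumed statistic, not
/// necessarily Lume's): ln(observed / expected), where `expected` is the
/// co-occurrence count predicted if the two entities were independent.
///
/// `n_sections`: total sections indexed
/// `count_a`, `count_b`: sections mentioning each entity
/// `count_ab`: sections mentioning both
///
/// Returns `None` when any count is zero, since the ratio is undefined.
fn cooccurrence_significance(
    n_sections: u64,
    count_a: u64,
    count_b: u64,
    count_ab: u64,
) -> Option<f64> {
    if n_sections == 0 || count_a == 0 || count_b == 0 || count_ab == 0 {
        return None;
    }
    let expected = (count_a as f64) * (count_b as f64) / (n_sections as f64);
    Some((count_ab as f64 / expected).ln())
}
```

With 1,000 sections, a hub entity that appears in 900 of them and co-occurs 9 times with an entity seen 10 times scores 0, because 9 co-occurrences is exactly what chance predicts. Two entities that each appear 10 times and co-occur 9 times score well above zero. This is how a significance score can avoid rewarding hubs for raw co-occurrence counts alone.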
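
The hybrid ranking point above says the three signals are blended multiplicatively with tunable `alpha` and `beta`. The summary does not give the formula, so the sketch below assumes one plausible form in which the semantic and graph scores act as boosts on the lexical score. The `Signals` struct and `hybrid_score` function are hypothetical names.

```rust
/// Per-section scores from the three independent signals (hypothetical layout).
#[derive(Debug, Clone, Copy)]
struct Signals {
    lexical: f64,  // e.g. a BM25 score
    semantic: f64, // e.g. a vector similarity
    graph: f64,    // e.g. an entity-graph significance score
}

/// Assumed blend: lexical * (1 + alpha * semantic) * (1 + beta * graph).
/// Setting `alpha` or `beta` to 0.0 switches the matching signal off.
fn hybrid_score(s: Signals, alpha: f64, beta: f64) -> f64 {
    s.lexical * (1.0 + alpha * s.semantic) * (1.0 + beta * s.graph)
}
```

In this assumed form, a section with a lexical score of zero always scores zero. An engine that relies on dense vectors to bridge vocabulary gaps would need some other way to surface such sections, so treat this only as an illustration of how the knobs scale each signal.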