How Lume Works: The Retrieval Primitives
2 days ago
- #rust
- #search-engine
- #hybrid-search
- Lume is a hybrid search engine built in Rust, combining lexical BM25, semantic GTR-T5 vectors, and a significance-scored entity graph.
- Key principles include local-first operation, layered independent signals for ranking, and full auditability of retrieval steps.
- Retrieval is based on sections (units of text from Markdown, code, or PDFs), with field-aware BM25 scoring that weights title matches higher.
- Two-stage pruning reduces candidates via roaring-bitmap union and Gödel signatures before heavy scoring.
- Query hygiene features include stopword filtering and a coordination factor that favors documents matching more of the distinct query terms.
- Dense semantic vectors via Shivvr (GTR-T5) handle vocabulary gaps, with incremental indexing using content hashes for efficiency.
- The entity graph uses significance scoring to avoid hub bias, measuring co-occurrence against independence expectations.
- Hybrid ranking blends lexical, semantic, and graph scores multiplicatively, with tunable knobs (alpha, beta) that control how strongly the signals contribute.
- A case study illustrates how query diversification and dynamic retrieval depth fixed an agent's failure to find relevant passages.
- Lume is open-source (BSD-3 licensed), designed for inspectable retrieval in agentic systems.
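
To make the ranking bullets above more concrete, here is a minimal Rust sketch of three of the pieces: the coordination factor, a significance score for entity pairs, and the multiplicative blend. It is not Lume's actual code. The function names, the use of positive pointwise mutual information as the "co-occurrence against independence" measure, and the exact blend formula `lexical * (1 + alpha * semantic) * (1 + beta * graph)` are all assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical coordination factor: the fraction of *distinct* query terms
/// that appear in the document. Repeated query terms are counted once.
fn coordination_factor(query_terms: &[&str], doc_terms: &HashSet<&str>) -> f64 {
    let distinct: HashSet<&str> = query_terms.iter().copied().collect();
    if distinct.is_empty() {
        return 0.0;
    }
    let matched = distinct.iter().filter(|t| doc_terms.contains(*t)).count();
    matched as f64 / distinct.len() as f64
}

/// Hypothetical significance score for an entity pair, using positive
/// pointwise mutual information (an assumed stand-in for Lume's measure).
/// All counts are numbers of sections: `co` contain both entities,
/// `count_a` and `count_b` contain each one, `total` is the corpus size.
fn significance(co: u64, count_a: u64, count_b: u64, total: u64) -> f64 {
    if co == 0 || count_a == 0 || count_b == 0 || total == 0 {
        return 0.0;
    }
    let n = total as f64;
    let observed = co as f64 / n;
    // Co-occurrence rate expected if the two entities were independent.
    let expected = (count_a as f64 / n) * (count_b as f64 / n);
    // Clamp at zero so pairs that co-occur no more than chance score nothing.
    (observed / expected).ln().max(0.0)
}

/// Hypothetical multiplicative blend. `alpha` scales the semantic signal and
/// `beta` scales the graph signal; setting both to 0.0 leaves the lexical
/// score unchanged.
fn hybrid_score(lexical: f64, semantic: f64, graph: f64, alpha: f64, beta: f64) -> f64 {
    lexical * (1.0 + alpha * semantic) * (1.0 + beta * graph)
}
```

Under these assumptions, an entity that appears in half of all sections gets no credit for co-occurring with another entity at the chance rate, which is the hub-bias behavior the summary describes. Note that with this particular blend a section whose lexical score is zero stays at zero regardless of the other signals; the real formula may handle that case differently.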