A high-performance document search engine built in Rust with WebAssembly support
3 days ago
- #Rust
- #SearchEngine
- #WebAssembly
- Combines an FST (finite state transducer) term index, which enables fast fuzzy matching, with FSST (Fast Static Symbol Table) string compression for compact storage.
- Interactive demo available, showcasing search through 50,000 news articles from the AG News dataset.
- Demo performance metrics: an index size of 11.48 MB as WASM (5.20 MB Brotli-compressed) and search speeds of about 1-3 ms per query.
- Features include fast fuzzy search, compact storage with FSST compression, RAKE keyword extraction, and WebAssembly readiness.
- Available as a standalone CLI tool for building .wasm files from document collections without requiring Rust tooling.
- Installation instructions provided for macOS/Linux and Windows.
- Supports multiple platforms including macOS (Intel/Apple Silicon), Linux (x64/ARM64), and Windows (x64/ARM64).
- Building from source requires Rust, wasm-pack, and Node.js.
- Document preparation involves creating a JSON file with document details.
- Indexing phase includes keyword extraction, relevance scoring, FST mapping, and FSST compression.
- Embedding phase involves parsing the WASM module, expanding its memory, and adding the index as a new data segment.
- Search phase includes fuzzy matching, score accumulation, and decompression of document strings.
- Leverages libraries like fst, fsst-rs, rake, serde/postcard, wasm-bindgen, and wasm-encoder/wasmparser.
- Provides sub-millisecond search times, a 60-80% compression ratio, and instant startup through lazy index loading.
- Inspired by technologies like Algolia, Typesense, Lunr.js, Stork Search, and Tinysearch.
- Key concepts include finite state transducers, the RAKE (Rapid Automatic Keyword Extraction) algorithm, and FSST compression.
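The summary does not give the JSON schema used for document preparation. As a minimal sketch, assuming a hypothetical array of objects with `title`, `url`, and `body` fields (names invented here, not taken from the tool), a dependency-free Rust helper could produce such a file's contents with correct string escaping:

```rust
/// Hypothetical document shape: the real tool's expected fields may differ.
pub struct Doc<'a> {
    pub title: &'a str,
    pub url: &'a str,
    pub body: &'a str,
}

/// Quote and escape a string as a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize documents as a JSON array of objects (hypothetical schema).
pub fn docs_to_json(docs: &[Doc<'_>]) -> String {
    let items: Vec<String> = docs
        .iter()
        .map(|d| {
            format!(
                "{{\"title\":{},\"url\":{},\"body\":{}}}",
                json_escape(d.title),
                json_escape(d.url),
                json_escape(d.body)
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

In practice a crate such as serde_json would do this; the hand-written escaper only keeps the sketch self-contained.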
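RAKE splits text into candidate phrases at stopwords and punctuation, scores each word as its degree divided by its frequency, and scores each phrase as the sum of its word scores. The sketch below implements that scoring with the standard library only; it is not the `rake` crate's API, and the stopword list is supplied by the caller:

```rust
use std::collections::{HashMap, HashSet};

/// Split text into candidate phrases: maximal runs of non-stopwords,
/// broken at stopwords and at punctuation.
fn candidate_phrases(text: &str, stopwords: &HashSet<&str>) -> Vec<Vec<String>> {
    let mut phrases = Vec::new();
    // Any character that is neither alphanumeric nor whitespace ends a phrase.
    for fragment in text.split(|c: char| !c.is_alphanumeric() && !c.is_whitespace()) {
        let mut current: Vec<String> = Vec::new();
        for word in fragment.split_whitespace() {
            let w = word.to_lowercase();
            if stopwords.contains(w.as_str()) {
                if !current.is_empty() {
                    phrases.push(std::mem::take(&mut current));
                }
            } else {
                current.push(w);
            }
        }
        if !current.is_empty() {
            phrases.push(current);
        }
    }
    phrases
}

/// RAKE scoring: word score = degree / frequency, where degree sums the
/// lengths of the phrases a word appears in; phrase score = sum of word scores.
pub fn rake(text: &str, stopwords: &HashSet<&str>) -> Vec<(String, f64)> {
    let phrases = candidate_phrases(text, stopwords);
    let mut freq: HashMap<&str, f64> = HashMap::new();
    let mut degree: HashMap<&str, f64> = HashMap::new();
    for phrase in &phrases {
        let len = phrase.len() as f64;
        for w in phrase {
            *freq.entry(w.as_str()).or_insert(0.0) += 1.0;
            *degree.entry(w.as_str()).or_insert(0.0) += len;
        }
    }
    let mut scored: Vec<(String, f64)> = phrases
        .iter()
        .map(|p| {
            let score: f64 = p.iter().map(|w| degree[w.as_str()] / freq[w.as_str()]).sum();
            (p.join(" "), score)
        })
        .collect();
    // Highest score first; ties broken alphabetically so duplicates end up adjacent.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    scored.dedup_by(|a, b| a.0 == b.0);
    scored
}
```

For "Compatibility of systems of linear constraints" with the stopword "of", the phrase "linear constraints" scores 2 + 2 = 4 while "compatibility" and "systems" score 1 each, so phrases of co-occurring content words rank first.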
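FSST compresses strings with a static table of up to 255 symbols, each 1 to 8 bytes long, and reserves code 255 as an escape that precedes a literal byte. The toy below shows only the encode/decode idea with a hand-picked table; real FSST builds its table from sample data and uses much faster lookups, and this is not the `fsst-rs` API:

```rust
/// Code reserved to mark a literal byte that has no symbol.
const ESCAPE: u8 = 255;

pub struct SymbolTable {
    symbols: Vec<Vec<u8>>, // code i decodes to symbols[i]
}

impl SymbolTable {
    pub fn new(symbols: Vec<Vec<u8>>) -> Self {
        assert!(symbols.len() <= 255, "codes 0..=254 only; 255 is the escape");
        assert!(symbols.iter().all(|s| !s.is_empty() && s.len() <= 8));
        SymbolTable { symbols }
    }

    /// Greedy encoding: at each position emit the code of the longest
    /// matching symbol, or ESCAPE followed by the raw byte.
    pub fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < input.len() {
            let best = self
                .symbols
                .iter()
                .enumerate()
                .filter(|(_, s)| input[i..].starts_with(s.as_slice()))
                .max_by_key(|(_, s)| s.len());
            match best {
                Some((code, s)) => {
                    out.push(code as u8);
                    i += s.len();
                }
                None => {
                    out.push(ESCAPE);
                    out.push(input[i]);
                    i += 1;
                }
            }
        }
        out
    }

    /// Decoding is one table lookup per code. Assumes well-formed input
    /// from `compress` (an escape is always followed by a literal byte).
    pub fn decompress(&self, codes: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < codes.len() {
            if codes[i] == ESCAPE {
                out.push(codes[i + 1]);
                i += 2;
            } else {
                out.extend_from_slice(&self.symbols[codes[i] as usize]);
                i += 1;
            }
        }
        out
    }
}
```

Because each symbol is decoded independently, a single document string can be decompressed on demand without touching the rest of the store.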
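The indexing and search phases can be pictured as an inverted index: an FST map (such as the `fst` crate's `Map`) associates each term with a `u64`, which can point into a postings list of (document, score) pairs, and a query accumulates scores per document. In this sketch a `BTreeMap` stands in for the FST, the per-document keyword scores (for example from RAKE) are assumed inputs, and the `Index` type is hypothetical:

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy inverted index. `terms` plays the role of the FST: a sorted map
/// from term to a u64, here an index into `postings`.
pub struct Index {
    terms: BTreeMap<String, u64>,
    postings: Vec<Vec<(u32, f32)>>, // (doc id, relevance score)
}

impl Index {
    /// Build from (doc id, [(keyword, score)]) pairs.
    pub fn build(docs: &[(u32, Vec<(String, f32)>)]) -> Self {
        let mut by_term: BTreeMap<String, Vec<(u32, f32)>> = BTreeMap::new();
        for (doc_id, keywords) in docs {
            for (term, score) in keywords {
                by_term.entry(term.clone()).or_default().push((*doc_id, *score));
            }
        }
        // Iterating a BTreeMap yields terms in sorted order, as FST construction requires.
        let mut terms = BTreeMap::new();
        let mut postings = Vec::with_capacity(by_term.len());
        for (term, list) in by_term {
            terms.insert(term, postings.len() as u64);
            postings.push(list);
        }
        Index { terms, postings }
    }

    /// Sum per-document scores over all query terms and return the top `k`.
    pub fn search(&self, query_terms: &[&str], k: usize) -> Vec<(u32, f32)> {
        let mut acc: HashMap<u32, f32> = HashMap::new();
        for term in query_terms {
            if let Some(&idx) = self.terms.get(*term) {
                for &(doc, score) in &self.postings[idx as usize] {
                    *acc.entry(doc).or_insert(0.0) += score;
                }
            }
        }
        let mut hits: Vec<(u32, f32)> = acc.into_iter().collect();
        hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
        hits.truncate(k);
        hits
    }
}
```

A document matching several query terms collects a score from each, so it rises above documents that match only one.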
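Fuzzy matching finds indexed terms within a small edit distance of a query term. An FST library can do this by intersecting the term set with a Levenshtein automaton; the sketch below swaps that for a brute-force scan using a dynamic-programming edit distance, which is simpler but linear in the number of terms:

```rust
use std::collections::BTreeMap;

/// Classic edit distance (insert, delete, substitute each cost 1), computed
/// over Unicode scalar values with a single rolling row.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut prev_diag = row[0]; // distance for a[..i] vs b[..0]
        row[0] = i + 1;
        for j in 1..=b.len() {
            let above = row[j]; // distance for a[..i] vs b[..j]
            let cost = if ca == b[j - 1] { 0 } else { 1 };
            row[j] = (above + 1) // deletion
                .min(row[j - 1] + 1) // insertion
                .min(prev_diag + cost); // substitution or match
            prev_diag = above;
        }
    }
    row[b.len()]
}

/// Return every indexed term within `max_dist` edits of `query`, with its value.
pub fn fuzzy_lookup<'a>(
    terms: &'a BTreeMap<String, u64>,
    query: &str,
    max_dist: usize,
) -> Vec<(&'a str, u64)> {
    terms
        .iter()
        .filter(|(term, _)| levenshtein(term, query) <= max_dist)
        .map(|(term, &v)| (term.as_str(), v))
        .collect()
}
```

With a distance limit of 1, a query for "rust" also matches "rush" (one substitution) and "trust" (one insertion).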
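The embedding phase works at the level of the WebAssembly binary format: a module is an 8-byte header (`\0asm`, version 1) followed by sections, each an id byte plus a LEB128-encoded size, and linear memory grows in 64 KiB pages. The std-only sketch below shows pieces that step needs: walking the sections, computing how many pages to add, and encoding an active data segment. Placing the index right after the existing memory is an assumption, and the project itself relies on wasmparser and wasm-encoder rather than hand-written encoders:

```rust
use std::ops::Range;

/// WebAssembly linear memory grows in 64 KiB pages.
pub const PAGE_SIZE: u64 = 65_536;

/// Decode an unsigned LEB128 u32 starting at `pos`; returns (value, bytes read).
pub fn read_leb_u32(bytes: &[u8], pos: usize) -> Result<(u32, usize), String> {
    let mut result: u64 = 0;
    for i in 0..5 {
        let byte = *bytes.get(pos + i).ok_or("unexpected end of input")?;
        result |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            if result > u64::from(u32::MAX) {
                return Err("LEB128 value exceeds u32".to_string());
            }
            return Ok((result as u32, i + 1));
        }
    }
    Err("LEB128 encoding longer than 5 bytes".to_string())
}

/// Check the header, then list each section as (id, byte range of its contents).
pub fn sections(wasm: &[u8]) -> Result<Vec<(u8, Range<usize>)>, String> {
    if wasm.len() < 8 || &wasm[..8] != b"\0asm\x01\0\0\0" {
        return Err("not a version-1 wasm module".to_string());
    }
    let mut out = Vec::new();
    let mut pos = 8;
    while pos < wasm.len() {
        let id = wasm[pos];
        let (size, n) = read_leb_u32(wasm, pos + 1)?;
        let start = pos + 1 + n;
        let end = start + size as usize;
        if end > wasm.len() {
            return Err(format!("section {} overruns the module", id));
        }
        out.push((id, start..end));
        pos = end;
    }
    Ok(out)
}

/// Put `index_len` bytes right after `current_pages` of existing memory.
/// Returns (byte offset for the data segment, new minimum page count).
pub fn plan_embedding(current_pages: u64, index_len: u64) -> (u64, u64) {
    let offset = current_pages * PAGE_SIZE;
    let extra_pages = (index_len + PAGE_SIZE - 1) / PAGE_SIZE; // ceiling division
    (offset, current_pages + extra_pages)
}

fn write_uleb_u32(out: &mut Vec<u8>, mut v: u32) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn write_sleb_i32(out: &mut Vec<u8>, mut v: i32) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7; // arithmetic shift keeps the sign
        let done = (v == 0 && byte & 0x40 == 0) || (v == -1 && byte & 0x40 != 0);
        out.push(if done { byte } else { byte | 0x80 });
        if done {
            return;
        }
    }
}

/// Encode an active data segment for memory 0: mode 0x00, the constant
/// expression `i32.const offset; end`, then the bytes as a length-prefixed vector.
pub fn encode_active_segment(offset: i32, data: &[u8]) -> Vec<u8> {
    let mut seg = vec![0x00, 0x41]; // mode 0 (active, memory 0), i32.const opcode
    write_sleb_i32(&mut seg, offset);
    seg.push(0x0B); // end of the constant expression
    write_uleb_u32(&mut seg, data.len() as u32);
    seg.extend_from_slice(data);
    seg
}
```

A complete tool would also raise the minimum in the memory section (id 5), append the segment to the data section (id 11), and update the data count section (id 12) if the module has one.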
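Lazy index loading can be sketched with `std::sync::OnceLock` (Rust 1.70+): the embedded bytes are in memory from startup, but decoding happens only on the first query and the result is cached for later ones. The `INDEX_BYTES` contents and the newline-separated format are hypothetical stand-ins for the real serialized (serde/postcard) index:

```rust
use std::sync::OnceLock;

/// Stand-in for the deserialized index.
pub struct Index {
    pub terms: Vec<String>,
}

/// Hypothetical embedded bytes: newline-separated terms.
static INDEX_BYTES: &[u8] = b"fst\nfsst\nrake\nwasm";

static INDEX: OnceLock<Index> = OnceLock::new();

/// Decode the index on first use only; later calls reuse the cached value.
pub fn index() -> &'static Index {
    INDEX.get_or_init(|| {
        let text = std::str::from_utf8(INDEX_BYTES).expect("index bytes are UTF-8");
        Index { terms: text.lines().map(str::to_owned).collect() }
    })
}

/// Example query entry point: the first call pays the decoding cost.
pub fn contains(term: &str) -> bool {
    index().terms.iter().any(|t| t == term)
}
```

Deferring the decode keeps module instantiation cheap, which is the property the "instant startup" claim relies on.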