Semantic Search in Under 3MB

10 hours ago

The project optimized a semantic search reranking model for a resume page, shrinking it from 11.4 MB to 2.79 MB gzipped while improving performance.
Used term dropout to reduce overfitting and discourage reliance on exact keyword matching, making the model more robust on a small corpus.
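
Term dropout can be sketched as a small data-augmentation step. This is a minimal illustration, not the project's implementation: the function name `term_dropout`, the drop probability `p`, and the keep-at-least-one rule are assumptions, since the summary does not say which side (query or document) is perturbed or at what rate.

```python
import random


def term_dropout(tokens, p=0.15, rng=None):
    """Drop each token independently with probability p (hypothetical rate).

    At least one token is always kept so the example never becomes empty.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= p]
    if not kept and tokens:
        # Everything was dropped: fall back to a single random token.
        kept = [rng.choice(tokens)]
    return kept


if __name__ == "__main__":
    query = "senior rust engineer with wasm experience".split()
    # Each call yields a different subset, so the model cannot rely on
    # any single keyword always being present.
    print(term_dropout(query, p=0.3, rng=random.Random(0)))
```

Applied at training time only, this forces the model to score a match even when a given term is missing.
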
Mined queries from job postings using an LLM to create realistic training data, which initially boosted MRR by 21%.
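
For reference, MRR (mean reciprocal rank) averages the reciprocal of the rank of the first relevant result for each query. The sketch below uses the standard definition; the function name and the toy data are illustrative and are not taken from the project's evaluation set.

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings: one ranked list of doc ids per query.
    relevant: one set of relevant doc ids per query (same order).
    """
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(rankings)


if __name__ == "__main__":
    rankings = [["a", "b"], ["c", "d"]]
    relevant = [{"b"}, {"c"}]
    # First query: relevant doc at rank 2; second query: at rank 1.
    print(mean_reciprocal_rank(rankings, relevant))
```
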
Conducted architecture experiments: max pooling outperformed mean pooling, factorized embeddings saved parameters, and SwiGLU showed no gain; multi-vector late interaction improved token-level expressiveness.
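
The difference between the pooling variants and late interaction can be shown on plain lists of token vectors. This is a sketch under assumptions: the function names are hypothetical, and the late-interaction score uses the common MaxSim form (for each query token, take the best-matching document token and sum), which the summary does not confirm as the exact variant used.

```python
def mean_pool(token_vecs):
    """Average each dimension across tokens into a single vector."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]


def max_pool(token_vecs):
    """Take the per-dimension maximum across tokens into a single vector."""
    return [max(col) for col in zip(*token_vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: keep one vector per token and, for each query
    token, add the similarity of its best-matching document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


if __name__ == "__main__":
    doc = [[1.0, 0.0], [0.0, 2.0]]
    query = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pool(doc), max_pool(doc), maxsim_score(query, doc))
```

Pooling collapses a text to one vector before comparison, whereas the MaxSim form defers the comparison to the token level, which is where the extra expressiveness comes from.
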
Reduced vocabulary from 30k to 5k tokens, decreased embedding dimensions, and applied aggressive quantization, including 1.58-bit ternary quantization for weights, cutting file size from 8.3 MB to 3.9 MB.
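
Ternary quantization maps every weight to -1, 0, or +1 plus a shared scale. The sketch below assumes absmean scaling (scale by the mean absolute weight, round, clip), as popularized by BitNet b1.58; the summary does not state which scheme the project used. The packing helpers are likewise illustrative: five ternary digits fit in one byte because 3^5 = 243 ≤ 256, giving 1.6 bits per weight, close to the theoretical log2(3) ≈ 1.58.

```python
def ternarize(weights, eps=1e-8):
    """Quantize floats to {-1, 0, +1} with one shared absmean scale."""
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def pack_trits(trits):
    """Pack 5 ternary values per byte as base-3 digits (value + 1)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        for t in reversed(trits[i:i + 5]):
            byte = byte * 3 + (t + 1)
        out.append(byte)
    return bytes(out)


def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original number of values."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]


if __name__ == "__main__":
    w = [0.9, -1.1, 0.05, 2.0, -0.4, 0.0]
    q, s = ternarize(w)
    packed = pack_trits(q)
    assert unpack_trits(packed, len(q)) == q
    # Dequantized weights are simply trit * scale.
    print(q, round(s, 4), len(packed), "bytes")
```
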
Replaced ONNX Runtime Web with a custom WASM binary written in Rust, shrinking the inference code from 3.4 MB to 4 kB.
The final model outperformed both the baseline and BM25, achieving nDCG@10 scores of 0.787 overall and 0.694 on a hard subset.
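
As a reminder of what the nDCG@10 figures measure, here is a minimal sketch of the metric. It assumes linear gains and a log2 position discount, one common formulation; the summary does not say whether the project used linear or exponential gains, and the function names are illustrative.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k=10):
    """gains: relevance grades of all candidates, in ranked order.
    Normalizes by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # The single relevant item is ranked second instead of first.
    print(ndcg_at_k([0, 1, 0]))
```
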
Unsuccessful attempts included post-training factorization, attention pooling, SwiGLU, and a ternary cross-encoder; extra training data also showed diminishing returns.
