Hasty Briefs (beta)

Semantic Search in Under 3MB

10 hours ago
  • #quantization
  • #model-optimization
  • #semantic-search
  • The project shrank a semantic search reranking model for a resume page from 11.4 MB to 2.79 MB gzipped, aiming to cut size while improving performance.
  • Term dropout mitigated overfitting and discouraged reliance on exact keyword matching, making the model more robust on a small corpus.
  • Queries were mined from job postings with an LLM to create realistic training data, which initially boosted MRR by 21%.
  • Architecture experiments: max pooling outperformed mean pooling, factorized embeddings saved parameters, SwiGLU showed no gain, and multi-vector late interaction improved token-level expressiveness.
  • The vocabulary was reduced from 30k to 5k tokens, embedding dimensions were decreased, and aggressive quantization was applied, including 1.58-bit ternary quantization of weights, cutting file size from 8.3 MB to 3.9 MB.
  • ONNX Runtime Web was replaced with a custom WASM binary written in Rust, slashing the size of the inference logic from 3.4 MB to 4 kB.
  • The final model outperformed both the baseline and BM25, achieving nDCG@10 scores of 0.787 overall and 0.694 on a hard subset.
  • Unsuccessful attempts included post-training factorization, attention pooling, SwiGLU, and a ternary cross-encoder; extra training data gave diminishing returns.
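
To make the term dropout point above concrete, here is a minimal sketch of the idea: tokens are randomly removed from a training input so the model cannot rely on any single keyword. The function name `term_dropout`, the `drop_prob` value, and the choice to always keep at least one token are illustrative assumptions; the summary does not say whether the original project applies dropout to queries, documents, or both.

```python
import random


def term_dropout(tokens, drop_prob=0.3, rng=None):
    """Randomly drop tokens from a training input (hypothetical helper)."""
    rng = rng or random.Random()
    # Keep each token independently with probability 1 - drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    # Assumption: never return an empty input; fall back to one random token.
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    query = "senior rust engineer with wasm experience".split()
    for _ in range(3):
        print(term_dropout(query, drop_prob=0.3, rng=rng))
```

Applying a fresh random mask each epoch means the same query or document appears in many partial forms, which is one plausible way a small corpus can be stretched without memorizing exact terms.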
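
The pooling and late-interaction results above can be illustrated with a small sketch. Max and mean pooling each collapse a sequence of token vectors into one vector, while multi-vector late interaction keeps the token vectors and scores a query against a document by summing, for each query token, its best match among document tokens (the MaxSim scoring popularized by ColBERT). Whether the project uses exactly this scoring rule is an assumption, and all function names are illustrative.

```python
def max_pool(token_vecs):
    """Element-wise maximum over token vectors."""
    return [max(col) for col in zip(*token_vecs)]


def mean_pool(token_vecs):
    """Element-wise mean over token vectors."""
    return [sum(col) / len(token_vecs) for col in zip(*token_vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


if __name__ == "__main__":
    doc = [[1.0, 0.0], [0.0, 2.0]]
    query = [[1.0, 0.0], [0.0, 1.0]]
    print(max_pool(doc), mean_pool(doc), maxsim_score(query, doc))
```

Pooling stores one vector per document, whereas late interaction stores one per token, so the extra expressiveness comes at a storage cost that matters for a size-constrained model.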
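
As a rough illustration of the 1.58-bit ternary quantization mentioned above, the sketch below maps each weight to one of {-1, 0, +1} with a single shared scale (the "absmean" scheme used by BitNet b1.58) and packs five ternary values into one byte. The name comes from log2(3) ≈ 1.58 bits per weight; the packing here costs 1.6 bits per weight. Both the absmean rule and the 5-per-byte packing are assumptions, since the summary does not describe the project's actual scheme.

```python
def ternary_quantize(weights):
    """Map weights to {-1, 0, +1} plus one float scale (absmean scheme)."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def pack_trits(q):
    """Pack 5 ternary values per byte, since 3**5 = 243 fits in 256."""
    out = bytearray()
    for i in range(0, len(q), 5):
        byte = 0
        for t in reversed(q[i:i + 5]):
            byte = byte * 3 + (t + 1)  # shift {-1, 0, 1} to {0, 1, 2}
        out.append(byte)
    return bytes(out)


def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original number of weights."""
    q = []
    for byte in data:
        for _ in range(5):
            q.append(byte % 3 - 1)
            byte //= 3
    return q[:n]


if __name__ == "__main__":
    w = [0.9, -0.05, -1.2, 0.4, 0.0, 0.7]
    q, scale = ternary_quantize(w)
    packed = pack_trits(q)
    restored = [t * scale for t in unpack_trits(packed, len(w))]
    print(q, len(packed), restored)
```

At inference time the weights are recovered as `t * scale`, so a matrix multiply reduces to additions and subtractions followed by one multiplication by the scale.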
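
For readers unfamiliar with the two metrics quoted above, this is a small sketch of how MRR and nDCG@10 are commonly computed. It uses linear gains in the DCG sum; the summary does not say which gain formulation or relevance labels the project used, so treat the details as assumptions.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k=10):
    """gains maps doc id -> graded relevance; missing ids count as 0."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    print(mean_reciprocal_rank([(["a", "b"], {"b"}), (["c", "d"], {"c"})]))
    print(ndcg_at_k(["x", "a", "b"], {"a": 2, "b": 1}))
```

MRR only rewards the position of the first relevant hit, while nDCG@10 accounts for every relevant result in the top ten and its graded relevance.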