Show HN: Intercepting proxy for semantic search over visited pages

9 months ago
  • #proxy
  • #embedding
  • #llm
  • An intercepting proxy that embeds every visited web page so pages can later be found by similarity search.
  • For each HTTP GET request that returns status 200, the page is re-fetched through pure.md to obtain clean Markdown.
  • The cleaned text is embedded using llm.
  • A minimal Flask UI provides search and cached-page views.
  • This is a plugin for llm, not a stand-alone program.
  • Install llm with pipx: `pipx install llm`.
  • Install the plugin: `llm install git+https://github.com/mlang/llm-embed-proxy`.
  • Optional: Install llm-sentence-transformers for local embedding models.
  • Register a model: `llm sentence-transformers register Qwen/Qwen3-Embedding-0.6B`.
  • Run the proxy: `llm embed-proxy --model sentence-transformers/Qwen/Qwen3-Embedding-0.6B`.
  • Point your browser/system proxy to localhost:8080 and visit http://localhost:8080/ to search.
  • Uses mitmproxy under the hood; generates a CA certificate in ~/.mitmproxy/.
  • Add the mitmproxy CA certificate to your system or browser trust store to avoid TLS certificate warnings on HTTPS sites.
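
The pipeline summarized above (filter successful GET responses, fetch clean Markdown, embed, then rank by similarity) can be sketched in plain Python. This is only an illustration under stated assumptions, not the plugin's actual code: `should_index`, `pure_md_url`, `toy_embed`, and `PageIndex` are hypothetical names, the pure.md URL scheme (prefixing the page URL) is an assumption, and `toy_embed` is a bag-of-words stand-in for the real embedding model that llm would provide.

```python
import math
from collections import Counter

# Assumption: pure.md returns Markdown for a page URL appended to this prefix.
PURE_MD_PREFIX = "https://pure.md/"


def should_index(method: str, status: int) -> bool:
    """Mirror the rule in the summary: only GET responses with status 200 are indexed."""
    return method.upper() == "GET" and status == 200


def pure_md_url(page_url: str) -> str:
    """Build the (assumed) pure.md address for a visited page."""
    return PURE_MD_PREFIX + page_url


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class PageIndex:
    """In-memory index mapping page URLs to their embedding vectors."""

    def __init__(self) -> None:
        self._vectors: dict[str, dict[str, float]] = {}

    def add(self, url: str, markdown: str) -> None:
        """Embed the cleaned Markdown and store it under the page URL."""
        self._vectors[url] = toy_embed(markdown)

    def search(self, query: str, limit: int = 5) -> list[tuple[str, float]]:
        """Return up to `limit` (url, score) pairs, most similar first."""
        q = toy_embed(query)
        scored = [(url, cosine(q, vec)) for url, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]
```

In use, a response would be indexed only when `should_index(method, status)` is true, with the Markdown fetched from `pure_md_url(url)` passed to `PageIndex.add`; `PageIndex.search("http proxy")` then ranks stored pages by cosine similarity to the query. A real setup would replace `toy_embed` with vectors from the embedding model selected via `--model`.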