Show HN: Shimmy – 5MB privacy-first, local alternative to Ollama (680MB)

  • #Local Inference
  • #AI
  • #OpenAI API
  • Shimmy is a free, lightweight (5.1MB) local inference server with OpenAI API compatibility (see the client sketch after this list).
  • It offers fast startup (<100ms), low memory overhead (<50MB), and automatic port management.
  • Shimmy supports GGUF models with zero configuration and auto-discovers models from the Hugging Face cache or local directories (see the model-listing sketch below).
  • Privacy-focused: your code never leaves your machine, and there is no per-token pricing.
  • Easy integration with tools like VSCode, Cursor, and Continue.dev.
  • First-class LoRA adapter support, enabling a quick transition from training to production.
  • Available via cargo install and npm, with Python and Docker packages coming soon.
  • MIT licensed forever, with a commitment to never become a paid product.
  • Sponsorship options available for those who wish to support development.
  • Built with Rust + Tokio for memory-safe, async performance, with a llama.cpp backend for GGUF inference.
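
Because Shimmy exposes an OpenAI-compatible API, any OpenAI client can talk to it by overriding the base URL. Here is a minimal sketch using the official openai npm package; the port, API key, and model name are placeholders (Shimmy manages ports automatically, and a local server needs no real key):

```typescript
// Minimal chat-completion sketch against a local Shimmy server.
// Assumes Shimmy is installed and running (e.g. via `cargo install shimmy`);
// the port and model name below are placeholders, not Shimmy defaults.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11435/v1", // placeholder port
  apiKey: "sk-local",                   // dummy value; the SDK requires one
});

const completion = await client.chat.completions.create({
  model: "llama-3.2-1b-instruct",       // placeholder; use a model Shimmy discovered
  messages: [{ role: "user", content: "Explain what a GGUF file is in one sentence." }],
});

console.log(completion.choices[0].message.content);
```

The same base-URL override is how editor tools like Continue.dev or Cursor would be pointed at Shimmy instead of a hosted endpoint.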
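
The auto-discovery bullet suggests the server can enumerate available models. Assuming Shimmy mirrors the OpenAI-style GET /v1/models endpoint (an assumption, not stated in the post), the discovered models could be listed like this:

```typescript
// Sketch: list models Shimmy auto-discovered from the Hugging Face cache or
// local directories. Assumes an OpenAI-style GET /v1/models endpoint and a
// placeholder port.
const res = await fetch("http://localhost:11435/v1/models");
const { data } = (await res.json()) as { data: Array<{ id: string }> };
for (const model of data) {
  console.log(model.id); // identifiers usable in chat-completion requests
}
```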