Show HN: Shimmy – 5MB privacy-first, local alternative to Ollama (680MB)
- #Local Inference
- #AI
- #OpenAI API
- Shimmy is a free, lightweight (5.1MB) local inference server with OpenAI API compatibility (see the client example after this list).
- It offers fast startup (<100ms), low memory overhead (<50MB), and automatic port management.
- Shimmy supports GGUF models with zero configuration and auto-discovers models from Hugging Face cache or local directories.
- Privacy-focused: everything runs locally on your machine, with no per-token pricing.
- Easy integration with tools like VSCode, Cursor, and Continue.dev.
- First-class LoRA adapter support, enabling quick transition from training to production.
- Available via cargo install and npm, with Python and Docker packages coming soon.
- MIT licensed forever, with a commitment to never become a paid product.
- Sponsorship options available for those who wish to support development.
- Built with Rust + Tokio for memory-safe, async performance, with a llama.cpp backend for GGUF inference.
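
Because Shimmy speaks the OpenAI API, any OpenAI client library can talk to it by overriding the base URL. Below is a minimal sketch using the official `openai` Python package; the port and model name are placeholders I've assumed for illustration, not values documented by Shimmy, so substitute the address the server reports on startup and a model it has discovered locally.

```python
# Minimal sketch: point an OpenAI client at a local OpenAI-compatible server.
# The base URL, port, and model name are assumptions -- replace them with the
# address Shimmy prints when it starts and a model from its discovered list.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",  # assumed local address and port
    api_key="not-needed",                  # local server; no real key is required
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder: use a model name Shimmy has auto-discovered
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)

print(response.choices[0].message.content)
```

The same base-URL override is how editor integrations such as Continue.dev typically connect to local OpenAI-compatible servers, which is what makes the VSCode/Cursor workflow drop-in.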