Show HN: Shimmy – 5MB privacy-first, local alternative to Ollama (680MB)
- #Local Inference
- #AI
- #OpenAI API
- Shimmy is a free, lightweight (5.1MB) local inference server with OpenAI API compatibility (see the client example after this list).
- It offers fast startup (<100ms), low memory overhead (<50MB), and automatic port management.
- Shimmy supports GGUF models with zero configuration and auto-discovers models from Hugging Face cache or local directories.
- Privacy-focused: everything runs locally on your machine, with no per-token pricing.
- Easy integration with tools like VSCode, Cursor, and Continue.dev.
- First-class LoRA adapter support, enabling quick transition from training to production.
- Available via cargo install and npm, with Python and Docker packages coming soon.
- MIT licensed forever, with a commitment to never become a paid product.
- Sponsorship options available for those who wish to support development.
- Built with Rust + Tokio for memory-safe, async performance, with a llama.cpp backend for GGUF inference.
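
Because Shimmy speaks the OpenAI API, any OpenAI client library can talk to it by overriding the base URL. Below is a minimal sketch using the official `openai` Python package; the port and model name are placeholders I've assumed for illustration, not values documented by Shimmy, so substitute the address the server reports on startup and a model it has discovered locally.

```python
# Minimal sketch: point an OpenAI client at a local OpenAI-compatible server.
# The base URL, port, and model name are assumptions -- replace them with the
# address Shimmy prints when it starts and a model from its discovered list.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",  # assumed local address and port
    api_key="not-needed",                  # local server; no real key is required
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder: use a model name Shimmy has auto-discovered
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)

print(response.choices[0].message.content)
```

The same base-URL override is how editor integrations such as Continue.dev typically connect to local OpenAI-compatible servers, which is what makes the VSCode/Cursor workflow drop-in.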