Hasty Briefs (beta)

Stop Using Ollama

7 hours ago
  • #Local LLMs
  • #Ollama Controversy
  • #Open Source Ethics
  • Ollama started as an easy wrapper around llama.cpp that made local LLMs accessible, but it later obscured its reliance on the underlying project.
  • For over a year the project failed to credit llama.cpp, violating the MIT license's requirement to reproduce the copyright notice.
  • Ollama forked llama.cpp and swapped in a custom backend that introduced bugs and regressed performance, with benchmarks showing significantly slower inference.
  • Misleading model naming, such as publishing distilled variants under the full model's name (e.g., DeepSeek-R1), caused confusion and reputational damage for model creators.
  • The introduction of a closed-source desktop app and a proprietary Modelfile system created vendor lock-in and added unnecessary complexity compared to llama.cpp's single-file GGUF format.
  • Ollama's registry bottleneck delays new model availability and limits quantization options, forcing users to wait or use other tools for community-quantized models.
  • A pivot to cloud-hosted models raised privacy concerns, with vulnerabilities like CVE-2025-51471 exposing tokens and unclear data handling by third-party providers.
  • Venture capital incentives drove decisions toward monetization, lock-in, and reduced transparency, straying from the local-first mission.
  • Alternatives such as llama.cpp (with its bundled llama-server), LM Studio, Jan, and llama-swap offer better performance, openness, and ease of use without Ollama's drawbacks.
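
The llama.cpp alternative mentioned above can be sketched in two commands: `llama-server` ships with llama.cpp and exposes an OpenAI-compatible HTTP API over any local GGUF file. The model filename and port below are illustrative assumptions, not values from the article.

```shell
# Serve a local GGUF file with llama.cpp's built-in API server
# (the model path is a placeholder; use any GGUF you have downloaded).
llama-server -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf --port 8080

# Query it with the standard OpenAI-style chat completions endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

Because the server speaks the OpenAI API, existing clients and SDKs can usually point at it by changing only the base URL, with no Modelfile or registry step involved.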