Running local models is good now

11 hours ago

The author finds local AI models have become surprisingly effective for various tasks, moving beyond simple lookup functions to agentic coding.
Key local models mentioned include Mistral 7B, Gemma 3, GPT-OSS-20B, and Qwen variants, run through setups like llama.cpp, Ollama, and LM Studio.
Gemma-4 models, particularly the 26B and 12B-QAT versions, enable local agentic workflows with about 75% the accuracy/speed of frontier models.
A setup using Pi as an agent harness and LM Studio as an inference server is detailed, with Docker for security and configuration tweaks.
Benefits of local models include introspectability (e.g., token inference, context window adjustments) and customization, despite challenges like speed and hardware limits.

Hasty Briefsbeta