How to Setup a Local Coding Agent on macOS

6 hours ago

Setup a local coding agent on macOS using llama.cpp with Metal acceleration, Gemma 4 26B-A4B and Qwen3.6 35B-A3B models, and Multi-Token Prediction (MTP) speculative decoding for speed improvements.
Include multimodal support via a projector (mmproj-BF16.gguf) to handle images/screenshots, integrated with Pi as the coding agent through an OpenAI-compatible API.
Performance benchmarks show Gemma 4 with MTP achieves 72.2 tokens/second generation speed on an M1 Max Mac, outperforming MLX models, while Qwen3.6 offers better coding capabilities but at 55 tokens/second.
Detailed setup steps: Install llama.cpp, download model files from Hugging Face, start a local server with tmux wrapper, and configure Pi to use the local model with text and image input.

Hasty Briefsbeta