Running Claude Code Offline on an M3 Pro with Qwen3.6

6 hours ago

Claude Code runs locally on a laptop via Ollama, using a Mixture of Experts model (Qwen3.6:35b-a3b-coding-nvfp4) that requires specific fixes.
Four fixes are crucial: disable thinking via MAX_THINKING_TOKENS=0, use Ollama version 0.24.0, MLX runner ignores Modelfile templates so use API parameters, ignore HTTP 404 errors in logs.
Performance is hardware-dependent; prefill speed is memory bandwidth-bound, and context window size is limited by available memory (36 GiB gives 32K tokens, 64 GiB+ supports 256K).
Local setup ensures data privacy in regulated environments, trading cloud speed for flat cost and no data leaving the machine.
The setup works on Apple Silicon Macs, with recommended memory of 48 GiB or more for comfortable operation, and is suitable for air-gapped or sensitive data environments.

Hasty Briefsbeta