Hasty Briefsbeta

Bilingual

Running Claude Code Offline on an M3 Pro with Qwen3.6

6 hours ago
  • #Hardware Performance
  • #Air-Gapped Setup
  • #Local AI
  • Claude Code runs locally on a laptop via Ollama, using a Mixture of Experts model (Qwen3.6:35b-a3b-coding-nvfp4) that requires specific fixes.
  • Four fixes are crucial: disable thinking via MAX_THINKING_TOKENS=0, use Ollama version 0.24.0, MLX runner ignores Modelfile templates so use API parameters, ignore HTTP 404 errors in logs.
  • Performance is hardware-dependent; prefill speed is memory bandwidth-bound, and context window size is limited by available memory (36 GiB gives 32K tokens, 64 GiB+ supports 256K).
  • Local setup ensures data privacy in regulated environments, trading cloud speed for flat cost and no data leaving the machine.
  • The setup works on Apple Silicon Macs, with recommended memory of 48 GiB or more for comfortable operation, and is suitable for air-gapped or sensitive data environments.