Qwen 3.6 27B is the sweet spot for local development

4 hours ago

The author was previously disappointed by local models but was impressed by Qwen 3.6, calling it the first local model that makes sense as a general intelligence.
Qwen 3.6 comes in two variants: a mixture-of-experts model, Qwen 3.6 35B A3B (faster), and a dense model, Qwen 3.6 27B (slower but more capable). The author recommends the 27B version.
The model performed well in tests, including creative writing tasks like poems about Zouk dance and quantum physics, and coding tasks such as creating a hexagonal minesweeper with pnpm.
Running Qwen 3.6 locally is straightforward using llama.cpp, an open-source tool that supports quantization (e.g., 8-bit for reduced size) and can be configured for GPU use with features like flash attention.
In the author's benchmarks, Qwen 3.6 27B runs at around 30 tokens per second on a MacBook with an M5 Max chip, and it outperforms other local models such as Gemma 4 31B on coding tasks.
The author highlights the advantages of local models, including privacy (for sensitive data like medical information), customization through fine-tuning, and independence from proprietary model subsidies or takedowns.
Future prospects include even smarter local models that may separate intelligence from factual knowledge using tool calling, making them feasible for devices like smartphones.
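
To make the quantization point above concrete, here is a rough back-of-the-envelope sketch of how the bit width affects the size of a 27B-parameter model's weights. The function name and the chosen bit widths are illustrative assumptions, not taken from the article; real quantized files also store scaling data, and the context cache needs additional memory, so treat these numbers as lower bounds.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: it ignores quantization scales,
    metadata, and the memory used by the context (KV) cache.
    """
    return n_params * bits_per_weight / 8 / 1e9


PARAMS_27B = 27e9  # parameter count implied by the "27B" model name

if __name__ == "__main__":
    # Compare full 16-bit weights with 8-bit and 4-bit quantization.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: ~{weight_size_gb(PARAMS_27B, bits):.1f} GB")
```

By this estimate, 8-bit quantization roughly halves the footprint of 16-bit weights, which is why it is a common choice when fitting a model of this size onto a single machine.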
