Hasty Briefs (beta)

Qwen 3.6 27B is the sweet spot for local development

4 hours ago
  • #Qwen 3.6
  • #Local AI Models
  • #llama.cpp
  • The author was previously disappointed by local models but was impressed by Qwen 3.6, calling it the first local model that makes sense as a general intelligence.
  • Qwen 3.6 comes in two variants: a mixture-of-experts Qwen 3.6 35B A3B (faster) and a dense Qwen 3.6 27B (slower but more powerful), with the author recommending the 27B version.
  • The model performed well in tests, including creative writing tasks like poems about Zouk dance and quantum physics, and coding tasks such as creating a hexagonal minesweeper with pnpm.
  • Running Qwen 3.6 locally is straightforward with llama.cpp, an open-source inference tool that supports quantized models (e.g., 8-bit weights to reduce size) and can be configured to run on the GPU with features such as flash attention.
  • Performance benchmarks show Qwen 3.6 27B is competitive, generating around 30 tokens per second on a MacBook with an M5 Max chip, and it outperforms other local models such as Gemma 4 31B in coding tasks.
  • The author highlights the advantages of local models, including privacy (for sensitive data like medical information), customization through fine-tuning, and independence from proprietary model subsidies or takedowns.
  • Future prospects include even smarter local models that may separate intelligence from factual knowledge using tool calling, making them feasible for devices like smartphones.
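
The quantization and throughput figures above can be turned into rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes the weight file is simply parameter count times bits per weight, and that generation runs at a steady rate. The function names are hypothetical and chosen for illustration. Real quantized files come out somewhat larger than the nominal figure because they also store scaling data, and the context cache needs memory on top of the weights.

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Assumption: size = parameters * bits / 8, ignoring quantization
    metadata and any runtime memory such as the context cache.
    """
    return params_billion * bits_per_weight / 8


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant decoding speed."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # The dense 27B model at a few nominal bit widths.
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: ~{weights_size_gb(27, bits):.1f} GB")

    # The ~30 tokens per second reported in the summary.
    print(f"1000 tokens at 30 tok/s: ~{generation_seconds(1000, 30):.0f} s")
```

Under these assumptions, 8-bit quantization halves the 16-bit footprint to roughly 27 GB, which is why a dense 27B model becomes practical on a high-memory laptop.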