Run Qwen3-Coder-480B-A35B Locally with Unsloth Dynamic Quants
9 months ago
- #AI
- #LLM
- #Coding
- Qwen3-Coder-480B-A35B offers state-of-the-art performance on coding tasks, matching or surpassing models like Claude Sonnet 4 and GPT-4.1.
- The model supports a 256K token context, extendable to 1M tokens, and achieves a 61.8% score on the Aider Polyglot benchmark.
- Unsloth Dynamic 2.0 quantization keeps accuracy loss minimal when running and fine-tuning Qwen LLMs.
- Recommended inference settings include temperature=0.7, top_p=0.8, top_k=20, and repetition_penalty=1.05.
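The recommended sampling settings can be passed straight through an OpenAI-compatible chat-completions request, which is how a local llama.cpp server is typically driven. A minimal sketch, assuming a locally served model alias (the model name and prompt are illustrative, not from the source):

```python
# Sketch: the recommended inference settings as an OpenAI-compatible
# chat-completions payload (e.g. for a local llama.cpp server).
import json

payload = {
    "model": "qwen3-coder-480b-a35b",  # assumed local model alias
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    # Recommended settings from the article:
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,               # non-standard field; llama.cpp's server accepts it
    "repetition_penalty": 1.05,
}

print(json.dumps(payload, indent=2))
```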
- llama.cpp is recommended for optimized inference, with a choice between full-precision (unquantized) and quantized versions.
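For the quantized route, a llama.cpp invocation might look like the sketch below; the GGUF filename and context size are assumptions, so substitute the Unsloth dynamic quant you actually downloaded:

```shell
# Sketch: running a quantized GGUF with llama.cpp's llama-cli,
# using the sampling settings recommended above.
./llama-cli \
    -m Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL.gguf \
    --ctx-size 32768 \
    --temp 0.7 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 \
    -p "Write a quicksort in C."
```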
- The model supports tool calling, demonstrated with a function to fetch current temperatures.
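The tool-calling flow can be sketched as follows: the client advertises a JSON schema for a temperature-lookup function, the model emits a structured call, and the client executes it and feeds the result back. The `get_current_temperature` implementation and its return values below are illustrative placeholders, not the article's code:

```python
# Sketch of tool calling with an OpenAI-style function definition.
import json

# Tool schema advertised to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

def get_current_temperature(location: str, unit: str = "celsius") -> dict:
    """Placeholder; a real client would query a weather API here."""
    return {"location": location, "temperature": 21.0, "unit": unit}

def dispatch(tool_call: dict) -> dict:
    """Execute a tool call emitted by the model and return its result."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # model emits arguments as a JSON string
    if fn["name"] == "get_current_temperature":
        return get_current_temperature(**args)
    raise ValueError(f"unknown tool: {fn['name']}")

# A tool call shaped the way the model might emit it:
call = {"function": {"name": "get_current_temperature",
                     "arguments": '{"location": "Paris"}'}}
print(dispatch(call))
```

In a real loop, the dispatch result would be appended to the conversation as a tool message so the model can compose its final answer.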
- Performance benchmarks highlight strong results in agentic coding, browser use, and tool-use scenarios.