Run Qwen3-Coder-480B-A35B Locally with Unsloth Dynamic Quants
9 months ago
- #AI
- #LLM
- #Coding
- Qwen3-Coder-480B-A35B offers state-of-the-art performance on coding tasks, matching or surpassing models like Claude Sonnet 4 and GPT-4.1.
- The model supports a 256K token context, extendable to 1M tokens, and achieves a 61.8% score on the Aider Polyglot benchmark.
- Unsloth Dynamic 2.0 quantization keeps accuracy loss minimal when running and fine-tuning Qwen LLMs.
- Recommended inference settings include temperature=0.7, top_p=0.8, top_k=20, and repetition_penalty=1.05.
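The recommended sampling settings can be passed straight through an OpenAI-compatible chat-completions request, which is how a local llama.cpp server is typically driven. A minimal sketch, assuming a locally served model alias (the model name and prompt are illustrative, not from the source):

```python
# Sketch: the recommended inference settings as an OpenAI-compatible
# chat-completions payload (e.g. for a local llama.cpp server).
import json

payload = {
    "model": "qwen3-coder-480b-a35b",  # assumed local model alias
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    # Recommended settings from the article:
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,               # non-standard field; llama.cpp's server accepts it
    "repetition_penalty": 1.05,
}

print(json.dumps(payload, indent=2))
```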
- llama.cpp is recommended for optimized inference, with a choice between full-precision (unquantized) and quantized versions.
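For the quantized route, a llama.cpp invocation might look like the sketch below; the GGUF filename and context size are assumptions, so substitute the Unsloth dynamic quant you actually downloaded:

```shell
# Sketch: running a quantized GGUF with llama.cpp's llama-cli,
# using the sampling settings recommended above.
./llama-cli \
    -m Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL.gguf \
    --ctx-size 32768 \
    --temp 0.7 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 \
    -p "Write a quicksort in C."
```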
- The model supports tool calling, demonstrated with a function to fetch current temperatures.
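The tool-calling flow can be sketched as follows: the client advertises a JSON schema for a temperature-lookup function, the model emits a structured call, and the client executes it and feeds the result back. The `get_current_temperature` implementation and its return values below are illustrative placeholders, not the article's code:

```python
# Sketch of tool calling with an OpenAI-style function definition.
import json

# Tool schema advertised to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

def get_current_temperature(location: str, unit: str = "celsius") -> dict:
    """Placeholder; a real client would query a weather API here."""
    return {"location": location, "temperature": 21.0, "unit": unit}

def dispatch(tool_call: dict) -> dict:
    """Execute a tool call emitted by the model and return its result."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # model emits arguments as a JSON string
    if fn["name"] == "get_current_temperature":
        return get_current_temperature(**args)
    raise ValueError(f"unknown tool: {fn['name']}")

# A tool call shaped the way the model might emit it:
call = {"function": {"name": "get_current_temperature",
                     "arguments": '{"location": "Paris"}'}}
print(dispatch(call))
```

In a real loop, the dispatch result would be appended to the conversation as a tool message so the model can compose its final answer.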
- Performance benchmarks highlight strong results in agentic coding, browser use, and tool-use scenarios.