Alibaba open-sources Qwen3.6-35B-A3B, a 35B MoE model with 3B active parameters
- #open-source-ai
- #multimodal-ai
- #large-language-model
- Qwen3.6-35B-A3B is a post-trained model released by the Qwen team, offering improved stability and utility based on community feedback.
- It features 35B total parameters (3B activated), a context length of up to 262,144 tokens (extendable to 1,010,000 with YaRN), and supports multimodal inputs including images and videos.
- Key enhancements include better agentic coding for frontend workflows and repository-level reasoning, along with thinking preservation, which retains reasoning content from earlier turns in multi-turn conversations.
- The model is compatible with various inference frameworks such as SGLang, vLLM, KTransformers, and Hugging Face Transformers for deployment and serving.
- It achieves strong benchmark results in coding (e.g., SWE-bench, Terminal-Bench), knowledge (e.g., MMLU-Pro, C-Eval), and vision-language tasks (e.g., MMMU, MathVista).
- Usage guidance covers API integration with recommended sampling parameters for each mode (thinking vs. non-thinking) and task type, plus tool calling and agent applications via Qwen-Agent.
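The mode-dependent sampling setup above can be sketched as a small helper that assembles a request body for an OpenAI-compatible chat endpoint (vLLM and SGLang both expose one when serving). The parameter values, model name, and endpoint path below are illustrative placeholders, not the Qwen team's actual recommendations; consult the model card for the real values.

```python
# Sketch: choosing sampling parameters per mode and building a request body
# for an OpenAI-compatible /v1/chat/completions endpoint. All numeric values
# and the model name are placeholder assumptions -- check the model card.

def sampling_params(thinking: bool) -> dict:
    """Return a sampling configuration for thinking vs. non-thinking mode."""
    if thinking:
        # Thinking mode: placeholder values favoring exploratory decoding.
        return {"temperature": 0.6, "top_p": 0.95, "presence_penalty": 1.0}
    # Non-thinking mode: placeholder values for more direct answers.
    return {"temperature": 0.7, "top_p": 0.8, "presence_penalty": 1.0}

def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble the JSON body for a chat-completions request."""
    return {
        "model": "Qwen3.6-35B-A3B",  # assumed served model name
        "messages": [{"role": "user", "content": prompt}],
        **sampling_params(thinking),
    }

req = build_request("Summarize this repository's build steps.", thinking=True)
print(req["temperature"])  # 0.6
```

With a local vLLM or SGLang server running, this body could be POSTed to its chat-completions route (e.g. via `requests`); switching `thinking` swaps in the other parameter set without changing the rest of the request.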