Hasty Briefs (beta)


Alibaba open-sources Qwen3.6-35B-A3B, a 35B MoE model with 3B active parameters

6 hours ago
  • #open-source-ai
  • #multimodal-ai
  • #large-language-model
  • Qwen3.6-35B-A3B is a post-trained model released by the Qwen team, offering improved stability and utility based on community feedback.
  • It features 35B total parameters (3B activated), a context length of up to 262,144 tokens (extendable to 1,010,000 with YaRN), and supports multimodal inputs including images and videos.
  • Key enhancements include better agentic coding for frontend workflows and repository-level reasoning, along with thinking preservation, which retains reasoning context from earlier messages in a conversation.
  • The model is compatible with various inference frameworks such as SGLang, vLLM, KTransformers, and Hugging Face Transformers for deployment and serving.
  • It achieves strong benchmark results in coding (e.g., SWE-bench, Terminal-Bench), knowledge (e.g., MMLU-Pro, C-Eval), and vision-language tasks (e.g., MMMU, MathVista).
  • The team recommends different sampling parameters depending on mode (thinking vs. non-thinking) and task, and the model supports tool calling and agent applications via Qwen-Agent.
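
Since vLLM and SGLang both expose an OpenAI-compatible HTTP API once a model is served, querying a deployment can be sketched as below. The endpoint URL, port, and model identifier are illustrative assumptions for a local setup, not values from the brief; the payload is only constructed here, not sent.

```python
import json

# Hypothetical endpoint for a locally served instance (e.g. started with
# vLLM's `vllm serve` or SGLang's launch_server); adjust to your deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "Qwen/Qwen3.6-35B-A3B") -> dict:
    """Build an OpenAI-compatible chat-completion payload for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

payload = build_chat_request("Summarize repository-level reasoning in one sentence.")
print(json.dumps(payload, indent=2))
```

Posting this payload with any HTTP client, or pointing the `openai` Python SDK at the local base URL, returns a standard chat-completion response regardless of which serving framework is behind it.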
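
The mode-dependent sampling and tool-calling support mentioned above can be sketched together. The numeric sampling values and the `get_weather` tool below are illustrative placeholders, not the Qwen team's official recommendations; consult the model card for the real settings.

```python
def sampling_params(thinking: bool) -> dict:
    """Pick sampling parameters by mode. Values are illustrative only;
    the official per-mode recommendations live in the model card."""
    if thinking:
        return {"temperature": 0.6, "top_p": 0.95}
    return {"temperature": 0.7, "top_p": 0.8}

# A tool definition in the OpenAI function-calling schema, which
# OpenAI-compatible servers accept. `get_weather` is a hypothetical tool.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Assemble a request that combines mode-specific sampling with a tool list.
request = {
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": [WEATHER_TOOL],
    **sampling_params(thinking=True),
}
print(request["temperature"])
```

For higher-level agent applications, the brief points to Qwen-Agent, which wraps this tool-calling plumbing behind an agent abstraction.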