Qwen3: Think deeper, act faster
- #AI
- #Language Models
- #Machine Learning
- Qwen3 is the latest generation of the Qwen family of large language models, achieving competitive results against other top-tier models in coding, math, and general-capability benchmarks.
- Two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (0.6B to 32B) are open-weighted under the Apache 2.0 license.
- Qwen3 introduces hybrid thinking modes: Thinking Mode for step-by-step reasoning on complex problems and Non-Thinking Mode for quick, general-purpose responses (see the first sketch below).
- Supports 119 languages and dialects, enhancing global accessibility.
- Improved agentic capabilities, with optimized support for coding and tool calling (a Qwen-Agent sketch follows below).
- Pre-training corpus expanded to about 36 trillion tokens, roughly double that of Qwen2.5, covering 119 languages and a wide range of domains.
- Post-training uses a four-stage pipeline (long chain-of-thought cold start, reasoning-based RL, thinking-mode fusion, and general RL) to combine deep reasoning with rapid response.
- Weights are available on platforms like Hugging Face, ModelScope, and Kaggle, with deployment supported by frameworks such as SGLang and vLLM (an endpoint sketch follows below).
- Advanced usage includes turn-by-turn control of thinking behavior via `/think` and `/no_think` tags in multi-turn conversations (see the soft-switch sketch below).
- Future work focuses on scaling data, model size, and context length, and on advancing RL for long-horizon reasoning.
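
As a minimal sketch of the hybrid thinking modes, the snippet below follows the usage pattern from the release notes: the `enable_thinking` flag on the chat template switches between the two modes. The model name and prompt are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"  # illustrative; any Qwen3 checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# enable_thinking=True (the default) lets the model emit a <think>...</think>
# reasoning block before its answer; set it to False for fast, direct replies.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```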
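
The `/think` and `/no_think` soft switches apply turn by turn in multi-turn conversations, with the most recent tag taking precedence. A short sketch (the message contents are made up for illustration):

```python
# Turn-level control via soft switches; the tag in the latest user message wins.
history = [
    {"role": "user", "content": "Explain quicksort briefly. /no_think"},  # fast, direct reply
    {"role": "assistant", "content": "Quicksort partitions around a pivot..."},
    {"role": "user", "content": "Now derive its average-case complexity. /think"},  # deep reasoning
]
# Feed `history` through the same apply_chat_template + generate loop shown above.
```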
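
For deployment, once a server is running (e.g. via `vllm serve` or SGLang's `launch_server`), both frameworks expose an OpenAI-compatible endpoint. A sketch of querying it, with the port, API key, and served model name assumed for illustration:

```python
from openai import OpenAI

# Assumed local endpoint; vLLM and SGLang both serve an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # must match the served model name
    messages=[{"role": "user", "content": "Give a one-line summary of MoE models."}],
)
print(resp.choices[0].message.content)
```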
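
For agentic use, the release points to Qwen-Agent for tool calling. The sketch below is adapted from that pattern rather than a definitive recipe; the endpoint, config keys, and tool list are assumptions.

```python
from qwen_agent.agents import Assistant

# Assumed config pointing at a locally served Qwen3 endpoint (see sketch above).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# 'code_interpreter' is one of Qwen-Agent's built-in tools.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x**2 for x in [-5, 5]."}]
responses = []
for responses in bot.run(messages=messages):  # run() streams growing response lists
    pass
print(responses)
```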