Qwen3: Think deeper, act faster
- #AI
- #Language Models
- #Machine Learning
- Qwen3 is the latest generation of the Qwen family of large language models, achieving competitive results against other top-tier models in coding, math, and general-capability benchmarks.
- Two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (0.6B to 32B) are open-weighted under the Apache 2.0 license.
- Qwen3 introduces hybrid thinking modes: Thinking Mode for step-by-step reasoning on complex problems and Non-Thinking Mode for quick, general-purpose responses (see the first sketch below).
- Supports 119 languages and dialects, enhancing global accessibility.
- Improved agentic capabilities, with optimized support for coding and tool calling (a Qwen-Agent sketch follows below).
- Pre-training corpus expanded to about 36 trillion tokens, roughly double that of Qwen2.5, covering 119 languages and a wide range of domains.
- Post-training uses a four-stage pipeline (long chain-of-thought cold start, reasoning-based RL, thinking-mode fusion, and general RL) to combine deep reasoning with rapid response.
- Weights are available on platforms like Hugging Face, ModelScope, and Kaggle, with deployment supported by frameworks such as SGLang and vLLM (an endpoint sketch follows below).
- Advanced usage includes turn-by-turn control of thinking behavior via `/think` and `/no_think` tags in multi-turn conversations (see the soft-switch sketch below).
- Future work focuses on scaling data, model size, and context length, and on advancing RL for long-horizon reasoning.
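
As a minimal sketch of the hybrid thinking modes, the snippet below follows the usage pattern from the release notes: the `enable_thinking` flag on the chat template switches between the two modes. The model name and prompt are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"  # illustrative; any Qwen3 checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# enable_thinking=True (the default) lets the model emit a <think>...</think>
# reasoning block before its answer; set it to False for fast, direct replies.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```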
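
The `/think` and `/no_think` soft switches apply turn by turn in multi-turn conversations, with the most recent tag taking precedence. A short sketch (the message contents are made up for illustration):

```python
# Turn-level control via soft switches; the tag in the latest user message wins.
history = [
    {"role": "user", "content": "Explain quicksort briefly. /no_think"},  # fast, direct reply
    {"role": "assistant", "content": "Quicksort partitions around a pivot..."},
    {"role": "user", "content": "Now derive its average-case complexity. /think"},  # deep reasoning
]
# Feed `history` through the same apply_chat_template + generate loop shown above.
```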
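
For deployment, once a server is running (e.g. via `vllm serve` or SGLang's `launch_server`), both frameworks expose an OpenAI-compatible endpoint. A sketch of querying it, with the port, API key, and served model name assumed for illustration:

```python
from openai import OpenAI

# Assumed local endpoint; vLLM and SGLang both serve an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # must match the served model name
    messages=[{"role": "user", "content": "Give a one-line summary of MoE models."}],
)
print(resp.choices[0].message.content)
```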
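
For agentic use, the release points to Qwen-Agent for tool calling. The sketch below is adapted from that pattern rather than a definitive recipe; the endpoint, config keys, and tool list are assumptions.

```python
from qwen_agent.agents import Assistant

# Assumed config pointing at a locally served Qwen3 endpoint (see sketch above).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# 'code_interpreter' is one of Qwen-Agent's built-in tools.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x**2 for x in [-5, 5]."}]
responses = []
for responses in bot.run(messages=messages):  # run() streams growing response lists
    pass
print(responses)
```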