Hasty Briefs

Qwen3: Think deeper, act faster

a year ago
  • #AI
  • #Language Models
  • #Machine Learning
  • Qwen3 is the latest addition to the Qwen family of large language models, featuring competitive performance in coding, math, and general capabilities.
  • Two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models are released as open weights under the Apache 2.0 license.
  • Qwen3 introduces hybrid thinking modes: Thinking Mode for complex reasoning and Non-Thinking Mode for quick responses.
  • Supports 119 languages and dialects, enhancing global accessibility.
  • Improved agentic capabilities with optimized coding and tool-calling support.
  • Pre-training expanded to 36 trillion tokens, covering multiple languages and domains.
  • Post-training uses a four-stage pipeline that builds both deep-reasoning and rapid-response capabilities.
  • Available on platforms like Hugging Face, ModelScope, and Kaggle, with deployment options like SGLang and vLLM.
  • Advanced usage includes dynamic control of thinking mode via `/think` and `/no_think` tags.
  • Future work focuses on scaling data, model size, context length, and advancing RL for long-horizon reasoning.
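The deployment bullet above can be sketched as launch commands. The model name and flags below are assumptions based on the Qwen3 release notes and may differ across vLLM/SGLang versions, so verify them against your installed tooling:

```shell
# Serve a Qwen3 model with vLLM (flag names are assumptions; check your vLLM version).
vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

# Or with SGLang (likewise, verify the reasoning-parser name for your install).
python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --reasoning-parser qwen3
```

Both servers expose an OpenAI-compatible endpoint, so existing chat clients can point at them with only a base-URL change.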
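The soft switches mentioned above can be illustrated with a small helper. The `/think` and `/no_think` tags come from the release notes; the helper function itself is hypothetical, just to show where the tag goes:

```python
def apply_soft_switch(message: str, thinking: bool) -> str:
    """Append Qwen3's soft-switch tag to a user turn.

    In multi-turn chats, ending a user message with '/think' or
    '/no_think' toggles thinking mode for that turn; the model follows
    the most recent tag. (Hypothetical helper, not an official API.)
    """
    tag = "/think" if thinking else "/no_think"
    return f"{message} {tag}"

# Example: disable thinking for a quick factual turn.
prompt = apply_soft_switch("What is the capital of France?", thinking=False)
print(prompt)  # → What is the capital of France? /no_think
```

For Hugging Face `transformers` users, the release notes also describe an `enable_thinking` argument to `tokenizer.apply_chat_template` that sets the default mode for a whole conversation, with the per-turn tags overriding it.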