Qwen3-VL-30B-A3B-Instruct and Thinking

Qwen3-VL is, per Qwen, the most powerful vision-language model in the Qwen series to date.
Its upgrades span stronger text understanding, deeper visual perception, a longer context window, and better comprehension of spatial relations and video dynamics.
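It is available in dense and MoE (mixture-of-experts) architectures, each with an Instruct edition and a reasoning-enhanced Thinking edition; the 30B-A3B model named in the title is an MoE variant with roughly 3B active parameters.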
Key enhancements: Visual Agent, Visual Coding Boost, Advanced Spatial Perception, Long Context & Video Understanding, Enhanced Multimodal Reasoning, Upgraded Visual Recognition, Expanded OCR.
Model architecture updates: Interleaved-MRoPE, DeepStack, Text–Timestamp Alignment.
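Interleaved-MRoPE is only named here. As background (from Qwen's own descriptions, not this brief): the earlier MRoPE splits the rotary frequencies into contiguous blocks for time, height, and width, so each axis sees only part of the frequency spectrum, while the interleaved variant spreads every axis across the full range. The Python sketch below illustrates that idea under stated assumptions: the 64 frequency pairs, the (24, 20, 20) split, base 10000, the round-robin interleaving rule, and all function names are hypothetical choices for illustration, not Qwen3-VL's actual configuration or code.

```python
# Sketch: chunked (MRoPE-style) vs. interleaved assignment of rotary
# frequencies to the time (t), height (h), and width (w) axes.
# Hypothetical values used below: 64 frequency pairs split (24, 20, 20), base 10000.

AXES = ("t", "h", "w")


def chunked_layout(sections):
    """Contiguous blocks: the first sections[0] frequencies drive t, then h, then w."""
    layout = []
    for axis, count in zip(AXES, sections):
        layout.extend([axis] * count)
    return layout


def interleaved_layout(sections):
    """Cycle t, h, w over frequency indices, skipping an axis whose budget is spent."""
    remaining = dict(zip(AXES, sections))
    layout = []
    while any(remaining.values()):
        for axis in AXES:
            if remaining[axis] > 0:
                layout.append(axis)
                remaining[axis] -= 1
    return layout


def inv_freqs(num_pairs, base=10000.0):
    """Standard RoPE inverse frequencies, from highest (index 0) to lowest."""
    dim = 2 * num_pairs
    return [base ** (-2 * i / dim) for i in range(num_pairs)]


def rotary_angles(pos, layout, base=10000.0):
    """Rotation angle of each frequency pair for a (t, h, w) token position."""
    coord = dict(zip(AXES, pos))
    return [coord[axis] * f for axis, f in zip(layout, inv_freqs(len(layout), base))]


def freq_span(layout, axis, base=10000.0):
    """Highest and lowest inverse frequency that a given axis gets to use."""
    used = [f for a, f in zip(layout, inv_freqs(len(layout), base)) if a == axis]
    return max(used), min(used)


if __name__ == "__main__":
    sections = (24, 20, 20)  # hypothetical split, not the model's real config
    chunked = chunked_layout(sections)
    inter = interleaved_layout(sections)
    print("".join(chunked[:12]))  # tttttttttttt
    print("".join(inter[:12]))  # thwthwthwthw
    for axis in AXES:
        c_hi, c_lo = freq_span(chunked, axis)
        i_hi, i_lo = freq_span(inter, axis)
        print(f"{axis}: chunked {c_hi:.1e}..{c_lo:.1e}  interleaved {i_hi:.1e}..{i_lo:.1e}")
    # Angles for a token at frame 3, row 5, column 7 (first three pairs only).
    print(rotary_angles((3, 5, 7), inter)[:3])
```

Running it shows that in the chunked layout t is confined to the highest frequencies and w to the lowest, whereas in the interleaved layout each axis spans from high to low frequencies. Round-robin is just one simple way to interleave; the model's real index pattern and section sizes may differ.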
Performance highlights cover its multimodal capabilities, and quickstart examples show how to run the models.
Citations provided for Qwen3-VL and related works.