Qwen3.5
9 days ago
- #AI
- #Language Model
- #Multimodal
- Qwen3.5-397B-A17B is a post-trained model available in Hugging Face Transformers format, compatible with frameworks like vLLM and SGLang.
- Alibaba Cloud Model Studio offers a managed API service for Qwen3.5, with Qwen3.5-Plus providing extended features like 1M context length and built-in tools.
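Since the managed service exposes an OpenAI-compatible interface, a request can be sketched as a plain JSON body. This is a minimal sketch: the base URL and the `qwen3.5-plus` model id are assumptions here, so confirm both against the Model Studio documentation.

```python
import json

# Assumed DashScope-compatible endpoint; verify in the Model Studio docs.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

def build_chat_request(prompt: str, model: str = "qwen3.5-plus") -> str:
    """Build an OpenAI-compatible /chat/completions request body.

    The model id is illustrative; Model Studio lists the exact names.
    """
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    return json.dumps(body)

payload = json.loads(build_chat_request("Hello, Qwen!"))
```

The same body works with any OpenAI-compatible client by POSTing it to `BASE_URL + "/chat/completions"` with an API key header.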
- Qwen3.5 introduces advancements in multimodal learning, architectural efficiency, reinforcement learning scalability, and global language support (201 languages).
- Key enhancements include Unified Vision-Language Foundation, Efficient Hybrid Architecture, Scalable RL Generalization, and Next-Generation Training Infrastructure.
- Model specifications: 397B total parameters with 17B activated per token, 60 layers, and a native context length of 262,144 tokens (extendable to 1M).
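The parameter counts imply a sparse mixture-of-experts design: only a small fraction of the weights is active for any given token. A quick back-of-the-envelope check:

```python
# Figures from the model card above; the ratio is simple arithmetic.
TOTAL_PARAMS = 397e9   # total parameters
ACTIVE_PARAMS = 17e9   # parameters activated per token

# Fraction of the network that runs per forward pass (~4.3%).
activation_ratio = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{activation_ratio:.1%}")
```

This sparsity is what lets a 397B model serve at roughly the per-token compute cost of a 17B dense model.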
- Benchmark results show competitive performance across knowledge, reasoning, STEM, multilingualism, and vision-language tasks.
- Quickstart guides for API usage, serving with SGLang/vLLM, and integration via OpenAI-compatible APIs are provided.
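For self-hosting, the quickstarts center on launching an OpenAI-compatible server. A minimal sketch of a vLLM launch command, built programmatically; the repo id and flag values (tensor-parallel degree, port) are assumptions to adapt to your hardware:

```python
import shlex

# Repo id as stated in the post; confirm the exact name on Hugging Face.
MODEL = "Qwen/Qwen3.5-397B-A17B"

def vllm_serve_cmd(model: str, tp: int = 8, port: int = 8000) -> str:
    """Assemble a `vllm serve` command line.

    --tensor-parallel-size shards the weights across GPUs; a 397B MoE
    checkpoint will not fit on a single device.
    """
    args = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--port", str(port),
    ]
    return shlex.join(args)

print(vllm_serve_cmd(MODEL))
```

SGLang follows the same pattern with its own launcher and flags; once either server is up, the OpenAI-compatible request shape from the managed API applies unchanged against `http://localhost:8000/v1`.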
- Agentic capabilities are highlighted, with Qwen-Agent and Qwen Code recommended for building applications.
- Supports ultra-long text processing via YaRN scaling for contexts up to 1M tokens.
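YaRN extends the RoPE position range by a scaling factor over the native context. A sketch of the arithmetic and the config fragment it implies; the exact key names follow the `rope_scaling` convention used in earlier Qwen readmes, so treat them as an assumption and check the model card:

```python
NATIVE_CTX = 262_144   # native context length from the spec above
TARGET_CTX = 1_000_000 # advertised 1M-token target

# A factor of 4.0 over the native window covers the 1M target
# (262,144 * 4 = 1,048,576 tokens).
factor = 4.0

# Illustrative config fragment; key names assumed from prior Qwen releases.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CTX,
}
```

Note that static YaRN applies the factor to all inputs, so prior Qwen guidance has been to enable it only when prompts actually exceed the native window.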
- Best practices include optimized sampling parameters, adequate output length, and standardized output formats for benchmarking.
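Those best practices translate directly into the sampling block of a serving or evaluation config. The specific values below are illustrative, carried over from the recommendations for earlier Qwen3 releases rather than stated in this post, so check the model card before benchmarking:

```python
# Illustrative sampling defaults (assumed from prior Qwen3 guidance,
# not stated in this post): moderate temperature with nucleus and
# top-k filtering, plus a generous output budget for long reasoning.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 32_768,  # adequate output length for reasoning traces
}

# Greedy decoding (temperature=0) is generally discouraged for these
# models, as it can cause repetition on long generations.
assert SAMPLING["temperature"] > 0
```

The same dictionary can be passed through an OpenAI-compatible request body or a vLLM `SamplingParams`-style config.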