Qwen 3.5 small models out
6 hours ago
- #AI
- #LanguageModel
- #Multimodal
- Qwen3.5-35B-A3B is a post-trained model released in Hugging Face Transformers format and compatible with inference frameworks such as vLLM, SGLang, and KTransformers.
- Alibaba Cloud Model Studio provides a managed API service for Qwen3.5, featuring Qwen3.5-Flash with extended context length and built-in tools.
- Qwen3.5 introduces significant advancements in multimodal learning, architectural efficiency, reinforcement learning scalability, and global linguistic coverage (201 languages).
- Key enhancements include Unified Vision-Language Foundation, Efficient Hybrid Architecture, Scalable RL Generalization, and Next-Generation Training Infrastructure.
- Model specifications: 35B total parameters (3B activated per token), 40 layers, a native context length of 262,144 tokens (extendable to 1M), and a Mixture-of-Experts architecture.
- Benchmark results show competitive performance across knowledge, instruction following, long context, STEM & reasoning, coding, and multilingual tasks.
- Vision-language benchmarks demonstrate strong capabilities in STEM, general VQA, text recognition, spatial intelligence, video understanding, and medical VQA.
- Quickstart guides for serving Qwen3.5 via SGLang, vLLM, KTransformers, and Hugging Face Transformers, with recommendations for optimal performance.
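  The serving options above can be sketched as one-line launches. The Hugging Face repo ID `Qwen/Qwen3.5-35B-A3B` and the parallelism settings are assumptions based on how earlier Qwen releases were typically served, not commands taken from the release notes:

  ```shell
  # Serve with vLLM (exposes an OpenAI-compatible API on port 8000); repo ID assumed.
  vllm serve Qwen/Qwen3.5-35B-A3B --tensor-parallel-size 4

  # Or serve with SGLang; same assumed repo ID and tensor parallelism.
  python -m sglang.launch_server --model-path Qwen/Qwen3.5-35B-A3B --tp 4
  ```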
- API usage examples for text, image, and video inputs, including sampling parameters for different task types (general, coding, reasoning).
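  A minimal sketch of what such an API call looks like when the model is behind an OpenAI-compatible endpoint (as vLLM and SGLang provide). The served model name, the image URL, and the per-task sampling presets here are illustrative placeholders, not the release's official recommendations:

  ```python
  # Build a chat-completions payload mixing text and image input, with
  # hypothetical sampling presets per task type (general / coding / reasoning).
  import json

  def build_request(text, image_url=None, task="general"):
      """Assemble an OpenAI-compatible chat payload; images ride alongside text."""
      # Placeholder sampling values -- swap in the officially recommended ones.
      presets = {
          "general":   {"temperature": 0.7, "top_p": 0.8},
          "coding":    {"temperature": 0.2, "top_p": 0.9},
          "reasoning": {"temperature": 0.6, "top_p": 0.95},
      }
      content = [{"type": "text", "text": text}]
      if image_url:
          content.append({"type": "image_url", "image_url": {"url": image_url}})
      return {
          "model": "Qwen3.5-35B-A3B",  # assumed served model name
          "messages": [{"role": "user", "content": content}],
          **presets[task],
      }

  payload = build_request("Describe this image.", image_url="https://example.com/cat.png")
  print(json.dumps(payload, indent=2))
  ```

  The same payload shape extends to video by appending a `video_url` content part where the serving stack supports it.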
- Agentic usage with Qwen-Agent and Qwen Code for terminal automation, tool calling capabilities, and processing ultra-long texts with YaRN scaling.
- Best practices include recommended sampling parameters, adequate output length (32K-81K tokens), and standardized output formats for benchmarking.
- A citation is provided for referencing Qwen3.5's work on native multimodal agents.