Qwen3.5-Omni Technical Report

17 hours ago
  • #Large Language Models
  • #Multimodal AI
  • #Speech Synthesis
  • Qwen3.5-Omni is an advanced multimodal model that scales to hundreds of billions of parameters and supports a 256k-token context length.
  • It achieves state-of-the-art (SOTA) results on 215 audio and audio-visual tasks, matching or surpassing competitors such as Gemini-3.1 Pro.
  • The model uses a Hybrid Attention Mixture-of-Experts (MoE) framework for efficient long-sequence inference and supports processing of long audio and video inputs.
  • The report introduces ARIA, which improves the stability of streaming speech synthesis by dynamically aligning text and speech units.
  • It supports multilingual understanding and emotionally nuanced speech generation across 10 languages, and offers advanced audio-visual grounding.
  • The model exhibits a novel capability called Audio-Visual Vibe Coding: it can write code directly from audio-visual instructions.
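The brief names a Hybrid Attention MoE framework but does not describe how it works. As a generic illustration of the Mixture-of-Experts idea only, the sketch below shows standard top-k expert routing, assuming a softmax gate over per-expert scores and k=2. It is not Qwen3.5-Omni's implementation: the function names, the fixed gate, and the toy experts are all hypothetical, and the "Hybrid Attention" part is not modeled at all. The point it shows is why MoE helps efficiency: each token runs through only k experts instead of all of them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their gate weights.

    Hypothetical helper: a generic top-k gate, not the report's router.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_layer(
    token: Vector,
    gate: Callable[[Vector], Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,  # assumed number of active experts per token
) -> Vector:
    """Weighted sum of the outputs of only the k selected experts."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(gate(token), k):
        expert_out = experts[idx](token)  # only k of len(experts) experts execute
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Usage with hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical fixed gate that always favors experts 2 and 3 equally.
gate = lambda v: [0.0, 0.0, 5.0, 5.0]

print(moe_layer([1.0, 1.0], gate, experts, k=2))  # → [3.5, 3.5]
```

In a real model the gate is a learned projection of the token's hidden state and the experts are feed-forward networks; the toy lambdas here only make the routing arithmetic easy to check (0.5 × 3 + 0.5 × 4 = 3.5).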