Anyone Can Clone Your Voice Now

3 months ago

Qwen3-TTS supports 10 major languages and multiple dialects with adaptive tone, speaking rate, and emotional expression control.
Key features include powerful speech representation, universal end-to-end architecture, low-latency streaming generation, and intelligent text understanding.
Released models include VoiceDesign, CustomVoice, and Base, each with specific functionality such as voice cloning or style control.
Models can be downloaded via ModelScope or Hugging Face, with detailed instructions provided for manual downloads.
The quickstart guide covers environment setup, Python package installation, and usage examples for each model's functionality.
Detailed usage examples for Custom Voice, Voice Design, and Voice Clone functionalities are provided with code snippets.
Evaluation benchmarks show Qwen3-TTS performance in content consistency, speaker similarity, and multilingual speech generation.
Speech tokenizer benchmarks compare Qwen3-TTS with other models on ASR tasks and semantic-related speech tokenization.
Citation information is provided for referencing the Qwen3-TTS technical report.
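
The Hugging Face download route mentioned above can be sketched as follows. This is a minimal sketch, not the project's documented procedure. It assumes the `huggingface_hub` package is installed. The repository ID `Qwen/Qwen3-TTS-Base` is a hypothetical placeholder, since this summary does not name the actual checkpoints. `local_model_dir` and `download_model` are illustrative helper names.

```python
from pathlib import Path

# Hypothetical repository ID (assumption): replace it with the real
# checkpoint name listed on the project's Hugging Face page.
ASSUMED_REPO_ID = "Qwen/Qwen3-TTS-Base"


def local_model_dir(repo_id: str, root: str = "models") -> Path:
    """Map a hub repository ID to a local folder named after the model."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = ASSUMED_REPO_ID, root: str = "models") -> Path:
    """Download every file of a Hugging Face repository into a local folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Example call (requires network access and a valid repository ID):
# model_path = download_model()
```

Keeping the path logic separate from the network call makes it possible to check where files will land before starting a multi-gigabyte download.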
