GitHub - OpenBMB/VoxCPM: VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
- #Voice Cloning
- #Text-to-Speech
- #Multilingual AI
- VoxCPM2 is a tokenizer-free Text-to-Speech system with 2B parameters, trained on over 2 million hours of multilingual speech data.
- It supports 30 languages, Voice Design, and Controllable Voice Cloning, and it outputs 48 kHz studio-quality audio through an end-to-end diffusion-autoregressive architecture.
- Features include real-time streaming with a low real-time factor (RTF), a fully open-source release under the Apache-2.0 license, and fine-tuning through supervised fine-tuning (SFT) or LoRA.
- Reported benchmarks show state-of-the-art results on multilingual TTS tasks, with high intelligibility and speaker-similarity scores across languages.
- Risks include potential misuse for impersonation, variability in controllable generation, and limited coverage of languages outside the supported set.
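
The real-time factor mentioned above is conventionally defined as synthesis time divided by the duration of the audio produced, so values below 1 mean generation runs faster than playback. The sketch below is a minimal illustration of that arithmetic only. The function name and the example numbers are hypothetical, the 48 kHz default simply mirrors the output rate quoted in the summary, and nothing here calls or reflects the VoxCPM API.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = 48_000) -> float:
    """Return RTF = time spent generating / duration of the generated audio.

    Hypothetical helper for illustration; not part of VoxCPM.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    if synthesis_seconds < 0:
        raise ValueError("synthesis_seconds must be non-negative")
    audio_seconds = num_samples / sample_rate  # duration of the output clip
    return synthesis_seconds / audio_seconds


# Made-up example: 480,000 samples at 48 kHz is 10 s of audio.
# If producing it took 1.5 s of wall-clock time, the RTF is 0.15.
rtf = real_time_factor(synthesis_seconds=1.5, num_samples=480_000)
print(f"RTF = {rtf:.2f}")
```

In practice the synthesis time would be measured with a wall-clock timer such as `time.perf_counter()` around the generation call, and the sample count taken from the returned waveform.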