ByteDance Releases MegaTTS3
a year ago
- #VoiceCloning
- #AI
- #TTS
- A lightweight, efficient TTS model built on a Diffusion Transformer, with only 0.45B parameters.
- Supports ultra-high-quality voice cloning in both Chinese and English, including code-switching (mixing the two languages within one utterance).
- Offers controllable features like accent intensity and fine-grained pronunciation adjustments.
- Project released on 2025-03-22 with detailed setup instructions for Linux, Windows, and Docker.
- Pretrained checkpoints are available on Google Drive and Hugging Face; the WaveVAE encoder parameters are withheld for security reasons.
- Provides command-line and Web UI usage examples for both standard and accented TTS.
- Includes additional submodules for speech-text alignment, grapheme-to-phoneme (G2P) conversion, and a waveform VAE (WaveVAE).
- Security vulnerabilities should be reported to ByteDance Security; the project is licensed under Apache-2.0.
- Based on the research papers 'Sparse Alignment Enhanced Latent Diffusion Transformer' and 'WavTokenizer'.
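To put the "only 0.45B parameters" figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at common numeric precisions. This is an illustrative estimate, not something taken from the project: it assumes the standard byte sizes per data type and ignores activations, framework overhead, and the separate aligner, G2P, and WaveVAE submodules. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate for a model with a given parameter count.
# Assumption: only the weights are counted; activations, framework overhead,
# and the separate submodules (aligner, G2P, WaveVAE) are NOT included.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory (in GiB) needed to store num_params weights of dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 0.45e9  # parameter count quoted for the diffusion transformer
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>9}: {weight_memory_gib(params, dtype):.2f} GiB")
```

Under these assumptions, the weights come to roughly 1.68 GiB in float32 and 0.84 GiB in float16 or bfloat16, which is why a model of this size is usually described as lightweight compared with multi-billion-parameter speech models.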