ByteDance Releases MegaTTS3

a year ago

Lightweight and efficient TTS Diffusion Transformer with only 0.45B parameters.
Supports ultra high-quality voice cloning for both Chinese and English, including code-switching.
Offers controllable features like accent intensity and fine-grained pronunciation adjustments.
Project released on 2025-03-22 with detailed setup instructions for Linux, Windows, and Docker.
Pretrained checkpoints are available on Google Drive and Hugging Face; the WaveVAE encoder parameters are withheld for security reasons.
Command-line and Web UI usage examples provided for standard and accented TTS.
Includes additional submodules for speech-text alignment, grapheme-to-phoneme conversion, and waveform VAE.
Security vulnerabilities should be reported via ByteDance Security; the project is licensed under Apache-2.0.
Based on the research papers 'Sparse Alignment Enhanced Latent Diffusion Transformer' and 'WavTokenizer'.
