ByteDance Releases MegaTTS3

a year ago
  • #VoiceCloning
  • #AI
  • #TTS
  • Lightweight and efficient TTS Diffusion Transformer with only 0.45B parameters.
  • Supports ultra high-quality voice cloning for both Chinese and English, including code-switching.
  • Offers controllable features like accent intensity and fine-grained pronunciation adjustments.
  • Project released on 2025-03-22 with detailed setup instructions for Linux, Windows, and Docker.
  • Pretrained checkpoints available on Google Drive and Hugging Face; WaveVAE encoder parameters are withheld for security reasons.
  • Command-line and Web UI usage examples provided for standard and accented TTS.
  • Includes additional submodules for speech-text alignment, grapheme-to-phoneme conversion, and waveform VAE.
  • Security vulnerabilities should be reported via ByteDance Security; the project is licensed under Apache-2.0.
  • Based on the research papers 'Sparse Alignment Enhanced Latent Diffusion Transformer' and 'WavTokenizer'.
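
To put the 0.45B parameter count in perspective, here is a rough back-of-the-envelope sketch of the memory needed to hold the weights alone. It is an illustration, not a figure from the project: the function name `weight_memory_gib` is hypothetical, the precision choices are assumptions, and the estimate ignores activations, the auxiliary submodules, and runtime overhead.

```python
# Rough estimate of weight storage for a 0.45B-parameter model.
# Assumption: every parameter is stored at a single uniform precision.

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 0.45e9  # parameter count stated in the summary

# Bytes per parameter for two common storage precisions.
PRECISIONS = {"float32": 4, "float16": 2}

estimates = {
    name: weight_memory_gib(NUM_PARAMS, size)
    for name, size in PRECISIONS.items()
}

for name, gib in estimates.items():
    print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions the weights fit comfortably on a consumer GPU, which is consistent with the "lightweight" description; actual inference memory will be higher.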