CorentinJ: Real-Time Voice Cloning
- #Deep Learning
- #SV2TTS
- #Text-To-Speech
- Implementation of SV2TTS (Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis) with a real-time vocoder.
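- The SV2TTS framework works in three stages: a speaker encoder first creates a digital representation of a voice from a few seconds of audio, then a synthesizer and a vocoder use that representation as a reference to generate speech from arbitrary text.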
- Video demonstration available with links to related papers and implementations.
- Repository is outdated compared to current SaaS solutions; suggestions for open-source alternatives provided.
- Supports both Windows and Linux; GPU recommended but not mandatory.
- Python 3.7 recommended; setup instructions include installing ffmpeg, PyTorch, and other dependencies.
- Pretrained models download automatically; manual download option available.
- Testing the configuration with `demo_cli.py` is recommended before proceeding.
- LibriSpeech/train-clean-100 dataset recommended for initial testing; other datasets supported.
- Toolbox can be run with `demo_toolbox.py`; troubleshooting tips provided for common issues.
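The three-stage data flow can be illustrated with toy stand-ins. This is a minimal sketch, not the repository's code. `speaker_encoder`, `synthesizer` and `vocoder` are hypothetical pure-Python functions that only mimic the shapes passed between stages: a fixed-size, L2-normalized voice embedding, a sequence of spectrogram-like frames conditioned on it, and a waveform. In the repository each stage is a trained neural network.

```python
import math
from typing import List

Vector = List[float]


def speaker_encoder(frames: List[Vector]) -> Vector:
    """Stage 1 (toy, hypothetical): average per-frame features of a reference
    clip and L2-normalize, giving one fixed-size vector whatever the clip length."""
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid dividing by zero
    return [x / norm for x in mean]


def synthesizer(text: str, embedding: Vector) -> List[Vector]:
    """Stage 2 (toy, hypothetical): emit one spectrogram-like frame per character,
    each scaled by the speaker embedding so the 'voice' conditions every frame."""
    return [[ord(ch) / 128.0 * e for e in embedding] for ch in text]


def vocoder(mel: List[Vector], samples_per_frame: int = 4) -> List[float]:
    """Stage 3 (toy, hypothetical): expand each frame into a short sine burst
    whose amplitude is the frame's mean value."""
    wav: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wav.extend(
            level * math.sin(2 * math.pi * k / samples_per_frame)
            for k in range(samples_per_frame)
        )
    return wav


reference_clip = [[0.2, 0.1, 0.4], [0.3, 0.2, 0.5], [0.1, 0.0, 0.3]]  # 3 frames, 3 features
embed = speaker_encoder(reference_clip)
mel = synthesizer("hello", embed)
wav = vocoder(mel)
print(len(embed), len(mel), len(wav))  # 3 5 20
```

The useful property of this structure is that stage 1 runs once per voice. Its fixed-size output can then condition stages 2 and 3 for any text.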
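Before running `demo_cli.py`, you can catch missing prerequisites early with a quick pre-flight check. The `preflight` helper below is hypothetical and is not part of the repository. It only reports the Python version (flagging whether it is the recommended 3.7), whether `ffmpeg` is on the PATH, whether PyTorch is importable, and, if it is, whether CUDA is available.

```python
import importlib.util
import shutil
import sys


def preflight() -> dict:
    """Hypothetical pre-flight check for the setup prerequisites listed above."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_is_3_7": sys.version_info[:2] == (3, 7),  # 3.7 is the recommended version
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported only when PyTorch is actually installed

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

A `False` for `cuda_available` is not an error, because a GPU is optional. It generally means inference runs on the CPU, which is slower.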
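A common source of trouble is extracting the dataset to the wrong place, so it can help to verify the layout before launching the toolbox. The sketch below assumes the subset is extracted as `<datasets_root>/LibriSpeech/train-clean-100`, where `<datasets_root>` is a directory of your choosing. It also relies on LibriSpeech's standard speaker/chapter/`*.flac` structure. `check_librispeech` is a hypothetical helper and is not part of the repository.

```python
import sys
from pathlib import Path


def check_librispeech(datasets_root: str) -> dict:
    """Hypothetical helper: confirm train-clean-100 sits under
    <datasets_root>/LibriSpeech/ and count speakers and audio files."""
    subset = Path(datasets_root) / "LibriSpeech" / "train-clean-100"
    if not subset.is_dir():
        return {"found": False, "path": str(subset), "speakers": 0, "flac_files": 0}
    # Top-level directories are speaker IDs; audio lives in speaker/chapter/*.flac.
    speakers = [p for p in subset.iterdir() if p.is_dir()]
    flac_files = sum(1 for _ in subset.rglob("*.flac"))
    return {
        "found": True,
        "path": str(subset),
        "speakers": len(speakers),
        "flac_files": flac_files,
    }


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(check_librispeech(root))
```

If `found` is `True` with nonzero counts, the same `<datasets_root>` is the directory to point `demo_toolbox.py` at.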