CorentinJ: Real-Time Voice Cloning

Implementation of SV2TTS (Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis) with a real-time vocoder.
SV2TTS framework consists of three stages: the first creates a digital representation of a voice from a short audio sample, and the second and third use that representation to generate speech from text.
Video demonstration available with links to related papers and implementations.
Repository is outdated compared to current SaaS solutions; suggestions for open-source alternatives provided.
Supports both Windows and Linux; GPU recommended but not mandatory.
Python 3.7 recommended; setup instructions include installing ffmpeg, PyTorch, and other dependencies.
Pretrained models download automatically; manual download option available.
Testing configuration with `demo_cli.py` recommended before proceeding.
LibriSpeech/train-clean-100 dataset recommended for initial testing; other datasets supported.
Toolbox can be run with `demo_toolbox.py`; troubleshooting tips provided for common issues.
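
The setup points above (Python version, ffmpeg, PyTorch, optional GPU, dataset location) can be checked before running `demo_cli.py`. The following is a minimal sketch, not part of the repository: `preflight` and its report keys are hypothetical names, and the `<datasets_root>/LibriSpeech/train-clean-100` layout is an assumption about where the dataset was extracted. It uses only the standard library, plus `torch` if it happens to be installed.

```python
import shutil
import sys
from pathlib import Path


def preflight(datasets_root=None):
    """Collect basic facts about the local setup without importing the project.

    Hypothetical helper: the name and the report keys are illustrative only.
    """
    report = {
        # Interpreter version as "major.minor"; the summary recommends 3.7.
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_is_3_7": sys.version_info[:2] == (3, 7),
        # ffmpeg must be discoverable on PATH for audio file handling.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }

    # PyTorch is required; a GPU is recommended but not mandatory.
    try:
        import torch

        report["torch_installed"] = True
        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        report["torch_installed"] = False
        report["cuda_available"] = False

    # Assumed layout: <datasets_root>/LibriSpeech/train-clean-100
    if datasets_root is not None:
        expected = Path(datasets_root) / "LibriSpeech" / "train-clean-100"
        report["librispeech_found"] = expected.is_dir()

    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

A `False` for `python_is_3_7` or `cuda_available` is a warning rather than a hard failure, since 3.7 is only the recommended version and the GPU is optional.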
