Open source speech foundation model that runs locally on CPU in real-time
11 hours ago
- #TTS
- #Voice AI
- #On-device
- NeuTTS Air is a state-of-the-art, on-device TTS speech language model with instant voice cloning.
- Built on a 0.5B-parameter LLM backbone, it offers natural-sounding speech, real-time performance, and built-in security.
- Key features include best-in-class realism, on-device deployment optimization, and instant voice cloning with as little as 3 seconds of audio.
- The model pairs a lightweight yet capable Qwen 0.5B backbone with the proprietary NeuCodec audio codec and ships in GGML format for efficient on-device inference.
- Installation involves cloning the Git repository, installing espeak, and installing the Python dependencies.
- Basic usage includes synthesizing speech with reference audio and text inputs.
- For optimal performance, reference audio should be mono, 16–44 kHz, 3–15 seconds, clean, and natural.
- Every generated audio file is marked by the Perth Watermarker to support responsible use.
- A disclaimer advises against misuse of the model.
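
The reference-audio guidelines in the summary (mono, 16–44 kHz, 3–15 seconds) can be checked before a clip is used for cloning. The following is a minimal sketch using only Python's standard `wave` module. It is not part of NeuTTS Air: the function name `check_reference_wav` is hypothetical, the upper limit assumes "44 kHz" means 44.1 kHz, and the "clean" and "natural" criteria are not checked because they cannot be read from the file header.

```python
import wave

# Limits taken from the guidelines above. These constants and the function
# name are illustrative assumptions, not part of the NeuTTS Air API.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100  # assumes "44 kHz" refers to the common 44.1 kHz rate
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def check_reference_wav(path):
    """Return a list of problems found in a WAV reference clip (empty if none)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()

    # Duration in seconds is the frame count divided by frames per second.
    duration = frames / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 16-44.1 kHz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f} s is outside 3-15 s")
    return problems
```

Calling `check_reference_wav("reference.wav")` on a hypothetical clip returns an empty list when the file meets the three measurable guidelines, and a list of human-readable problems otherwise.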