Open-source speech foundation model that runs locally on CPU in real time

11 hours ago


NeuTTS Air is a state-of-the-art, on-device TTS speech language model with instant voice cloning.
Built on a 0.5B-parameter LLM backbone, it offers natural-sounding speech, real-time performance, and built-in security.
Key features include best-in-class realism, on-device deployment optimization, and instant voice cloning with as little as 3 seconds of audio.
Model details highlight its lightweight yet capable Qwen 0.5B backbone, proprietary NeuCodec audio codec, and GGML format for efficient on-device inference.
Installation involves cloning the Git repo, installing espeak, and installing the Python dependencies.
Basic usage includes synthesizing speech with reference audio and text inputs.
For optimal performance, reference audio should be mono, sampled at 16–44 kHz, 3–15 seconds long, clean (minimal background noise), and natural-sounding.
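The measurable parts of the reference-audio guidelines above (channel count, sample rate, duration) can be checked before a clip is passed to the model. The following is a minimal sketch, not part of NeuTTS Air: `check_reference_audio` is a hypothetical helper, it reads only uncompressed PCM WAV files via Python's standard `wave` module, and it assumes the "44 kHz" upper bound means 44.1 kHz. Whether a clip is clean and natural still has to be judged by ear.

```python
import wave

# Limits taken from the guidelines above.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 44_100  # assumption: "44 kHz" is read as 44.1 kHz
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def check_reference_audio(path):
    """Return a list of problems found in a PCM WAV file (empty if none).

    Hypothetical helper for illustration; it is not part of NeuTTS Air.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 16-44.1 kHz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f} s is outside 3-15 s")
    return problems
```

An empty list means the clip passes the three measurable checks; each string in a non-empty list names one guideline the clip misses.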
Every generated audio file is watermarked with the Perth Watermarker to support responsible use.
Disclaimer advises against misuse of the model.
