NeuTTS Air – open-source, on-device TTS

6 months ago

NeuTTS Air is billed as the world’s first super-realistic, on-device TTS speech language model with instant voice cloning.
Built on a 0.5B-parameter LLM backbone, it offers natural-sounding speech, real-time performance, and built-in security in the form of watermarked output.
Features include best-in-class realism, on-device deployment optimization, and instant voice cloning with just 3 seconds of audio.
Supported languages include English. A neural audio codec (NeuCodec) provides high-quality audio at low bitrates.
Models are available in GGML format (GGUF files) for efficient on-device inference, with real-time generation on mid-range devices.
Installation requires espeak and the Python dependencies; support for GGUF models and the ONNX decoder is optional.
To synthesize speech, NeuTTS Air needs a reference audio sample and a text string, and it speaks the text in the style of the reference.
Performance tips: use GGUF model backbones, pre-encode reference samples, and use the ONNX codec decoder.
Generated audio files include a perceptual watermark for security and ethical use.
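
The tip about pre-encoding references amounts to running the expensive reference-encoding step once per voice sample and reusing the result. Below is a minimal sketch of that idea, not the project's own code: `encode_reference_cached`, `encode_fn`, and the `.ref_cache` directory are hypothetical names, and `encode_fn` stands in for whatever reference-encoding call the installed NeuTTS Air version provides.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def encode_reference_cached(
    audio_path: str,
    encode_fn: Callable[[str], Any],  # hypothetical stand-in for the model's reference encoder
    cache_dir: str = ".ref_cache",    # hypothetical cache location
) -> Any:
    """Return encode_fn(audio_path), reusing a cached result when the file is unchanged."""
    # Key the cache on the file's contents, so an edited sample is re-encoded.
    digest = hashlib.sha256(Path(audio_path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.pkl"

    if cache_file.exists():
        # Cache hit: skip the encoder entirely.
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # Cache miss: run the expensive step once, then store the result.
    codes = encode_fn(audio_path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(codes, f)
    return codes
```

With this in place, repeated synthesis in the same voice pays the encoding cost only on the first call. Because the cache uses `pickle`, it should only ever load files the application wrote itself.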
