Hasty Briefs (beta)

Open source speech foundation model that runs locally on CPU in real-time

9 hours ago
  • #TTS
  • #Voice AI
  • #On-device
  • NeuTTS Air is a state-of-the-art, on-device TTS speech language model with instant voice cloning.
  • Built on a 0.5B-parameter LLM backbone, it offers natural-sounding speech, real-time performance, and built-in security.
  • Key features include best-in-class realism, on-device deployment optimization, and instant voice cloning with as little as 3 seconds of audio.
  • Model details highlight its lightweight yet capable Qwen 0.5B backbone, proprietary NeuCodec audio codec, and GGML format for efficient on-device inference.
  • Installation involves cloning the Git repo, installing espeak, and installing the Python dependencies.
  • Basic usage includes synthesizing speech with reference audio and text inputs.
  • For optimal performance, reference audio should be mono, sampled at 16–44 kHz, 3–15 seconds long, and contain clean, natural speech.
  • Every generated audio file is watermarked with the Perth (Perceptual Threshold) Watermarker to support responsible use.
  • Disclaimer advises against misuse of the model.
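
The reference-audio guidance above (mono, 16–44 kHz, 3–15 seconds) can be checked before a clip is used for cloning. The following is a minimal sketch using only Python's standard `wave` module. It is not part of NeuTTS Air: the function name `check_reference_wav` is hypothetical, and reading "44 kHz" as the common 44.1 kHz rate is an assumption. It cannot judge whether the speech is clean or natural.

```python
import wave

# Limits taken from the guidance above; treating "44 kHz" as the common
# 44.1 kHz rate is an assumption.
MIN_RATE_HZ, MAX_RATE_HZ = 16_000, 44_100
MIN_SECONDS, MAX_SECONDS = 3.0, 15.0


def check_reference_wav(path):
    """Return a list of problems found; an empty list means the file passes."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 16-44.1 kHz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f} s is outside 3-15 s")
    return problems
```

Calling `check_reference_wav("reference.wav")` on an uncompressed WAV file (the file name here is only a placeholder) returns human-readable problems that can be fixed by resampling, downmixing, or trimming before synthesis.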