Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

9 months ago

Kitten TTS is an ultra-lightweight text-to-speech model with about 15M parameters and a total size under 25MB.
It runs efficiently on CPUs without requiring GPUs, making it accessible for low-power devices like Raspberry Pi and smartphones.
The model ships with eight expressive voices (four female and four male), covering a range of applications without extra setup.
Kitten TTS is optimized for real-time speech synthesis, making it ideal for responsive chatbots, voice assistants, and accessibility tools.
It is open-source under the Apache 2.0 license, allowing free use for both personal and commercial projects.
The model is currently English-only, but multilingual support is planned for future releases.
Kitten TTS compares favorably with other lightweight TTS models such as Piper TTS and Kokoro TTS, offering a better size-to-quality ratio.
Potential applications include edge AI, privacy-focused IoT devices, accessibility tools, and indie development projects.
The model is still in developer preview, with some minor quality issues, but future updates aim to improve performance and expand features.
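
The two headline numbers (15M parameters, under 25MB) can be related with some quick arithmetic. The sketch below is only a back-of-the-envelope check, not a description of how Kitten TTS actually stores its weights, which the summary does not state. It assumes 1 MB = 10^6 bytes and ignores any non-weight data in the file; the names PARAMS, SIZE_LIMIT_BYTES, and weights_size_mb are made up for illustration.

```python
# Back-of-the-envelope check: how many bytes per parameter fit in the stated size?
PARAMS = 15_000_000             # "15M parameters" from the summary
SIZE_LIMIT_BYTES = 25 * 10**6   # "under 25MB", assuming 1 MB = 10^6 bytes

# Common storage widths. Which one Kitten TTS uses is NOT stated in the summary.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_mb(params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in MB, ignoring metadata and other file contents."""
    return params * bytes_per_param / 10**6


# Average number of bytes available per parameter within the size limit.
budget = SIZE_LIMIT_BYTES / PARAMS
print(f"Budget: {budget:.2f} bytes per parameter")

for name, width in BYTES_PER_PARAM.items():
    size = weights_size_mb(PARAMS, width)
    verdict = "fits" if size * 10**6 < SIZE_LIMIT_BYTES else "does not fit"
    print(f"{name}: {size:.0f} MB ({verdict})")
```

Under these assumptions the budget works out to roughly 1.67 bytes per parameter: 32-bit weights would need 60 MB and 16-bit weights 30 MB, so a file under 25MB suggests the weights average less than two bytes each, for example through some form of reduced-precision storage.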
