Chatterbox, Resemble AI's production-grade open source TTS model

a year ago

Resemble AI introduces Chatterbox, its first production-grade open source TTS model, released under the MIT license.
Chatterbox benchmarks favorably against closed-source systems like ElevenLabs in side-by-side evaluations.
Supports emotion exaggeration control, a unique feature for open source TTS models.
Can be used for memes, videos, games, and AI agents, and can be tried out via a Hugging Face Gradio app.
Resemble AI also offers a competitively priced hosted TTS service with ultra-low latency (under 200 ms) for users who need to scale or tune.
Features include state-of-the-art zero-shot TTS, a 0.5B-parameter Llama backbone, and watermarked outputs.
Includes easy voice conversion scripts and alignment-informed inference for stability.
Trained on 0.5M hours of cleaned data.
Provides usage tips for general and expressive/dramatic speech settings.
Includes Python code examples for generating TTS with optional different voice prompts.
All generated audio includes imperceptible neural watermarks for security.
Encourages community engagement via Discord while emphasizing ethical use.
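
The summary mentions Python examples with an optional voice prompt and separate tips for general and expressive speech. A minimal sketch of how these might fit together follows. The names ChatterboxTTS.from_pretrained, generate, sr, exaggeration, cfg_weight, and audio_prompt_path are assumptions recalled from the project's README, and the preset values are likewise assumed rather than taken from this summary, so check them against the installed chatterbox-tts package before relying on them. The helper names (PRESETS, build_generate_kwargs, synthesize) are hypothetical.

```python
from typing import Optional

# Assumed presets, recalled from the project's README tips (not stated in this summary):
# defaults for general speech; more exaggeration and a lower cfg_weight for dramatic speech.
PRESETS = {
    "general": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def build_generate_kwargs(style: str = "general",
                          audio_prompt_path: Optional[str] = None) -> dict:
    """Return keyword arguments for generate() for a given speaking style."""
    if style not in PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    kwargs = dict(PRESETS[style])  # copy so callers cannot mutate the preset
    if audio_prompt_path is not None:
        # Optional reference clip used to synthesize in a different voice.
        kwargs["audio_prompt_path"] = audio_prompt_path
    return kwargs


def synthesize(text: str, out_path: str, style: str = "general",
               audio_prompt_path: Optional[str] = None, device: str = "cpu") -> None:
    """Generate speech and save it as a WAV file (requires chatterbox-tts and torchaudio)."""
    # Imported lazily so the helper above works without these packages installed.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS  # assumed import path

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **build_generate_kwargs(style, audio_prompt_path))
    torchaudio.save(out_path, wav, model.sr)
```

Calling synthesize("Hello there.", "out.wav", style="expressive", audio_prompt_path="reference.wav") would, under these assumptions, produce a more dramatic reading in the voice of the reference clip. Keeping the presets in a plain dictionary makes it easy to adjust them without touching the synthesis code.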
