TTS Still Sucks

12 days ago


The author prefers using open models for voice cloning and generating article transcripts for their podcast.
Kokoro is a top open TTS model but doesn't support voice cloning.
Fish Audio's S1-mini model has limitations like non-functional emotion markers and unused chunking parameters.
Chatterbox is another option, but it has a character limit (1,000–2,000 characters) and struggles with longer texts.
The podcast generation process involves extracting text from RSS, preprocessing with an LLM, and using parallel Modal containers for TTS.
Improvements to the podcast include availability on Spotify and better show notes with clickable links.
Open-source TTS models like Chatterbox have issues with speech duration and lack control over features like emotion tags.
Despite advancements, open-source TTS still lags behind proprietary systems.
The RSS to podcast pipeline is open-source and available on GitHub.
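
The character limits and parallel TTS step mentioned above suggest splitting each article into chunks before synthesis. The following is a minimal sketch of that idea, not the author's actual pipeline: `split_into_chunks`, `fake_tts`, and `synthesize_all` are hypothetical names, the 1,000-character default is taken from the lower end of the limit quoted above, and a local thread pool stands in for the parallel Modal containers.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, max_chars=1000):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def fake_tts(chunk):
    # Placeholder (assumption): a real pipeline would call a TTS model here.
    return f"<audio:{len(chunk)} chars>".encode()


def synthesize_all(chunks, tts=fake_tts, workers=4):
    # pool.map returns results in input order, so audio segments
    # can be concatenated directly afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tts, chunks))
```

Splitting on sentence boundaries keeps each chunk under the model's limit without cutting words mid-utterance; the hard split only applies to a single sentence that exceeds the limit on its own.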
