Hasty Briefs (beta)

TTS Still Sucks

12 days ago
  • #TTS
  • #Open Source
  • #Podcast
  • The author prefers using open models for voice cloning and generating article transcripts for their podcast.
  • Kokoro is a top open TTS model but doesn't support voice cloning.
  • Fish Audio's S1-mini model has limitations like non-functional emotion markers and unused chunking parameters.
  • Chatterbox is another option but has character limits (1,000–2,000) and issues with longer texts.
  • The podcast generation process involves extracting text from RSS, preprocessing with an LLM, and using parallel Modal containers for TTS.
  • Improvements to the podcast include availability on Spotify and better show notes with clickable links.
  • Open-source TTS models such as Chatterbox struggle with speech duration and offer little control over features like emotion tags.
  • Despite advancements, open-source TTS still lags behind proprietary systems.
  • The RSS to podcast pipeline is open-source and available on GitHub.
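
The pipeline summarized above (pull text from RSS, split it to fit a model's character limit, synthesize the pieces in parallel) can be sketched roughly as follows. This is a minimal illustration, not the author's code: `extract_items`, `chunk_text`, `synthesize`, and `text_to_audio` are hypothetical names, the 1,000-character budget is an assumption taken from the low end of the range quoted for Chatterbox, a local thread pool stands in for the parallel Modal containers, and the LLM preprocessing step is left out.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

MAX_CHARS = 1000  # assumed budget: low end of the 1,000 to 2,000 range


def extract_items(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, description) pairs from an RSS 2.0 feed string."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize(chunk: str) -> bytes:
    """Hypothetical stand-in for a TTS call; returns placeholder bytes."""
    return chunk.encode("utf-8")


def text_to_audio(text: str, max_chars: int = MAX_CHARS,
                  max_workers: int = 4) -> bytes:
    """Chunk the text, synthesize chunks concurrently, join in order."""
    chunks = chunk_text(text, max_chars)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the audio stays in sequence.
        parts = list(pool.map(synthesize, chunks))
    return b"".join(parts)
```

Splitting on sentence boundaries rather than at a fixed offset keeps each chunk a self-contained utterance, which is one plausible way to work around per-request character limits. A real pipeline would replace `synthesize` with an actual model call and concatenate audio rather than raw bytes.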