Show HN: Dia, an open-weights TTS model for generating realistic dialogue

a year ago
  • #AI
  • #text-to-speech
  • #dialogue-generation
  • Dia is a 1.6B-parameter text-to-speech model by Nari Labs that generates realistic dialogue from transcripts.
  • Features include emotion/tone control, nonverbal sounds (laughter, coughing), and audio conditioning.
  • Pretrained model checkpoints and inference code are available on Hugging Face.
  • Demo page compares Dia to ElevenLabs Studio and Sesame CSM-1B.
  • Community support via Discord; a waitlist is open for access to a larger model.
  • Installation via GitHub: clone the repo, set up the environment, and run the Gradio UI.
  • Python code example for generating dialogue audio with Dia.
  • Supports GPUs (PyTorch 2.0+, CUDA 12.6); CPU support coming soon.
  • Real-time audio generation on enterprise GPUs; slower on older GPUs.
  • The full version requires ~10 GB of VRAM; a quantized version is planned.
  • Strict usage restrictions: no identity misuse, deceptive content, or illegal activities.
  • Future plans: Docker support, inference optimization, quantization.
  • A team of one full-time and one part-time engineer; contributions welcome.
  • Acknowledgments: Google TPU Research Cloud, SoundStorm, Parakeet, Descript Audio Codec.
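To put the ~10 GB VRAM figure and the planned quantization in context, here is a back-of-the-envelope sketch of how much memory the 1.6B weights alone occupy at common precisions. It assumes weights only. Activations, the audio codec, and framework overhead are ignored, so this is a lower bound and not a measurement of Dia itself. The precision list and the helper name `weight_memory_gb` are illustrative choices, not part of the project.

```python
# Back-of-the-envelope estimate of the memory needed just to store a model's
# weights at different numeric precisions.
# Assumption: weights only; activations, the audio codec and runtime overhead
# are NOT included, so real VRAM usage (e.g. the quoted ~10 GB) will be higher.

PARAMS = 1.6e9  # parameter count from the summary above

# Bytes used per parameter at each precision (illustrative selection).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>17}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Running it prints roughly 6.4 GB for float32, 3.2 GB for 16-bit, 1.6 GB for int8 and 0.8 GB for int4, which illustrates why quantization is the natural route to fitting the model on smaller GPUs.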
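Because Dia works from transcripts with nonverbal cues, it can help to assemble the input text programmatically. The sketch below assumes the convention shown in the project's README: speaker turns tagged `[S1]`/`[S2]` and nonverbal sounds written in parentheses such as `(laughs)`. The helper `build_transcript` is hypothetical, restricting it to two speakers is an assumption, and it does not call the model itself.

```python
# Hypothetical helper: build a speaker-tagged dialogue transcript string.
# Assumptions: turns are tagged "[S1]"/"[S2]" and nonverbal cues appear in
# parentheses, e.g. "(laughs)"; only two speakers are accepted in this sketch.

from typing import Iterable, Optional, Tuple

# (speaker number, spoken line, optional nonverbal cue)
Turn = Tuple[int, str, Optional[str]]


def build_transcript(turns: Iterable[Turn]) -> str:
    """Join dialogue turns into a single speaker-tagged string."""
    parts = []
    for speaker, text, cue in turns:
        if speaker not in (1, 2):
            raise ValueError(f"expected speaker 1 or 2, got {speaker}")
        segment = f"[S{speaker}] {text.strip()}"
        if cue:
            # Append the nonverbal cue after the spoken line.
            segment += f" ({cue})"
        parts.append(segment)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_transcript([
        (1, "Dia is an open-weights dialogue model.", None),
        (2, "That's great!", "laughs"),
    ]))
```

The example prints `[S1] Dia is an open-weights dialogue model. [S2] That's great! (laughs)`. The resulting string is the kind of text you would pass to the model's generation call, and the repository's README documents that call.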