Show HN: Dia, an open-weights TTS model for generating realistic dialogue
a year ago
- #AI
- #text-to-speech
- #dialogue-generation
- Dia is a 1.6B-parameter text-to-speech model from Nari Labs that generates realistic dialogue directly from transcripts.
- Features include emotion/tone control, nonverbal sounds (laughter, coughing), and audio conditioning.
- Pretrained model checkpoints and inference code are available on Hugging Face.
- Demo page compares Dia to ElevenLabs Studio and Sesame CSM-1B.
- Community support via Discord; waitlist for larger model access.
- Installation via GitHub: clone the repo, set up the environment, and launch the Gradio UI.
- Python code example for generating dialogue audio with Dia.
- Supports GPUs (PyTorch 2.0+, CUDA 12.6); CPU support coming soon.
- Real-time audio generation on enterprise GPUs; slower on older GPUs.
- Full version requires ~10GB VRAM; quantized version planned.
- Strict usage restrictions: no identity misuse, deceptive content, or illegal activities.
- Future plans: Docker support, inference optimization, quantization.
- Team of one full-time and one part-time engineer; contributions welcome.
- Acknowledgments: Google TPU Research Cloud, SoundStorm, Parakeet, Descript Audio Codec.
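The Python usage mentioned above can be sketched roughly as follows. The `Dia.from_pretrained` entry point, the `[S1]`/`[S2]` speaker tags, and the parenthesized nonverbal cues reflect the repo's README at the time of posting, but treat the exact names and signatures as assumptions rather than a stable API.

```python
def make_transcript(turns):
    """Format (speaker, line) pairs into Dia's tagged transcript style,
    e.g. "[S1] Hello. [S2] Hi there. (laughs)"."""
    return " ".join(f"[S{speaker}] {line}" for speaker, line in turns)


text = make_transcript([
    (1, "Dia is an open-weights text-to-speech model."),
    (2, "It can even do nonverbal sounds. (laughs)"),
])

RUN_MODEL = False  # set True on a machine with the model weights and ~10GB of VRAM
if RUN_MODEL:
    # These imports and calls follow the nari-labs/dia README; names are
    # assumptions and may change as the project evolves.
    import soundfile as sf
    from dia.model import Dia

    model = Dia.from_pretrained("nari-labs/Dia-1.6B")
    audio = model.generate(text)
    sf.write("dialogue.wav", audio, 44100)
```

The speaker-tag formatting is the part worth getting right: the model conditions on the `[S1]`/`[S2]` markers to alternate voices, and nonverbal cues like `(laughs)` or `(coughs)` are written inline in the transcript itself.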