Open (Apache 2.0) TTS model for streaming conversational audio in realtime
17 days ago
- #TTS
- #Real-time
- #AI
- Dia2 is a streaming dialogue TTS model by Nari Labs that can begin generating audio in real time as input text arrives, without waiting for the complete text.
- Supports generation conditioned on audio for more natural conversations; model checkpoints are available in 1B and 2B sizes.
- Features include a Bonsai (JAX) implementation, the Dia2 TTS Server for real streaming, and Sori, a Rust-based speech-to-speech engine.
- Requires CUDA 12.8+ drivers; installation uses uv, and commands are run through 'uv run'.
- Includes a CLI for audio generation that supports conditional generation, using speaker prefixes to supply conversational context.
- Offers a Gradio interface for easy use, with detailed generation configuration and output options.
- Licensed under Apache 2.0, with strict prohibitions against misuse, including identity misuse, deceptive content, and illegal activities.
- Acknowledges TPU Research Cloud, KyutaiTTS, and Sesame for inspiration and compute resources.
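
To make the streaming point above concrete, here is a minimal Python sketch of the control flow: audio chunks are emitted as soon as a few words have arrived, rather than after the whole input is known. This is not Dia2's API. `stream_tts`, `fake_synthesize`, `SAMPLE_RATE`, and the `min_words` threshold are all hypothetical names chosen for illustration, and the stand-in synthesizer only returns silence. A real streaming model would also carry context across chunks instead of synthesizing each one independently.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # hypothetical sample rate for the stand-in synthesizer


def fake_synthesize(text: str) -> List[float]:
    """Stand-in for a real TTS model: returns silence, 50 ms per character."""
    return [0.0] * (len(text) * SAMPLE_RATE // 20)


def stream_tts(
    words: Iterable[str],
    synthesize: Callable[[str], List[float]] = fake_synthesize,
    min_words: int = 3,
) -> Iterator[List[float]]:
    """Yield an audio chunk each time `min_words` words have arrived."""
    pending: List[str] = []
    for word in words:
        pending.append(word)
        if len(pending) >= min_words:
            yield synthesize(" ".join(pending))
            pending = []
    if pending:  # flush whatever is left when the input ends
        yield synthesize(" ".join(pending))


# Usage: the first chunk is ready after three words, before the input is finished.
chunks = list(stream_tts("hello there how are you".split()))
```

Because `stream_tts` is a generator, a caller can start playing the first chunk while later words are still being typed or produced by another model.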