Open (Apache 2.0) TTS model for streaming conversational audio in realtime

17 days ago

Dia2 is a streaming dialogue TTS model by Nari Labs, capable of real-time audio generation as input is received.
Supports conditioning generation on audio for more natural conversations; model checkpoints are available in 1B and 2B sizes.
The project also lists a Bonsai (JAX) implementation, a Dia2 TTS Server for real streaming, and Sori, a Rust-based speech-to-speech engine.
Requires CUDA 12.8+ drivers; installation is handled by uv, and commands are run through 'uv run'.
Includes a CLI for audio generation that supports conditional generation, using per-speaker audio prefixes to supply conversational context.
Offers a Gradio interface for easy use, with detailed generation configuration and output options.
Licensed under Apache 2.0; the project prohibits misuse, including identity misuse, deceptive content, and illegal activities.
Acknowledges TPU Research Cloud, KyutaiTTS, and Sesame for inspiration and compute resources.
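
To make "audio generation as input is received" concrete, here is a minimal sketch of the streaming pattern in Python. It is not Dia2's API: `stream_tts`, `fake_synthesize`, and the `SAMPLES_PER_CHAR` constant are hypothetical names invented for illustration, and the "model" is a stub that emits silence. The only point is that audio for an early text chunk is produced before later chunks have arrived.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical constant for the stub below; not a Dia2 parameter.
SAMPLES_PER_CHAR = 240


def fake_synthesize(text: str) -> List[float]:
    """Stand-in for a real TTS model: returns silence proportional to text length."""
    return [0.0] * (len(text) * SAMPLES_PER_CHAR)


def stream_tts(
    text_chunks: Iterable[str],
    synthesize: Callable[[str], List[float]] = fake_synthesize,
) -> Iterator[List[float]]:
    """Yield one audio chunk per non-empty text chunk, as soon as it arrives.

    Because this is a generator, the first audio chunk is available
    before the rest of the input has been read.
    """
    for chunk in text_chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # skip whitespace-only input
        yield synthesize(chunk)


if __name__ == "__main__":
    # Simulate text arriving piece by piece.
    for audio in stream_tts(["Hello", "there,", "how are you?"]):
        print(f"got {len(audio)} samples")
```

A real streaming model would replace `fake_synthesize` and would typically carry state between chunks so that prosody stays continuous across chunk boundaries; the stub here treats each chunk independently.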