
Open (Apache 2.0) TTS model for streaming conversational audio in realtime

17 days ago
  • #TTS
  • #Real-time
  • #AI
  • Dia2 is a streaming dialogue TTS model from Nari Labs that can start generating audio in real time, before the full text input has been received.
  • Supports conditioning generation on prior audio for more natural conversations; 1B and 2B model checkpoints are available.
  • Related projects include a Bonsai (JAX) implementation, the Dia2 TTS Server for true streaming, and Sori, a speech-to-speech engine written in Rust.
  • Requires CUDA 12.8+ drivers; installation uses uv, and commands are run through 'uv run'.
  • Includes a CLI for audio generation that supports conditional generation, using per-speaker audio prefixes to supply conversational context.
  • Offers a Gradio interface for easy use, along with detailed generation configuration and output options.
  • Licensed under Apache 2.0; the authors strictly prohibit misuse, including identity misuse, deceptive content, and illegal activities.
  • Acknowledges the TPU Research Cloud for compute resources, and KyutaiTTS and Sesame for inspiration.
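The streaming behavior described in the first bullet (audio starts before the whole text has arrived) can be pictured with a minimal Python sketch. This is not Dia2's API or its actual streaming mechanism. `synthesize_words`, `stream_tts`, `incoming_text`, the sample rate, and the per-word duration are all hypothetical stand-ins; the "model" simply returns silence. The sketch only shows the control flow: emit an audio chunk as soon as a few words are buffered instead of waiting for the full input.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 24_000       # hypothetical output sample rate (Hz)
SAMPLES_PER_WORD = 6_000   # hypothetical: 0.25 s of audio per word


def synthesize_words(words: List[str]) -> List[float]:
    """Hypothetical stand-in for a TTS model: returns silence sized to the word count."""
    return [0.0] * (SAMPLES_PER_WORD * len(words))


def stream_tts(word_stream: Iterable[str], min_words: int = 2) -> Iterator[List[float]]:
    """Yield an audio chunk every time `min_words` words have arrived,
    rather than waiting for the complete input text."""
    buffer: List[str] = []
    for word in word_stream:
        buffer.append(word)
        if len(buffer) >= min_words:
            yield synthesize_words(buffer)
            buffer = []
    if buffer:  # flush any leftover words once the input ends
        yield synthesize_words(buffer)


def incoming_text() -> Iterator[str]:
    """Hypothetical text source that delivers words one at a time (e.g. from an LLM)."""
    for word in "Hello there, how are you doing today?".split():
        yield word


if __name__ == "__main__":
    chunks = list(stream_tts(incoming_text()))
    total = sum(len(c) for c in chunks)
    print(f"{len(chunks)} chunks, {total / SAMPLE_RATE:.2f} s of audio")
```

Buffering a couple of words before each call is just one simple way to trade latency against context; how Dia2 itself decides when to emit audio is not described in this summary.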