Hasty Briefs

PersonaPlex-7B: full-duplex voice model that listens and talks at the same time

9 days ago
  • #conversational AI
  • #real-time
  • #speech-to-speech
  • PersonaPlex is a real-time speech-to-speech conversational model that performs streaming speech understanding and generation.
  • It operates on continuous audio encoded with a neural codec, predicting both text and audio tokens autoregressively.
  • Supports natural conversational dynamics like interruptions, barge-ins, overlaps, and rapid turn-taking.
  • Runs in a dual-stream configuration allowing concurrent listening and speaking.
  • Conditioned on voice and text prompts to define vocal characteristics, speaking style, and persona attributes.
  • Ready for commercial use with global deployment.
  • Architecture is a 7B-parameter Transformer based on Moshi.
  • Inputs are text prompts and 24 kHz audio; outputs are text and audio responses.
  • Trained on the Fisher English dataset (under 10,000 hours) and evaluated on the FullDuplexBench benchmark.
  • Outperforms other conversational AI systems in dynamics, latency, and task adherence.
  • Ethical considerations include bias, explainability, safety, security, and privacy.
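The dual-stream behavior described above can be sketched as a frame-level loop: at every codec frame step the model consumes one incoming (user) audio token while simultaneously emitting a text token and an outgoing (agent) audio token, so listening and speaking overlap. This is a minimal toy illustration, not PersonaPlex's actual implementation; the predictor, token values, and helper names are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Step:
    heard: int   # incoming codec token (user stream)
    text: int    # predicted text token (agent stream)
    spoken: int  # predicted codec token (agent stream)

def toy_model(context: list) -> tuple:
    # Stand-in for the 7B Transformer: a deterministic toy predictor
    # that maps the running token context to a (text, audio) pair.
    h = sum(context) % 97
    return h, (h * 31) % 97

def full_duplex_loop(incoming_frames: list) -> list:
    """Run one dual-stream pass over a sequence of incoming codec frames."""
    context, trace = [], []
    for frame in incoming_frames:
        context.append(frame)                      # listen: ingest user audio token
        text_tok, audio_tok = toy_model(context)   # speak: predict both output streams
        context.extend([text_tok, audio_tok])      # autoregressive feedback
        trace.append(Step(frame, text_tok, audio_tok))
    return trace

trace = full_duplex_loop([3, 14, 15, 92])
print(len(trace))  # one Step per incoming codec frame
```

Because each step both reads and writes, the agent can keep generating while the user talks, which is what makes interruptions, barge-ins, and overlaps possible; a real system would additionally condition the context on the voice and text persona prompts.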