PersonaPlex-7B: full-duplex voice model that listens and talks at the same time
- #conversational AI
- #real-time
- #speech-to-speech
- PersonaPlex is a real-time speech-to-speech conversational model that performs streaming speech understanding and generation.
- It operates on continuous audio encoded with a neural codec, predicting both text and audio tokens autoregressively.
- Supports natural conversational dynamics such as interruptions, barge-ins, overlapping speech, and rapid turn-taking.
- Runs in a dual-stream configuration allowing concurrent listening and speaking.
- Conditioned on voice and text prompts to define vocal characteristics, speaking style, and persona attributes.
- Ready for commercial use and global deployment.
- Architecture: a 7B-parameter Transformer based on Moshi.
- Inputs: text prompts and audio (24 kHz sample rate); outputs: text and audio responses.
- Trained on the Fisher English corpus (under 10,000 hours of audio) and evaluated on the FullDuplexBench benchmark.
- Outperforms comparable conversational AI systems on conversational dynamics, response latency, and task adherence.
- Ethical considerations include bias, explainability, safety, security, and privacy.
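The dual-stream, autoregressive design above can be illustrated with a toy loop. This is a conceptual sketch only, not the real PersonaPlex API: the hypothetical `toy_model_step` stands in for one autoregressive step that consumes an incoming codec token while simultaneously emitting a text token and an outgoing audio token, so listening and speaking proceed concurrently.

```python
# Toy illustration of full-duplex token interleaving. All names here are
# hypothetical stand-ins; the real model predicts codec and text tokens
# with a 7B Transformer, not string tags.

def toy_model_step(incoming_token, state):
    """One pretend autoregressive step: ingest a user audio token,
    emit a (text_token, audio_token) pair, and update model state."""
    state = state + [incoming_token]
    text_token = f"txt:{incoming_token}"
    audio_token = f"aud:{incoming_token}"
    return text_token, audio_token, state

def full_duplex_loop(incoming_stream):
    state = []
    out_text, out_audio = [], []
    for tok in incoming_stream:      # user audio arrives frame by frame
        t, a, state = toy_model_step(tok, state)
        out_text.append(t)           # streamed text output
        out_audio.append(a)          # streamed speech output
    return out_text, out_audio

text, audio = full_duplex_loop([0, 1, 2])
```

Because input consumption and output emission happen in the same step, there is no explicit turn boundary: the model can keep "hearing" new tokens (e.g. a barge-in) while it is mid-response.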
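Since the model expects 24 kHz input audio, capture at other rates (commonly 16 kHz or 44.1 kHz) must be resampled first. A minimal sketch using plain linear interpolation, assuming samples are a flat list of floats; a real pipeline would use a proper band-limited resampler such as `torchaudio` or `soxr`:

```python
# Minimal linear-interpolation resampler (illustrative only; linear
# interpolation aliases and is not production quality).

def resample_linear(samples, src_rate, dst_rate):
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

audio_16k = [0.0, 1.0, 0.0, -1.0] * 4     # toy 16 kHz signal
audio_24k = resample_linear(audio_16k, 16_000, 24_000)
```

Here 16 input samples at 16 kHz become 24 samples at 24 kHz, matching the model's expected input rate.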