Nvidia: Natural Conversational AI with Any Role and Voice
21 days ago
- #AI
- #NVIDIA
- #ConversationalAI
- NVIDIA PersonaPlex is a full-duplex conversational AI model that allows customization of voice and role while maintaining natural conversation dynamics.
- It handles interruptions, backchannels (short listener cues such as "uh-huh"), and natural turn-taking rhythm, so conversations feel closer to talking with a person.
- PersonaPlex uses a hybrid prompting architecture with voice and text prompts to define conversational behavior.
- The model is built on the Moshi architecture, has 7 billion parameters, and operates on audio at a 24 kHz sample rate.
- Training data includes real conversations from the Fisher English corpus and synthetic dialogues for assistant and customer service roles.
- Key findings include efficient specialization from pretrained foundations, disentangled speech naturalness, and emergent generalization beyond training domains.
- In benchmarks such as FullDuplexBench and ServiceDuplexBench, PersonaPlex outperforms the other systems evaluated on conversational dynamics, latency, and task adherence.
- The code is released under the MIT License, and the model weights under the NVIDIA Open Model License.
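The 24 kHz sample rate and 7 billion parameter count above support some quick back-of-envelope numbers. The sketch below assumes PersonaPlex keeps the 12.5 Hz frame rate of Moshi's Mimi audio codec, a figure taken from Moshi and not stated in this post. It also treats "7 billion" as an exact count and estimates the memory for the raw weights alone at 2 and 1 bytes per parameter. These are illustrative estimates, not published requirements.

```python
# Back-of-envelope numbers derived from the specs above.
SAMPLE_RATE_HZ = 24_000     # stated in the post
PARAMS = 7_000_000_000      # "7 billion parameters", treated as exact here
FRAME_RATE_HZ = 12.5        # ASSUMPTION: Moshi's Mimi codec frame rate


def samples_per_frame(sample_rate_hz: int, frame_rate_hz: float) -> int:
    """Number of audio samples covered by one codec frame."""
    return round(sample_rate_hz / frame_rate_hz)


def frame_ms(frame_rate_hz: float) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def weight_gib(params: int, bytes_per_param: int) -> float:
    """Memory for the raw weights only (excludes activations and caches)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_RATE_HZ))  # 1920
    print(frame_ms(FRAME_RATE_HZ))                           # 80.0
    print(f"{weight_gib(PARAMS, 2):.1f} GiB at 2 bytes/param (e.g. bf16)")
    print(f"{weight_gib(PARAMS, 1):.1f} GiB at 1 byte/param (e.g. int8)")
```

Under the frame-rate assumption, a full-duplex model of this kind consumes and produces audio in chunks of about 80 ms. That chunk size sets a floor on how quickly it can react to an interruption.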
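The hybrid prompting idea can be pictured as two independent inputs. This sketch assumes, as the summary suggests, that a voice prompt (a short reference audio clip) controls how the agent sounds and a text prompt controls the role it plays. The `HybridPrompt` class and its methods are hypothetical illustrations, not the PersonaPlex API. The sketch only shows why the two inputs can be swapped independently.

```python
from dataclasses import dataclass, replace

SAMPLE_RATE_HZ = 24_000  # stated in the post


@dataclass(frozen=True)
class HybridPrompt:
    """HYPOTHETICAL prompt bundle; not PersonaPlex's actual interface.

    voice_clip: mono reference audio at 24 kHz that sets how the agent sounds.
    role_text:  natural-language description of who the agent is and what it does.
    """
    voice_clip: tuple[float, ...]
    role_text: str

    def __post_init__(self) -> None:
        # Both halves of the hybrid prompt are required.
        if not self.voice_clip:
            raise ValueError("voice prompt must contain audio")
        if not self.role_text.strip():
            raise ValueError("text prompt must describe a role")

    @property
    def voice_seconds(self) -> float:
        """Length of the voice prompt in seconds."""
        return len(self.voice_clip) / SAMPLE_RATE_HZ

    def with_voice(self, clip: tuple[float, ...]) -> "HybridPrompt":
        """Swap the voice while keeping the role."""
        return replace(self, voice_clip=clip)

    def with_role(self, text: str) -> "HybridPrompt":
        """Swap the role while keeping the voice."""
        return replace(self, role_text=text)


if __name__ == "__main__":
    placeholder = (0.0,) * 48_000  # 2 s of silence standing in for a real clip
    agent = HybridPrompt(placeholder, "You are a patient customer service agent.")
    tutor = agent.with_role("You are a friendly physics tutor.")
    print(agent.voice_seconds)                     # 2.0
    print(tutor.voice_clip is agent.voice_clip)    # True: same voice, new role
```

Keeping the two fields separate mirrors the post's claim that voice and role are customized independently. The same role can be paired with many voices, and vice versa.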