High-Fidelity Simultaneous Speech-to-Speech Translation

10 months ago
  • #natural language processing
  • #machine learning
  • #speech translation
  • Hibiki is a decoder-only model for simultaneous speech translation.
  • It uses a multistream language model to process source and target speech synchronously.
  • The model jointly produces text and audio tokens for speech-to-text and speech-to-speech translation.
  • To build aligned training data, a weakly supervised method uses the perplexity of an off-the-shelf text translation system to identify the optimal delay for each word.
  • Hibiki performs adaptive, simultaneous speech translation with vanilla temperature sampling.
  • On a French-English simultaneous speech translation task, it achieves state-of-the-art performance in translation quality, speaker fidelity, and naturalness.
  • The model is compatible with batched translation and real-time on-device deployment.
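
The perplexity-based delay search mentioned above can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a text translation model has already been queried to fill a hypothetical table `loglik[j][i]`, the log-likelihood of target word `j` given the first `i` source words (column 0 means no source context), and it uses that log-likelihood as a stand-in for perplexity. The function name `align_delays`, the "largest gain" rule, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def align_delays(loglik: Sequence[Sequence[float]]) -> List[int]:
    """For each target word, return how many source words must be heard first.

    loglik[j][i] is a hypothetical score: the log-likelihood of target word j
    given the first i source words. The word is assigned to the source position
    whose arrival raises that score the most (an assumed heuristic).
    """
    delays: List[int] = []
    latest = 0  # delays may never decrease, so output order is preserved
    for row in loglik:
        if len(row) < 2:
            raise ValueError("each row needs scores for at least one source word")
        # Gain in log-likelihood from hearing source word i (1-based).
        gains = [row[i] - row[i - 1] for i in range(1, len(row))]
        best = 1 + max(range(len(gains)), key=gains.__getitem__)
        latest = max(latest, best)
        delays.append(latest)
    return delays


# Toy scores for 3 target words and 3 source words (made-up numbers).
toy = [
    [-5.0, -1.0, -0.9, -0.8],  # becomes predictable after source word 1
    [-6.0, -5.5, -5.0, -0.5],  # needs source word 3
    [-4.0, -3.8, -1.0, -0.9],  # needs source word 2, but must wait for word 3
]
print(align_delays(toy))  # [1, 3, 3]
```

The monotonic clamp reflects that target words are spoken in order: a word cannot be emitted before the word preceding it, even if its own evidence arrived earlier.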
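
For readers unfamiliar with the term, "vanilla temperature sampling" means drawing the next token from the softmax of the model's logits divided by a temperature, with no extra decoding policy on top. The following is a generic sketch of that standard technique, not code from Hibiki; the function name `sample_with_temperature` and the example logits are illustrative assumptions.

```python
import math
import random
from typing import Optional, Sequence


def sample_with_temperature(
    logits: Sequence[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Illustrative logits over a 3-token vocabulary.
token = sample_with_temperature([0.0, 5.0, 1.0], temperature=0.8)
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution and make the output more varied.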