Hasty Briefs (beta)

VibeVoice-ASR: speech-to-text model designed to handle 60-minute long-form audio

5 days ago
  • #multilingual
  • #ASR
  • #speech-to-text
  • VibeVoice-ASR is a unified speech-to-text model for 60-minute long-form audio.
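  • Generates structured transcriptions that record Who (speaker), When (timestamps), and What (content).
  • Supports customized hotwords (user-supplied terms such as names or technical vocabulary) and over 50 languages.
  • Processes up to 60 minutes of audio in a single pass, without slicing it into chunks.
  • Maintains speaker tracking and semantic coherence across the full recording, with multilingual support.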
  • Jointly performs ASR, diarization, and timestamping.
  • Open source under the MIT License; developed by Microsoft Research.
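
To make the Who/When/What structure concrete, here is a minimal Python sketch of how a consumer might hold and render such a transcript. It does not call VibeVoice-ASR and does not reflect its actual output schema: the `Segment` class, its field names, and the helper functions are all hypothetical, chosen only to illustrate speaker, timestamp, and text travelling together in one record.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript entry (hypothetical schema, not the model's real format)."""
    speaker: str  # Who: speaker label from diarization
    start: float  # When: start time in seconds
    end: float    # When: end time in seconds
    text: str     # What: recognized speech


def fmt_time(seconds: float) -> str:
    """Format a time offset as MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render(segments: list[Segment]) -> str:
    """Render segments in chronological order, one line per segment."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{fmt_time(s.start)}-{fmt_time(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the speaking time in seconds for each speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


if __name__ == "__main__":
    # Illustrative data only; not real model output.
    demo = [
        Segment("Speaker 1", 0.0, 4.5, "Welcome to the meeting."),
        Segment("Speaker 2", 4.5, 9.0, "Thanks, glad to be here."),
        Segment("Speaker 1", 3590.0, 3600.0, "That wraps up the hour."),
    ]
    print(render(demo))
    print(talk_time(demo))
```

Because diarization, timestamps, and text sit in one record, downstream tasks such as per-speaker talk time reduce to a simple pass over the list rather than a merge of separate outputs.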