VibeVoice-ASR: speech-to-text model designed to handle 60-minute long-form audio

a month ago

#multilingual
#ASR
#speech-to-text

VibeVoice-ASR is a unified speech-to-text model for 60-minute long-form audio.
Generates structured transcriptions that record who spoke, when they spoke, and what was said.
Supports customized hotwords and over 50 languages.
Processes up to 60 minutes of audio in a single pass, without slicing it into chunks.
Maintains speaker tracking and semantic coherence across the whole recording, with multilingual support.
Jointly performs ASR, diarization, and timestamping.
Open source under the MIT License, developed by Microsoft Research.
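
As a rough illustration of the Who/When/What structure, a transcript can be modeled as a list of speaker-labeled, timestamped segments. The sketch below is hypothetical: the `Segment` class, its field names, the helper functions, and the sample data are assumptions made for illustration, not VibeVoice-ASR's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript entry; field names are illustrative, not the model's schema."""
    speaker: str   # Who
    start: float   # When: start time in seconds
    end: float     # When: end time in seconds
    text: str      # What


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, which is enough for audio up to 60 minutes."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Produce one readable line per speaker turn."""
    return "\n".join(
        f"[{format_timestamp(s.start)}-{format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )


# Made-up sample data, not real model output.
example = [
    Segment("Speaker 1", 0.0, 4.2, "Welcome to the meeting."),
    Segment("Speaker 2", 4.2, 65.0, "Thanks, let's begin."),
]
print(render(example))
```

Keeping speaker, time span, and text in a single record reflects the idea of performing ASR, diarization, and timestamping jointly, so no separate alignment step is needed to match words to speakers.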