NVIDIA releases open dataset, 2 models for multilingual speech AI

9 months ago

NVIDIA introduces a new dataset and models supporting 25 European languages for AI speech recognition and translation.
Granary, an open-source multilingual speech dataset, contains around a million hours of audio for AI training.
NVIDIA Canary-1b-v2 is optimized for high-accuracy transcription and translation, while the smaller Parakeet-tdt-0.6b-v3 is tuned for fast, low-latency transcription.
Granary addresses data scarcity by turning unlabeled public speech audio into usable training data without human annotation, which benefits underrepresented languages.
The new models and dataset are available on Hugging Face, with the methodology shared to accelerate speech AI innovation.
