Meta Omnilingual ASR: Advancing Automatic Speech Recognition for 1600 Languages
- #Multilingual
- #AI
- #Speech Recognition
- Meta's FAIR team introduces Omnilingual ASR, supporting over 1,600 languages, including 500 low-resource languages.
- Omnilingual wav2vec 2.0, a 7B-parameter self-supervised model for multilingual speech representation, is open-sourced.
- The Omnilingual ASR Corpus is released, featuring transcribed speech in 350 underserved languages.
- Two architectural variants are introduced: a scaled wav2vec 2.0 encoder paired with either a CTC decoder or a transformer decoder, both emitting character tokens.
- LLM-ASR achieves state-of-the-art performance, with character error rates (CER) below 10 for 78% of supported languages.
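For context on the CER figure above, here is a minimal sketch of how character error rate is typically computed: the Levenshtein edit distance between hypothesis and reference transcripts, normalized by reference length. This is a generic illustration of the standard metric, not Meta's evaluation code.

```python
# Character error rate (CER): edit distance between hypothesis and
# reference, normalized by reference length. A CER below 10 means
# fewer than 10 character errors per 100 reference characters.
def cer(reference: str, hypothesis: str) -> float:
    # Levenshtein distance via dynamic programming (two rolling rows).
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return 100.0 * prev[n] / max(m, 1)

# One deleted character against an 11-character reference -> CER ~9.1
print(round(cer("hello world", "hello word"), 1))
```

Production evaluations usually rely on established toolkits for this metric, but the normalization by reference length is the same.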
- In-context learning allows transcription of unsupported languages from only a handful of paired audio-text samples.
- A suite of models is released under the Apache 2.0 license, from lightweight 300M-parameter versions to the full 7B model.
- Meta collaborated with global partners and local communities to collect and transcribe speech in underrepresented languages.
- Omnilingual ASR Corpus is the largest ultra-low-resource spontaneous ASR dataset available.