Meta Omnilingual ASR: Advancing Automatic Speech Recognition for 1600 Languages
- #Multilingual
- #AI
- #Speech Recognition
- Meta's FAIR team introduces Omnilingual ASR, supporting over 1,600 languages, including 500 low-resource languages.
- Omnilingual wav2vec 2.0, a 7B-parameter self-supervised model for multilingual speech representation, is open-sourced.
- The Omnilingual ASR Corpus is released, featuring transcribed speech in 350 underserved languages.
- Two architectural variants are introduced: a scaled wav2vec 2.0 encoder paired with either a CTC decoder or a transformer decoder, both emitting character tokens.
- LLM-ASR achieves state-of-the-art performance, with character error rates (CER) below 10 for 78% of supported languages.
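For context on the CER figure above, here is a minimal sketch of how character error rate is typically computed: the Levenshtein edit distance between hypothesis and reference transcripts, normalized by reference length. This is a generic illustration of the standard metric, not Meta's evaluation code.

```python
# Character error rate (CER): edit distance between hypothesis and
# reference, normalized by reference length. A CER below 10 means
# fewer than 10 character errors per 100 reference characters.
def cer(reference: str, hypothesis: str) -> float:
    # Levenshtein distance via dynamic programming (two rolling rows).
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return 100.0 * prev[n] / max(m, 1)

# One deleted character against an 11-character reference -> CER ~9.1
print(round(cer("hello world", "hello word"), 1))
```

Production evaluations usually rely on established toolkits for this metric, but the normalization by reference length is the same.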
- In-context learning allows transcription of unsupported languages from only a handful of paired audio-text samples.
- A suite of models is released under the Apache 2.0 license, from lightweight 300M-parameter versions to the full 7B model.
- Meta collaborated with global partners and local communities to collect and transcribe speech in underrepresented languages.
- Omnilingual ASR Corpus is the largest ultra-low-resource spontaneous ASR dataset available.