Voxtral – Frontier open source speech understanding models

10 months ago
  • #AI-models
  • #speech-recognition
  • #open-source
  • Introduction of Voxtral, frontier open-source speech understanding models.
  • Voice as the original and most natural human-computer interface.
  • Current limitations of voice systems: unreliable, proprietary, and brittle.
  • Voxtral models aim to bridge the gap with exceptional transcription, deep understanding, multilingual fluency, and open deployment.
  • Available in two sizes: a 24B model for production-scale applications and a 3B model for local and edge deployments, both released under the Apache 2.0 license.
  • Voxtral offers state-of-the-art accuracy and semantic understanding at less than half the price of comparable APIs.
  • Capabilities include long-form context (a 32k-token window, covering up to 30 minutes of audio for transcription or 40 minutes for understanding), built-in Q&A and summarization, multilingual support, and function calling from voice.
  • Benchmarks published by Mistral show Voxtral outperforming leading models such as Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash in transcription and understanding.
  • Free options to try: download locally, use the API, or test on Le Chat's voice mode.
  • Enterprise features include private deployment, domain-specific fine-tuning, advanced context, and dedicated integration support.
  • Upcoming features: speaker segmentation, audio markups, word-level timestamps, non-speech audio recognition.
  • Live webinar on Aug 6 to showcase voice-powered agents.
  • Mistral is hiring research scientists and engineers to advance voice interface technology.
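
The long-form limit listed among the capabilities above (30 to 40 minutes of audio per request) implies that longer recordings have to be split before they are submitted. The following is a minimal sketch of that arithmetic only, assuming a 30-minute cap per transcription request. The helper name `split_into_windows` and its parameters are hypothetical and are not part of any Voxtral or Mistral API.

```python
from typing import List, Tuple


def split_into_windows(
    duration_s: float,
    max_window_s: float = 30 * 60,  # assumed cap: 30 minutes per request
) -> List[Tuple[float, float]]:
    """Return consecutive (start, end) offsets in seconds covering the recording."""
    if duration_s < 0 or max_window_s <= 0:
        raise ValueError("duration must be >= 0 and window size must be > 0")

    duration_s = float(duration_s)
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The last window is shortened so it ends exactly at the recording's end.
        end = min(start + max_window_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 75-minute recording yields two full windows and one 15-minute remainder.
    for start, end in split_into_windows(75 * 60):
        print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

Cutting at fixed offsets can split a word across two windows. A real pipeline would more likely cut at silences or overlap adjacent windows slightly, which this sketch leaves out.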