Gemini 3.1 Flash TTS: the next generation of expressive AI speech
10 hours ago
- Gemini 3.1 Flash TTS is a new text-to-speech model offering improved controllability, expressivity, and quality.
- It is now available in preview for developers via the Gemini API and Google AI Studio, for enterprises on Vertex AI, and for Workspace users via Google Vids.
- The model scores 1,211 Elo on the Artificial Analysis TTS leaderboard and is noted for combining high speech quality with low cost.
- It supports multi-speaker dialogue, over 70 languages, and granular control through natural language and new audio tags.
- Audio tags are natural language commands embedded directly in the text input; they control vocal style, pace, and delivery.
- Features include scene direction, speaker-level control through Audio Profiles and Director's Notes, and export of the chosen parameters as API code.
- The model is built for global scale, enabling localized, expressive speech experiences across major markets.
- All generated audio is watermarked with SynthID to help identify AI-generated content and limit the spread of misinformation.
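
To make the idea of audio tags, scene direction, and multi-speaker dialogue concrete, here is a minimal sketch of how a tagged script might be assembled as plain text before it is sent to a speech model. Everything in it is an assumption for illustration: the square-bracket tag syntax, the `Scene:` prefix, and the `Line` and `render_script` names are hypothetical and are not taken from the Gemini API. The sketch only builds a string and makes no API call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """One line of dialogue (hypothetical helper type, not part of any SDK)."""
    speaker: str
    text: str
    tag: Optional[str] = None  # assumed inline audio tag, e.g. "whispers"


def render_script(scene: str, lines: List[Line]) -> str:
    """Join a scene direction and tagged dialogue lines into one text prompt.

    The bracketed tag format is an assumption; check the model documentation
    for the syntax it actually accepts.
    """
    parts = [f"Scene: {scene}"] if scene else []
    for line in lines:
        prefix = f"[{line.tag}] " if line.tag else ""
        parts.append(f"{line.speaker}: {prefix}{line.text}")
    return "\n".join(parts)


script = render_script(
    "A quiet library, late evening.",
    [
        Line("Ana", "Did you find the book?", tag="whispers"),
        Line("Ben", "Right here on the top shelf!", tag="excited"),
        Line("Ana", "Perfect, let's go."),
    ],
)
print(script)
```

Keeping the tags as structured data until the final render step makes it easy to swap in whatever tag syntax the model documents without touching the dialogue itself.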