Gemini 3.1 Flash TTS: the next generation of expressive AI speech

8 hours ago

Gemini 3.1 Flash TTS is a new text-to-speech model offering improved controllability, expressivity, and quality.
It is now available in preview for developers via the Gemini API and Google AI Studio, for enterprises on Vertex AI, and for Workspace users via Google Vids.
The model scores an Elo of 1,211 on the Artificial Analysis TTS leaderboard and is noted for combining high speech quality with low cost.
It supports multi-speaker dialogue, over 70 languages, and granular control through natural language and new audio tags.
Audio tags allow intuitive control of vocal style, pace, and delivery by embedding natural language commands in text input.
Features include scene direction, speaker-level specificity with Audio Profiles and Director's Notes, and seamless export of parameters as API code.
The model is built for global scale, enabling localized, expressive speech experiences across major markets.
All generated audio is watermarked with SynthID to help detect AI-generated content and prevent misinformation.
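
To make the audio-tag and multi-speaker ideas above concrete, here is a minimal sketch of how a tagged dialogue script might be assembled as plain text before being sent to a TTS model. The summary says only that tags are natural language commands embedded in the text input, so the square-bracket tag syntax, the "Scene:" header, the "Speaker: text" layout, and the names `Line` and `render_script` are all assumptions made for illustration, not the documented Gemini format. No API call is made.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One spoken line in a dialogue (hypothetical structure)."""
    speaker: str
    text: str
    tags: tuple[str, ...] = ()  # delivery hints, e.g. ("whispering",)


def render_script(lines, scene=""):
    """Render a dialogue as plain text with inline delivery tags.

    Assumption: tags are written as [tag] before the text they affect.
    The real tag syntax may differ; check the official documentation.
    """
    parts = []
    if scene:
        # Optional scene direction placed ahead of the dialogue.
        parts.append(f"Scene: {scene}")
    for line in lines:
        tag_prefix = "".join(f"[{tag}] " for tag in line.tags)
        parts.append(f"{line.speaker}: {tag_prefix}{line.text}")
    return "\n".join(parts)


script = render_script(
    [
        Line("Ava", "We shipped it.", ("excited",)),
        Line("Ben", "Finally.", ("whispering", "slowly")),
    ],
    scene="Two engineers in a quiet office",
)
print(script)
```

Keeping the script as structured data until the last step makes it easy to change the tag syntax in one place once the actual format is confirmed.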