Google releases Gemini 3.5 Live Translate, enabling real-time speech translation across 70+ languages in Meet, Google Translate, and developer APIs
Google has launched Gemini 3.5 Live Translate, a speech-to-speech translation model that translates in real time across more than 70 languages and more than 2,000 language pairs. The model is in public preview for developers through the Gemini Live API and Google AI Studio, is rolling out now in the Google Translate app on Android and iOS, and is entering a private enterprise preview in Google Meet, where it expands Meet's translation support from 5 languages to more than 70.
What's new
Google describes Gemini 3.5 Live Translate as "our latest audio model for live speech-to-speech translation." The design is continuous rather than turn-based: the model "generates speech continuously, balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync." Speakers therefore do not need to pause between sentences for translation to begin.
Deployment surfaces and status:
- Developers: Public preview via the Gemini Live API and Google AI Studio
- Enterprise: Private preview in Google Meet starting June 2026, expanding support from 5 languages to 70+ and removing the previous English-only limitation, under which every translation had to involve English
- Consumers: Available immediately in the Google Translate app on Android and iOS; Android users gain a new "listening mode" that routes translations through the phone earpiece for private in-ear translation
The model targets "smooth, natural-sounding translated speech that preserves the speakers' intonation, pacing and pitch." The goal is to keep each speaker's vocal identity across languages rather than flattening it into a neutral synthetic voice.
All audio output is watermarked with SynthID, Google's imperceptible watermark for identifying AI-generated content.
Context
Google Meet's previous live translation feature supported only five languages and ran on infrastructure separate from the Gemini family. This release puts Meet, Google Translate, and the developer API on the same Gemini 3.5 model and extends Meet's coverage by more than an order of magnitude. The 2,000+ pair count matters most for non-English pairs. Commercial translation tools have historically been English-centric, routing pairs such as Thai-to-Vietnamese or Swahili-to-Portuguese through English as an intermediate (pivot) language. Pivoting compounds the errors of two translation steps and can erase distinctions that English does not mark, such as formal versus informal address. A model trained end-to-end on diverse pairs can translate directly and skip that bottleneck.
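The arithmetic behind these counts is easy to check. The sketch below assumes that "language pair combinations" means unordered pairs and that English is one of the 70 languages; the announcement states neither detail. `pair_counts` and `pivot_route` are hypothetical helpers written for this illustration, and the two-letter codes are ordinary ISO 639-1 tags, not identifiers from any Google API.

```python
import math


def pair_counts(n_languages: int) -> dict:
    """Count language pairs, assuming English is one of the n_languages."""
    return {
        # Unordered pairs, e.g. {Thai, Vietnamese}: n choose 2.
        "unordered_pairs": math.comb(n_languages, 2),
        # Translation directions, e.g. Thai->Vietnamese and Vietnamese->Thai.
        "directed_pairs": math.perm(n_languages, 2),
        # Unordered pairs in which neither side is English.
        "pairs_without_english": math.comb(n_languages - 1, 2),
    }


def pivot_route(src: str, tgt: str, pivot: str = "en") -> list:
    """Hypothetical route taken by an English-centric (pivot) system."""
    if pivot in (src, tgt):
        return [src, tgt]  # one translation step
    return [src, pivot, tgt]  # two steps; errors from the first carry into the second


print(pair_counts(70))
# {'unordered_pairs': 2415, 'directed_pairs': 4830, 'pairs_without_english': 2346}
print(pivot_route("th", "vi"))  # ['th', 'en', 'vi']
print(pivot_route("sw", "pt"))  # ['sw', 'en', 'pt']
```

Under those assumptions, 70 languages give 2,415 unordered pairs, and 2,346 of them (about 97%) do not include English, so an English-centric system would have to pivot for nearly all of them.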
Gemini 3.5 Live Translate is already being tested at scale ahead of general release. Grab, the Southeast Asian super-app, is trialing it for driver-passenger communication on a platform that handles more than 10 million voice calls a month. That deployment gives Google real-world performance data before the enterprise rollout broadens.
The announcement coincides with Anthropic's Fable 5 launch, which likewise brings audio and vision improvements, and the model competes with OpenAI's Realtime voice model family. Live, low-latency speech translation has become a direct point of competition among frontier AI labs.
Why it matters
Live speech translation has been a recurring demo since the early days of neural machine translation, but the gap between demo quality and production reliability has been wide. The core engineering challenge is the latency-quality trade-off: waiting for full utterances improves accuracy, while translating in real time preserves conversational flow. Gemini 3.5 Live Translate's continuous generation is Google's answer: the model begins producing translated speech while the speaker is still talking, weighing at each point whether to wait for more context or to commit to output now.
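Google has not published how the model decides when to speak. One standard way to reason about the same trade-off is the fixed wait-k policy from the simultaneous machine translation literature: read k source tokens, then emit one output token for each further source token. The sketch below is that generic policy, not Gemini's method. It computes how many source tokens each output token waits for and summarizes the schedule with Average Proportion (AP), a latency metric where 1.0 means every output token waited for the full utterance. The function names and toy lengths are illustrative.

```python
def wait_k_delays(n_src: int, n_tgt: int, k: int) -> list:
    """Source tokens read before each target token is emitted under wait-k.

    Target token j (0-indexed) is written once min(k + j, n_src) source tokens
    have arrived: wait for k tokens, then emit one target token per extra
    source token; after the source ends, emit the remaining tokens at once.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + j, n_src) for j in range(n_tgt)]


def average_proportion(delays: list, n_src: int, n_tgt: int) -> float:
    """Average Proportion: mean fraction of the source read before each output.

    Ranges over (0, 1]; 1.0 means every target token waited for the whole source.
    """
    return sum(delays) / (n_src * n_tgt)


n = 6  # a six-token utterance translated into six tokens (toy lengths)
for k in (1, 3, n):  # k = n means waiting for the full utterance
    delays = wait_k_delays(n, n, k)
    print(f"k={k}: delays={delays}, AP={average_proportion(delays, n, n):.3f}")
# k=1: delays=[1, 2, 3, 4, 5, 6], AP=0.583
# k=3: delays=[3, 4, 5, 6, 6, 6], AP=0.833
# k=6: delays=[6, 6, 6, 6, 6, 6], AP=1.000
```

Small k keeps the translation close behind the speaker but commits with little context; k equal to the utterance length reduces to turn-based translation. A fixed k is the simplest possible policy. Google's description of balancing the two sounds closer to an adaptive policy, which chooses how long to wait at each step instead of using one fixed k.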
Deploying across consumer (Google Translate), enterprise (Google Meet), and developer (Gemini Live API) surfaces at once signals that Google considers the model ready for broad use, although the Meet and API releases are still previews; only the Translate app rollout is broadly available. SynthID watermarking on all audio output is a governance layer that matters as real-time voice synthesis draws regulatory scrutiny, including the EU AI Act's requirement that synthetic audio be marked in a machine-readable format as AI-generated, and state-level AI disclosure laws in the US.
For developers, the public Gemini Live API preview offers a path to multilingual voice applications without assembling separate automatic speech recognition (ASR), text translation, and text-to-speech (TTS) components. That simplification, together with the breadth of language coverage, is the immediate practical value.
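To make the contrast concrete, here is a toy version of the cascade a developer would otherwise assemble. Everything in it is a hypothetical stand-in (the `Audio` type, the `asr`, `mt`, and `tts` functions, and a two-word lexicon); it does not call the Gemini Live API or any real speech service. It shows one structural cost of a naive cascade: the intermediate text carries no prosody, so the speaker's pitch is lost before synthesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Audio:
    """Toy stand-in for an audio segment: spoken words plus one prosody feature."""
    words: str
    pitch_hz: Optional[float] = None  # None means a default synthetic voice


# Hypothetical two-word lexicon standing in for a real translation model.
TOY_LEXICON = {"hello": "hola", "friend": "amigo"}


def asr(audio: Audio) -> str:
    """Speech recognition stand-in: returns text only, dropping pitch."""
    return audio.words


def mt(text: str) -> str:
    """Text translation stand-in: word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def tts(text: str) -> Audio:
    """Speech synthesis stand-in: sees only text, so it uses the default voice."""
    return Audio(words=text)


def cascade(audio: Audio) -> Audio:
    """Chain the three separately built components: ASR -> MT -> TTS."""
    return tts(mt(asr(audio)))


result = cascade(Audio(words="hello friend", pitch_hz=210.0))
print(result)  # Audio(words='hola amigo', pitch_hz=None)
```

Production cascades can work around this by passing prosody features or a speaker embedding to the synthesis stage, at the cost of more integration work; a single speech-to-speech model moves that concern inside the model.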
Corroborating sources
- Google blog: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-live-3-5-translate/
  Quoted: “smooth, natural-sounding translated speech that preserves the speakers' intonation, pacing and pitch”