ElevenLabs releases Dubbing v2, an AI dubbing model that preserves speaker emotion across 90+ languages
ElevenLabs on May 28, 2026, released Dubbing v2, an AI dubbing model designed to preserve the emotional performance and vocal characteristics of the original speaker across more than 90 languages. The model conditions directly on the source audio rather than a translated transcript, enabling it to capture intonation, tone, and delivery and reproduce them in each target language.
What's new
The core technical departure in Dubbing v2 is its conditioning approach. Where prior AI dubbing systems typically generated audio from a translated text transcript — losing the speaker's delivery in the process — Dubbing v2 works from the original recorded performance. According to ElevenLabs, "the emotion and performance of the original speaker carries across every language."
Key features:
- Performance preservation: Intonation, tone, and vocal delivery are captured from the source recording and reproduced in each target language.
- Natural spoken delivery: Translations are adapted for how language sounds when spoken (not written), while remaining synchronized with the original video.
- Language coverage: 90+ languages supported at launch.
- Availability: Immediately accessible in ElevenCreative and ElevenProductions. API access is coming soon.
- Free trial: 1 minute on the Free plan, 15 minutes on Starter, 30 minutes for Creator and above — each trial valid for 7 days.
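For teams budgeting test footage against the trial, the allowances above can be written down as a small lookup. This is only an illustrative sketch: the plan keys, the constant names, and the `trial_seconds_left` helper are hypothetical, not part of any ElevenLabs API or SDK (API access has not shipped yet), and "creator" stands in for Creator and every plan above it.

```python
# Hypothetical helper; none of these names come from ElevenLabs.
# Values are the trial allowances listed in the announcement.
TRIAL_VALID_DAYS = 7

TRIAL_MINUTES = {
    "free": 1,
    "starter": 15,
    "creator": 30,  # assumed to cover Creator and all higher plans
}


def trial_seconds_left(plan: str, used_seconds: float) -> float:
    """Return the remaining trial audio, in seconds, for a plan."""
    allowance_seconds = TRIAL_MINUTES[plan.lower()] * 60
    # Never report a negative balance once the allowance is used up.
    return max(0.0, allowance_seconds - used_seconds)


if __name__ == "__main__":
    # A Starter user who has already dubbed a 5-minute clip.
    print(trial_seconds_left("Starter", 5 * 60))
```

Under these assumptions, a Starter user who has dubbed a 5-minute clip has 10 minutes of trial audio left, which must be used within the 7-day window.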
Context
AI dubbing has struggled with a persistent quality gap: translated transcripts produce grammatically correct but tonally flat speech. The speaker's original performance (the emphasis, the pauses, the emotional color) is stripped out when audio is regenerated from text alone. Several companies, including HeyGen and Deepdub, have built dubbing tools, and ElevenLabs itself offered an earlier dubbing product in ElevenCreative and ElevenProductions.
Dubbing v2 represents ElevenLabs' technical answer to this gap. By conditioning on the source audio performance rather than its text transcript, the model attempts to preserve what the voice actually communicates beyond the literal words.
Why it matters
The commercial case for better AI dubbing is strong. Video content on streaming platforms, YouTube, and enterprise training libraries costs significant time and money to localize. Human dubbing workflows are slow and expensive, and AI dubbing tools have been held back by the performance-preservation problem Dubbing v2 targets.
ElevenLabs framed the release directly around this gap: "This solves one of the biggest unsolved problems in AI dubbing: making translated speech feel like the original person actually said it." If that claim holds up at production scale, it changes the economics of localization for independent creators and mid-size production teams. Studios and broadcasters will hold the output to a higher quality bar, but ElevenProductions is explicitly positioned for that market.
The 90-plus-language coverage at launch is notable. Most AI dubbing tools support a narrower set of languages and expand gradually. Broad coverage from day one positions ElevenLabs to serve markets beyond English and Western Europe, which matters for companies trying to reach audiences in Asia, Latin America, and the Middle East, where demand for localized video content is substantial.
Corroborating sources
- ElevenLabs
https://elevenlabs.io/blog/introducing-dubbing-v2
“This solves one of the biggest unsolved problems in AI dubbing: making translated speech feel like the original person actually said it.”