
Google introduces Gemini Omni: multimodal video generation from any input type

May 20, 2026 (published Jun 6, 2026)

At Google I/O 2026, Google introduced Gemini Omni, a new multimodal model that accepts images, audio, video, and text in any combination and generates video as its primary output. It is Google's first generally available model built around native video generation; image and audio output are planned for future releases.

What's new

Inputs: images, audio, video, and text — any combination
Primary output: high-quality video generation
Planned outputs: image and audio (not yet available at launch)
Scene consistency: characters stay consistent across edits, physics hold across frames, scenes maintain continuity from previous context
Physics reasoning: the model predicts what should happen next in a scene, drawing on Gemini's broader knowledge base
Video editing via natural language: describe changes conversationally rather than through technical controls
Digital avatar creation: generate videos with custom digital personas
SynthID watermark: all outputs include an imperceptible verification watermark, embedded automatically
Availability at launch: Google AI Plus, Pro, and Ultra subscribers (Gemini app and Google Flow); YouTube Shorts and YouTube Create App users at no cost
Coming soon: developer and enterprise API access

Context

Gemini Omni enters a video generation market that includes Runway (Gen-3 Alpha), OpenAI's Sora, and Kling from Kuaishou. Google's differentiation is the depth of multimodal input support and the integration of Gemini's existing language and reasoning capabilities into the generation process. The physics reasoning claim — that the model "reasons about what should happen next" — targets a known weakness of current video generation tools, which frequently produce motion that violates basic physical expectations.

The SynthID watermark continues Google DeepMind's content authentication policy. All Google AI-generated media carries SynthID markers, and Gemini Omni is covered by that policy from launch rather than having the watermark added retroactively.

The no-cost YouTube distribution gives Gemini Omni a path to scale that other video generation models have not had at launch. YouTube Shorts creators represent a large, immediately addressable audience for AI video tooling.

Why it matters

True any-input-to-video generation from a single model, mixing images, audio, text, and existing video, goes beyond what current video generation tools offer, since they typically accept only text or a single image as input. The scene consistency claim, specifically that characters and physics persist across edits, addresses the most common production complaint about AI video: incoherent motion and character drift that make longer generated videos unusable without significant post-editing.

The YouTube distribution has real implications for creator adoption: unlike Sora (API-first, paid) and Runway (subscription-based, aimed at professionals), Gemini Omni reaches YouTube Shorts creators at no cost on day one. If API access follows quickly, Gemini Omni could become the default AI video infrastructure for teams already embedded in the Google Cloud and YouTube ecosystems.

Corroborating sources

Blog
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/
“Gemini Omni is our new model that can create anything from any input — starting with video.”
