
Google introduces Gemini Omni: a video generation model that accepts any input and edits through conversation

May 29, 2026 (published Jun 5, 2026)

On May 29, 2026, Google launched Gemini Omni, a new model built specifically for video generation from any input type, including images, audio, existing video, and text. Its primary differentiator is a natural-language editing interface. Users direct the model conversationally, and each instruction builds on the previous result while the model maintains character consistency and physical accuracy.

What's new

Gemini Omni is described by Google as "our new model that can create anything from any input, starting with video." It accepts the full range of multimodal inputs and produces video as output.

Key capabilities:

Conversational video editing: users "edit your videos through conversation," and each instruction in a session builds sequentially on previous edits
Character consistency: maintains character appearance and physical behavior across conversational edits
Multimodal input: accepts images, audio clips, existing video, and text prompts

Availability at launch:

Google AI Plus, Pro, and Ultra subscribers via the Gemini app, globally
YouTube Shorts and YouTube Create App users at no cost
Developer and enterprise API access: announced as coming in the weeks following launch

Context

Gemini Omni is distinct from the Gemini 3.5 family announced at the same Google I/O event. Gemini 3.5 Flash targets agentic and coding tasks; Gemini Omni is purpose-built for video generation and editing. Google positioned the two as addressing different use cases, with Gemini Omni aimed at creators and content workflows rather than developer automation.

The conversational editing approach departs from most video generation tools. These typically require either re-prompting from scratch to apply a change or switching to a separate, traditional editing interface.

Why it matters

Natural-language video editing with this level of conversational continuity, where each instruction builds on the last without losing earlier decisions, moves consumer video production closer to what was previously possible only with professional editing software and skilled operators.

For developers, the API announcement matters as much as the consumer launch. Once the API ships, Gemini Omni can be embedded into third-party applications and production pipelines. The initial consumer rollout through YouTube Shorts provides large-scale exposure, but developer integrations are likely to be the larger long-term opportunity.

Free access for YouTube Shorts and YouTube Create App users gives Google immediate distribution at scale, likely reaching hundreds of millions of potential users. That reach should generate usage data and feedback loops to inform the model's development before the broader API becomes available.

Corroborating sources

Blog
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni-3-5-videos/
“Gemini Omni is our new model that can create anything from any input, starting with video.”
