NVIDIA launches Cosmos 3, the first fully open omnimodel for physical AI, at GTC Taipei
NVIDIA unveiled Cosmos 3 on May 31, 2026 at GTC Taipei. It is an open world foundation model for physical AI that natively understands and generates text, images, video, ambient sound, and actions in a single model. NVIDIA positions it as shared infrastructure for the robotics, autonomous vehicle, and vision AI industries, and it is the most ambitious open-source physical AI model the company has released to date.
What's new
Cosmos 3 is built on a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. This design lets the model understand object interactions, physics, motion, and spatial-temporal relationships before generating outputs — enabling physics-accurate simulation and prediction rather than pattern-matching alone.
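To make the architecture description more concrete, here is a toy sketch of the general mixture-of-transformers idea: each token is processed with the weights of its own expert, while attention runs jointly over the whole sequence so generation tokens can read what the reasoning tokens computed. This is not NVIDIA's implementation. The two-expert split, the function names, the dimensions, and the random weights are all illustrative assumptions, and the sketch omits feed-forward blocks, normalization, and masking.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(rng, dim):
    """Hypothetical expert: its own query, key, value and output projections."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    return {name: mat() for name in ("q", "k", "v", "out")}

def mot_layer(tokens, expert_ids, experts):
    """One attention layer with per-expert weights and shared attention.

    expert_ids[i] picks the expert whose weights process token i;
    every token then attends over all tokens, regardless of expert.
    """
    dim = len(tokens[0])
    q = [matvec(experts[e]["q"], t) for t, e in zip(tokens, expert_ids)]
    k = [matvec(experts[e]["k"], t) for t, e in zip(tokens, expert_ids)]
    v = [matvec(experts[e]["v"], t) for t, e in zip(tokens, expert_ids)]
    outputs = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        mixed = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(dim)]
        projected = matvec(experts[expert_ids[i]]["out"], mixed)
        # Residual connection: add the attended update to the input token.
        outputs.append([t + p for t, p in zip(tokens[i], projected)])
    return outputs

# Assumed toy setup: expert 0 stands in for "reasoning", expert 1 for "generation".
DIM = 4
rng = random.Random(0)
experts = [make_expert(rng, DIM), make_expert(rng, DIM)]
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(5)]
expert_ids = [0, 0, 0, 1, 1]
out = mot_layer(tokens, expert_ids, experts)
```

Because attention is shared, changing one of the reasoning tokens changes the output of the generation tokens as well. That coupling is the property the paragraph above describes: the generation side conditions on what the reasoning side has worked out.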
Three model variants were announced:
- Cosmos 3 Super — Available now; the full-size model for demanding physical AI workloads
- Cosmos 3 Nano — Available now; a compact variant optimized for edge deployment
- Cosmos 3 Edge — Coming soon; designed for real-time inference at the edge
Key capabilities:
- Reduces physical AI training cycles from months to days by generating synthetic training data at scale
- Functions as a vision language model, world model, video foundation model, or backbone for action models
- Native understanding of actions — not just perception — enables end-to-end training pipelines for robotics
The launch includes the Cosmos Coalition, a group of early adopters and integration partners: Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI.
Context
Cosmos 3 is the successor to NVIDIA's original Cosmos world foundation models, which debuted at CES 2025. The earlier generation established the world model concept for physical AI but was not fully open-source across all modalities. Cosmos 3 closes that gap, releasing as a true open omnimodel that any organization can fine-tune on its own hardware, simulation environments, or sensor data.
The launch follows NVIDIA's broader push to position itself at the infrastructure layer of the physical AI stack. NVIDIA's Isaac robotics platform, DRIVE Orin autonomous vehicle chip, and Cosmos 3 together form a vertically integrated offering: chips, simulation, and foundation models as a bundled solution for OEMs and robotics developers.
Black Forest Labs and Runway — both specialists in generative video — joining the Cosmos Coalition signals that the video generation world is increasingly overlapping with robotics simulation. The same models that generate cinematic video can, with the right training, learn physical dynamics.
Why it matters
Physical AI, meaning models that understand and act in the physical world, is a problem that frontier AI labs have not yet cracked. Large language models trained on text handle language tasks well but still struggle with spatial reasoning, physics prediction, and motor control. Cosmos 3 is a direct attempt to close that gap with a foundation model purpose-built for embodied intelligence.
Jensen Huang declared that the big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning, vision and world models. That claim will be tested by how quickly Cosmos 3's partners ship products built on it — but the open release strategy means the robotics research community can now run, fine-tune, and benchmark the model without waiting for API access.
For NVIDIA, open-sourcing Cosmos 3 serves a clear strategic purpose: developers who use the model will mostly train on NVIDIA GPUs, and commercial deployments will likely run on NVIDIA hardware. The model is the loss leader; the silicon is the business.
Corroborating sources
- NVIDIA Newsroom (nvidianews.nvidia.com)
https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai
“The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.”