NVIDIA ships Nemotron 3 Ultra, a 550B open MoE frontier model, on June 4

Published Jun 4, 2026

NVIDIA's largest open Nemotron model, Nemotron 3 Ultra, is set to land on June 4, 2026 as a NIM microservice and on the major open-weights hubs. In its May 31 software-and-agents press release, NVIDIA framed Ultra as a 550-billion-parameter mixture-of-experts model aimed at long-running agentic workloads, with higher throughput and lower per-token cost than the open frontier models it is positioned against.

What's new

Nemotron 3 Ultra is the top tier of NVIDIA's Nemotron 3 open family, which started with the 30B Nano in December 2025 and was extended with the 120B / 12B-active Super in March 2026. Ultra is the family's first model squarely aimed at frontier-class agent work.

Size and architecture. Per NVIDIA's announcement: "NVIDIA Nemotron 3 Ultra is a 550-billion-parameter mixture-of-experts model." It is positioned to deliver "frontier-level intelligence to long-running agents across coding, research and enterprise workflows."
Performance and cost claims. NVIDIA reports "up to 5x faster inference and up to 30% lower cost compared with open frontier models in its class." The 5x figure is tied to the NVFP4 numerical format on NVIDIA Blackwell, which NVIDIA has previously said delivers substantially higher open-weights throughput than FP8 on Hopper.
Agent harness coverage. Ultra is "post-trained for leading agent platforms and harnesses," including "Hermes Agent, LangChain Deep Agents, OpenClaw, OpenHands and OpenCode."
Availability surfaces. NVIDIA states Ultra is "expected to be available on June 4 via Hugging Face, ModelScope, OpenRouter and build.nvidia.com as NVIDIA NIM microservices," alongside the broader ecosystem of NVIDIA Cloud Partners and inference services.
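
To make the "up to" framing concrete, here is a minimal sketch of what the two headline figures would mean for a given baseline, treating "faster inference" as throughput. The baseline numbers (100 tokens/s, $2.00 per million tokens) and the names `Projection` and `best_case` are hypothetical, chosen only for illustration. They are not NVIDIA measurements, and the result is a best-case bound, not a forecast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    tokens_per_second: float        # best-case throughput under the claim
    cost_per_million_tokens: float  # best-case cost under the claim


def best_case(baseline_tps: float, baseline_cost: float,
              speedup: float = 5.0, cost_reduction: float = 0.30) -> Projection:
    """Apply the 'up to 5x' and 'up to 30%' figures to a baseline.

    Both figures are upper bounds in NVIDIA's wording, so the result is
    the most favorable outcome the claim allows, not an expected value.
    """
    if baseline_tps <= 0 or baseline_cost < 0:
        raise ValueError("baseline throughput must be positive and cost non-negative")
    if speedup <= 0 or not 0 <= cost_reduction < 1:
        raise ValueError("speedup must be positive and cost_reduction in [0, 1)")
    return Projection(
        tokens_per_second=baseline_tps * speedup,
        cost_per_million_tokens=baseline_cost * (1.0 - cost_reduction),
    )


# Hypothetical baseline for an open frontier peer (illustrative only).
p = best_case(baseline_tps=100.0, baseline_cost=2.00)
print(f"{p.tokens_per_second:.0f} tokens/s, "
      f"${p.cost_per_million_tokens:.2f} per million tokens")
```

With that hypothetical baseline, the best case is 500 tokens/s at $1.40 per million tokens. Any independent measurement that lands between the baseline and those figures is still consistent with an "up to" claim.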

Context

Nemotron 3 has been a staged rollout. Nano (~3B active / ~30B total) shipped on Hugging Face in December 2025 with open weights, datasets, and recipes. Super (120B total, 12B active) followed in March 2026 with the hybrid Mamba-Transformer MoE architecture and a claimed 5x throughput gain over the prior Nemotron Super on B200. Ultra is the long-promised flagship: NVIDIA had committed in December 2025 to ship Super and Ultra in the first half of 2026, and the May 31 release sets a hard date.
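
The active-versus-total parameter counts quoted above can be turned into an active fraction, which is roughly the share of the model's weights used per token in a mixture-of-experts design. The sketch below uses only the figures in this article (Nano's are approximate). Ultra is left out because its active parameter count is not stated here. The helper name `active_fraction` is hypothetical.

```python
# Parameter counts in billions, as quoted in this article: (active, total).
# Nano's figures are approximate ("~3B active / ~30B total").
MODELS = {
    "Nemotron 3 Nano": (3, 30),
    "Nemotron 3 Super": (12, 120),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that are active per token."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("need 0 < active <= total")
    return active_b / total_b


for name, (active_b, total_b) in MODELS.items():
    share = active_fraction(active_b, total_b)
    print(f"{name}: {active_b}B of {total_b}B active ({share:.0%})")
```

Both released tiers come out at about 10% active. Whether Ultra keeps that ratio at 550B is not something the announcement quoted here confirms.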

The model also lands inside a coordinated NVIDIA week. May 31 brought the Vera CPU announcement and Vera Rubin's move into full production; June 1-5 covered the GTC Taipei / COMPUTEX 2026 keynote with Cosmos 3, Isaac Sim 6.0, RTX Spark, DGX Station for Windows, and a deep Microsoft Build alignment for agentic stacks. Nemotron 3 Ultra is the software counterpart to that hardware push — the open model NVIDIA wants enterprises running on its silicon.

Why it matters

Open frontier models in the 400B-600B MoE band (DeepSeek's V3.x and R1 successors, Mistral's Large-class MoEs, Meta's Llama 4 family) define a real procurement decision for enterprises that want frontier capability without locking into a closed API. A 550B MoE that NVIDIA is willing to claim is up to 30% cheaper to run than its open peers, and that ships pre-tuned for the agent harnesses enterprises are actually deploying (OpenHands, OpenCode, LangChain Deep Agents), is targeted directly at that bake-off.

The "up to 5x" throughput claim against open frontier peers is the number to watch in independent reproductions. If it holds up on standard agentic evals, and especially on the long-context, multi-tool-call profiles where MoEs traditionally lose efficiency, Ultra becomes a credible default for self-hosted agent deployment and tightens NVIDIA's grip on the open-model + Blackwell stack. Independent benchmarks run against the OpenRouter and Hugging Face releases over the next two weeks will be the cleanest read.

Corroborating sources

NVIDIA Newsroom (nvidianews.nvidia.com)
https://nvidianews.nvidia.com/news/enterprise-software-leaders-build-ai-agents-with-nvidia
“NVIDIA Nemotron 3 Ultra is a 550-billion-parameter mixture-of-experts model”
