NVIDIA Nemotron 3 Ultra Lands on Amazon SageMaker JumpStart with 1M-Token Context and 5x Agentic Inference Speed

ListenJun 4, 2026published Jun 7, 2026

Amazon Web Services on June 4, 2026 added NVIDIA Nemotron 3 Ultra to SageMaker JumpStart, giving cloud developers one-click access to a 550-billion-parameter reasoning model designed specifically for long-running agentic workflows.

What's new

NVIDIA Nemotron 3 Ultra is now deployable on Amazon SageMaker JumpStart through a single-click experience, with no custom infrastructure required. The AWS announcement describes it as "an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads."

Key specifications:

550 billion total parameters, 55 billion active per forward pass
Hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture — combining state-space efficiency with attention-based reasoning
Up to 1 million tokens of context length in NVFP4 precision
Text-in, text-out modality
5x faster inference than dense alternatives for agentic workloads
Up to 30% lower cost per agentic request

SageMaker JumpStart handles endpoint provisioning, monitoring, and scaling automatically. Developers access the model via the JumpStart catalog or the AWS console.

Context

Nemotron 3 Ultra is separate from the Nemotron 3.5 Content Safety model released in early June, which targets multimodal safety classification. Ultra is NVIDIA's general-purpose reasoning model for orchestration-heavy workloads.

The model's sparse MoE design is central to its efficiency story: only 55 billion of its 550 billion parameters activate per forward pass. At million-token context lengths — where dense models would consume prohibitive GPU memory — MoE sparsity keeps throughput viable at reasonable cost.

AWS SageMaker JumpStart has become a key distribution surface for frontier reasoning models. By co-launching there, NVIDIA removes the barrier of self-managed serving infrastructure for enterprise teams who want to evaluate the model without standing up dedicated GPU clusters.

Why it matters

The release reflects a broader trend: frontier model providers are increasingly co-launching with cloud hyperscalers rather than just releasing weights. For teams building agentic pipelines — autonomous coding agents, research orchestrators, multi-step process automation — having Nemotron 3 Ultra available as a managed endpoint means they can benchmark it in production conditions the same week it launches.

The 1M-token context window matters for agentic use cases specifically because agents accumulate long conversation histories: tool call inputs and outputs, sub-agent results, and planning traces all pile up. A model that can hold that full history in context without truncation makes more coherent long-horizon decisions.

For enterprises already on AWS who want a strong open alternative to closed-API reasoning models, Nemotron 3 Ultra on SageMaker JumpStart is now a production-ready option.

Corroborating sources

Aws.amazon
https://aws.amazon.com/blogs/machine-learning/nvidia-nemotron-3-ultra-now-available-on-amazon-sagemaker-jumpstart/
“Nemotron 3 Ultra is an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads.”

What's new

Key specifications:

550 billion total parameters, 55 billion active per forward pass

Hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture — combining state-space efficiency with attention-based reasoning

Up to 1 million tokens of context length in NVFP4 precision

Text-in, text-out modality

5x faster inference than dense alternatives for agentic workloads

Up to 30% lower cost per agentic request

SageMaker JumpStart handles endpoint provisioning, monitoring, and scaling automatically. Developers access the model via the JumpStart catalog or the AWS console.

Context

Why it matters

For enterprises already on AWS who want a strong open alternative to closed-API reasoning models, Nemotron 3 Ultra on SageMaker JumpStart is now a production-ready option.