NVIDIA Nemotron 3 Ultra Lands on Amazon SageMaker JumpStart with 1M-Token Context and 5x Agentic Inference Speed
On June 4, 2026, Amazon Web Services added NVIDIA Nemotron 3 Ultra to SageMaker JumpStart, giving cloud developers one-click access to a 550-billion-parameter reasoning model designed for long-running agentic workflows.
What's new
NVIDIA Nemotron 3 Ultra is now deployable on Amazon SageMaker JumpStart through a single-click experience, with no custom infrastructure required. The AWS announcement describes it as "an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads."
Key specifications:
- 550 billion total parameters, 55 billion active per forward pass
- Hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture — combining state-space efficiency with attention-based reasoning
- Up to 1 million tokens of context length in NVFP4 precision
- Text-in, text-out modality
- 5x faster inference for agentic workloads, according to AWS
- Up to 30% lower cost for agentic workloads, according to AWS
SageMaker JumpStart handles endpoint provisioning, monitoring, and scaling automatically. Developers access the model via the JumpStart catalog or the AWS console.
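For teams that prefer code to the console, a programmatic deployment might look roughly like the sketch below. It assumes version 2 of the SageMaker Python SDK and its `JumpStartModel` class. The model ID, the payload layout, and the `max_new_tokens` parameter are placeholders rather than values from the AWS announcement, so the JumpStart catalog entry remains the source of truth.

```python
# Hypothetical placeholder: the real model ID is listed in the JumpStart catalog.
MODEL_ID = "<nemotron-3-ultra-jumpstart-model-id>"


def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build a text-in, text-out request body.

    The "inputs"/"parameters" layout is an assumption; the model card
    defines the schema the endpoint actually expects.
    """
    if not prompt:
        raise ValueError("prompt must be non-empty")
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def deploy_and_query(prompt: str, model_id: str = MODEL_ID):
    """Deploy to a managed endpoint, send one request, then clean up."""
    # Imported lazily so build_payload works without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # Provisions a billable endpoint; gated models may also need accept_eula=True.
    predictor = model.deploy()
    try:
        return predictor.predict(build_payload(prompt))
    finally:
        # Tear the endpoint down so it does not keep accruing charges.
        predictor.delete_endpoint()
```

Calling `deploy_and_query("Summarize this incident report ...")` requires AWS credentials and an execution role, and it creates real infrastructure, so it is best run from a sandbox account first.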
Context
Nemotron 3 Ultra is separate from the Nemotron 3.5 Content Safety model released in early June, which targets multimodal safety classification. Ultra is NVIDIA's general-purpose reasoning model for orchestration-heavy workloads.
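The model's sparse MoE design is central to its efficiency story: only 55 billion of its 550 billion parameters (10%) activate per forward pass, which cuts per-token compute relative to a dense model of the same total size. Sparsity alone does not shrink the weights that must sit in GPU memory, however. At million-token context lengths, where the attention cache of a pure Transformer grows linearly with the history, the hybrid design's Mamba layers keep a fixed-size state, and that, combined with MoE sparsity, is what keeps throughput viable at reasonable cost.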
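The arithmetic behind the sparsity figures is easy to check. The sketch below uses only the two parameter counts from the announcement. The rule of thumb of 2 FLOPs per active parameter per token and the flat 4 bits per weight (ignoring NVFP4 scale factors) are simplifying assumptions, not published figures.

```python
TOTAL_PARAMS = 550e9   # total parameters, from the announcement
ACTIVE_PARAMS = 55e9   # parameters active per forward pass, from the announcement


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that participate in one forward pass."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rough compute per token: one multiply and one add per active weight.

    Assumption: ignores the cost of attention over the context itself.
    """
    return 2 * active_params


def weight_memory_gb(total_params: float, bits_per_weight: int = 4) -> float:
    """Memory needed to hold every weight, in decimal gigabytes.

    Assumption: a flat 4 bits per weight with no scale-factor overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


sparse_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
print(f"compute saving vs. same-size dense model: {dense_flops / sparse_flops:.0f}x")
print(f"weights resident in memory: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions the model does about a tenth of the per-token work of a dense 550-billion-parameter model, yet all of its weights, roughly 275 GB at 4 bits each, still have to be loaded. Sparsity saves compute, not weight storage.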
Amazon SageMaker JumpStart has become a key distribution surface for frontier reasoning models. By co-launching there, NVIDIA removes the barrier of self-managed serving infrastructure for enterprise teams that want to evaluate the model without standing up dedicated GPU clusters.
Why it matters
The release reflects a broader trend: frontier model providers are increasingly co-launching with cloud hyperscalers rather than just releasing weights. For teams building agentic pipelines — autonomous coding agents, research orchestrators, multi-step process automation — having Nemotron 3 Ultra available as a managed endpoint means they can benchmark it in production conditions the same week it launches.
The 1M-token context window matters for agentic use cases specifically because agents accumulate long conversation histories: tool call inputs and outputs, sub-agent results, and planning traces all pile up. A model that can hold that full history in context without truncation makes more coherent long-horizon decisions.
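How quickly an agent's history fills a context window can be sketched with a simple token ledger. Everything here is illustrative: the `ContextLedger` class, the four-characters-per-token estimate, and the reserved output budget are assumptions made for the sketch, not part of any Nemotron or SageMaker API. A real pipeline would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass, field

CONTEXT_WINDOW = 1_000_000  # context length from the announcement


def estimate_tokens(text: str) -> int:
    """Crude heuristic of about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


@dataclass
class ContextLedger:
    """Tracks how much of the context window an agent's history consumes."""

    window: int = CONTEXT_WINDOW
    reserved_for_output: int = 8_192  # hypothetical headroom for the reply
    entries: list = field(default_factory=list)  # (kind, token_count) pairs

    def add(self, kind: str, text: str) -> int:
        """Record one history item and return its estimated token count."""
        tokens = estimate_tokens(text)
        self.entries.append((kind, tokens))
        return tokens

    @property
    def used(self) -> int:
        return sum(tokens for _, tokens in self.entries)

    @property
    def remaining(self) -> int:
        return self.window - self.reserved_for_output - self.used

    def fits(self) -> bool:
        """True while the full history still fits without truncation."""
        return self.remaining >= 0


ledger = ContextLedger()
ledger.add("plan", "Step 1: search the repo. Step 2: patch the bug.")
ledger.add("tool_output", "x" * 400_000)      # e.g. a large log from a tool
ledger.add("subagent_result", "y" * 200_000)  # e.g. a sub-agent's report
print(f"used={ledger.used} remaining={ledger.remaining} fits={ledger.fits()}")
```

Once `fits()` returns `False`, an agent has to truncate or summarize older entries, which is the point at which long-horizon coherence tends to suffer. A larger window simply pushes that point further out.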
For enterprises already on AWS who want a strong open alternative to closed-API reasoning models, Nemotron 3 Ultra on SageMaker JumpStart is now a production-ready option.
Corroborating sources
- AWS Machine Learning Blog (aws.amazon.com)
https://aws.amazon.com/blogs/machine-learning/nvidia-nemotron-3-ultra-now-available-on-amazon-sagemaker-jumpstart/
“Nemotron 3 Ultra is an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads.”