NVIDIA Blackwell leads Artificial Analysis AgentPerf — the first agentic AI infrastructure benchmark

Published Jun 12, 2026

NVIDIA's Blackwell architecture has taken the top spot in AgentPerf, the first benchmark designed specifically to measure agentic AI infrastructure performance, according to results published June 12 by Artificial Analysis and NVIDIA. The NVIDIA GB300 NVL72 system delivers up to 20 times more agents per megawatt than the previous-generation HGX H200, a result that could influence enterprise infrastructure decisions as agentic AI deployments scale.

What's new

AgentPerf is developed by Artificial Analysis — the same independent research group known for model quality and pricing benchmarks — and targets a workload category that existing benchmarks miss: multi-step AI agents. The benchmark is grounded in real coding agent trajectories, modeling scenarios where an agent receives a task, reads files, writes and edits code, executes commands, and iterates based on the results.
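
To make the workload shape concrete, here is a minimal Python sketch of replaying one such trajectory. The `Step` and `Trajectory` types, the token counts, and the simplifying assumption that every LLM call re-reads the full accumulated context (no prefix caching) are illustrative choices, not AgentPerf's actual methodology.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a hypothetical coding-agent trajectory."""
    action: str              # e.g. "read_file", "edit_code", "run_command"
    tool_output_tokens: int  # tokens the tool result appends to the context
    generated_tokens: int    # tokens the model generates to choose this action


@dataclass
class Trajectory:
    task_prompt_tokens: int
    steps: list[Step] = field(default_factory=list)


def replay(traj: Trajectory) -> tuple[int, int, int]:
    """Return (llm_calls, total_input_tokens, total_output_tokens).

    Simplification: every LLM call re-reads the whole accumulated context.
    """
    context = traj.task_prompt_tokens
    total_in = total_out = 0
    for step in traj.steps:
        total_in += context                 # the model reads everything so far
        total_out += step.generated_tokens  # ...and emits its next action
        # the action and the tool's result both join the context
        context += step.generated_tokens + step.tool_output_tokens
    return len(traj.steps), total_in, total_out


example = Trajectory(1000, [
    Step("read_file", tool_output_tokens=2000, generated_tokens=50),
    Step("edit_code", tool_output_tokens=100, generated_tokens=400),
    Step("run_command", tool_output_tokens=500, generated_tokens=30),
])
print(replay(example))  # (3, 7600, 480)
```

Even this three-step toy sends 7,600 input tokens to produce 480 output tokens, because each call re-reads a context that keeps growing.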

Key results for NVIDIA Blackwell:

GB300 NVL72: delivers up to 20x more agents per megawatt than the NVIDIA HGX H200 system when running DeepSeek V4 Pro.
The efficiency gain comes from connecting 72 GPUs into a single rack-scale unit, with CUDA kernels that overlap communication and compute, so the cost of coordinating across the model's experts is hidden behind computation rather than added to latency.
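
A toy latency model shows why overlap matters. The sketch below is not NVIDIA's kernel design. It assumes each mixture-of-experts layer performs some expert compute plus an all-to-all transfer, and that splitting the work into chunks lets one chunk's transfer run while the next chunk computes (an idealized two-stage pipeline). The timings are hypothetical.

```python
def step_time_serial(compute_ms: float, comm_ms: float) -> float:
    """No overlap: the GPU computes, then waits for the transfer."""
    return compute_ms + comm_ms


def step_time_overlapped(compute_ms: float, comm_ms: float, chunks: int) -> float:
    """Idealized two-stage pipeline over `chunks` equal pieces of work.

    The first piece must compute before its transfer starts, and the last
    transfer finishes at the end; in between, the slower stage sets the pace.
    """
    c = compute_ms / chunks
    m = comm_ms / chunks
    return c + m + (chunks - 1) * max(c, m)


# Hypothetical per-layer timings: 8 ms of expert compute, 4 ms of all-to-all.
print(step_time_serial(8, 4))          # 12
print(step_time_overlapped(8, 4, 1))   # 12.0 (one chunk means no overlap)
print(step_time_overlapped(8, 4, 4))   # 9.0
print(step_time_overlapped(8, 4, 64))  # 8.0625, approaching max(8, 4)
```

As the chunk count grows, step time approaches max(compute, communication), which is the sense in which communication cost is absorbed rather than added.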

AgentPerf measures concurrency and throughput together: it counts how many agentic tasks a platform can support simultaneously while meeting defined performance thresholds for responsiveness and output token rate.
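
A measurement of this kind can be sketched as a load sweep: raise the number of concurrent agents, record responsiveness and per-agent token rate at each level, and keep the highest level that still meets both thresholds before any is violated. The `LoadPoint` fields, the p95 time-to-first-token metric, the threshold values, and the 125 kW power figure are all placeholders; AgentPerf's exact metrics and limits are not given here.

```python
from dataclasses import dataclass


@dataclass
class LoadPoint:
    concurrency: int               # simultaneous agentic tasks offered
    p95_ttft_s: float              # 95th-percentile time to first token, seconds
    tokens_per_s_per_agent: float  # output token rate each agent receives


def max_concurrency(points: list[LoadPoint], max_ttft_s: float, min_tok_s: float) -> int:
    """Highest concurrency reached before any threshold is first violated."""
    best = 0
    for p in sorted(points, key=lambda p: p.concurrency):
        if p.p95_ttft_s > max_ttft_s or p.tokens_per_s_per_agent < min_tok_s:
            break  # the sweep stops at the first level that misses a target
        best = p.concurrency
    return best


def agents_per_megawatt(concurrency: int, system_power_kw: float) -> float:
    """Normalize supported concurrency by system power (1 MW = 1000 kW)."""
    return concurrency * 1000 / system_power_kw


# Hypothetical sweep results and thresholds (not AgentPerf data).
sweep = [
    LoadPoint(64, 0.8, 60.0),
    LoadPoint(128, 1.2, 45.0),
    LoadPoint(256, 2.5, 30.0),
]
n = max_concurrency(sweep, max_ttft_s=2.0, min_tok_s=40.0)
print(n, agents_per_megawatt(n, system_power_kw=125.0))  # 128 1024.0
```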

Context

Until now, inference benchmarks focused on single-request throughput or batch processing — metrics suited to chatbot or API workloads. Agentic AI changes the infrastructure equation: an agent makes dozens to hundreds of LLM calls per task, uses tools, and runs for minutes rather than milliseconds. Infrastructure that handles 100 concurrent chatbot users may saturate under far fewer coding agents.
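
A back-of-the-envelope comparison shows why. The session lengths and call counts below are hypothetical, chosen only to fall within the "dozens to hundreds of LLM calls" range described above.

```python
def llm_calls_per_second(sessions: int, calls_per_session: int,
                         session_seconds: float) -> float:
    """Average LLM request rate offered by concurrent sessions."""
    return sessions * calls_per_session / session_seconds


# Hypothetical workloads, illustrative numbers only:
# a chat user sends 10 messages over a 10-minute conversation;
# a coding agent makes 150 LLM calls over a 5-minute task.
chat_load = llm_calls_per_second(100, 10, 600)      # 100 chat users
per_agent_load = llm_calls_per_second(1, 150, 300)  # one coding agent
equivalent_agents = chat_load / per_agent_load
print(round(equivalent_agents, 1))  # 3.3
```

Under these assumptions, 100 chat users offer roughly the request rate of three to four coding agents. The sketch counts requests only; it ignores that agent calls also carry longer contexts, which would widen the gap further.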

NVIDIA has increasingly positioned Blackwell around agentic use cases. The GB300 NVL72, a rack-scale system connecting 72 Blackwell Ultra GPUs, began shipping to cloud providers and enterprises earlier in 2026. DeepSeek V4 Pro, a large mixture-of-experts model, was used as the test model for the headline efficiency result.

Why it matters

For enterprises planning AI agent deployments at scale, infrastructure efficiency per watt maps directly to cost per useful task completed. An improvement of up to 20x in agents per megawatt at rack scale would substantially change the cost model for agentic workloads. As NVIDIA put it: "For enterprises deploying AI agents at scale, those numbers determine how much productive work a given infrastructure investment can actually deliver."
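
The link between agents per megawatt and cost per task can be written out directly. This sketch covers only energy, assumes the system runs at full rated power with every agent slot busy, and uses placeholder figures (50 versus 1,000 agents per megawatt, a 6-minute task) that are not benchmark results.

```python
def energy_kwh_per_task(agents_per_mw: float, task_minutes: float) -> float:
    """Energy attributable to one agentic task.

    Assumes the system draws its full rated power with every agent slot busy,
    and each agent completes one task every `task_minutes`.
    """
    kw_per_agent = 1000.0 / agents_per_mw  # 1 MW = 1000 kW shared by all agents
    return kw_per_agent * task_minutes / 60.0


# Placeholder figures, not benchmark results.
baseline = energy_kwh_per_task(agents_per_mw=50, task_minutes=6)
improved = energy_kwh_per_task(agents_per_mw=1000, task_minutes=6)
print(baseline, improved)  # 2.0 0.1
```

Under these assumptions, energy per task falls in exact proportion to the gain in agents per megawatt.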

Beyond the headline result, AgentPerf gives buyers a common vocabulary for comparing infrastructure options, something the agentic AI market has lacked. Competitors including AMD, Cerebras, and Groq will likely respond with AgentPerf results of their own as the benchmark gains traction.

Corroborating sources

NVIDIA Blog
https://blogs.nvidia.com/blog/nvidia-blackwell-agentperf-artificial-analysis/
“NVIDIA GB300 NVL72 delivers the highest performance in the benchmark, running up to 20x more agents per megawatt than the NVIDIA HGX H200 system.”
