NVIDIA Blackwell sweeps all seven MLPerf Training 6.0 benchmarks as the only platform with full-suite submissions
NVIDIA's Blackwell GPU platform claimed the top spot across all seven benchmarks in MLPerf Training 6.0, the latest edition of the peer-reviewed AI training benchmark suite. The results, published June 16, 2026, mark the first MLPerf round to include mixture-of-experts (MoE) pretraining workloads, with new benchmarks based on DeepSeek-V3 671B and GPT-OSS-20B.
What's new
NVIDIA submitted results across every benchmark in the suite — the only platform to do so — and posted the fastest training times in each category. Key results:
- GB300 NVL72 delivered up to 1.6x faster training than the previous-generation GB200 NVL72 at identical scale
- Largest-scale submission: 8,192 Blackwell GPUs on the DeepSeek-V3 671B benchmark
- At that scale, CoreWeave trained DeepSeek-V3 671B to quality target in 2.02 minutes
- Microsoft Azure trained Llama 3.1 405B to quality target in 7.07 minutes at 8,192-GPU scale
- A separate 5,120-GPU submission covered the Llama 3.1 405B benchmark
MLPerf Training 6.0 added two new MoE workloads:
- DeepSeek-V3 671B: a 671-billion-parameter mixture-of-experts architecture
- GPT-OSS-20B: a 20-billion-parameter open-source model
NVIDIA's reliability infrastructure also featured in the results: GPUs undergo 30+ manufacturing test stages before reaching data centers, and the platform includes self-healing capabilities plus the NVIDIA Resiliency Extension (NVRx) for resume-from-checkpoint recovery.
Context
MLPerf Training is the primary independent benchmark for measuring how fast AI hardware trains production-grade models. The suite is maintained by MLCommons, an industry consortium, and results are peer-reviewed and reproducible — carrying substantially more weight than vendor-cited internal benchmarks.
Blackwell is NVIDIA's current-generation GPU architecture, with the GB300 NVL72 representing the latest iteration. Its speedup of up to 1.6x over the prior-generation GB200 NVL72 at identical scale, reported within a single MLPerf round, points to a substantial generational performance gain.
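To put the speedup figure in wall-clock terms, here is a minimal Python sketch. It assumes "1.6x faster" means the ratio of baseline training time to new training time, which the results summary does not spell out; the function names are illustrative only. Under that reading, a 1.6x speedup leaves 62.5% of the baseline time, a saving of at most 37.5%, since the reported figure is an upper bound.

```python
def time_fraction(speedup: float) -> float:
    """Fraction of the baseline wall-clock time that remains after a speedup.

    Assumes speedup = baseline_time / new_time (an interpretation, not a
    definition taken from the MLPerf results).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def time_saved_percent(speedup: float) -> float:
    """Percentage of baseline wall-clock time saved by the speedup."""
    return (1.0 - time_fraction(speedup)) * 100.0


if __name__ == "__main__":
    reported_speedup = 1.6  # "up to 1.6x", GB300 NVL72 vs. GB200 NVL72
    print(f"Remaining time fraction: {time_fraction(reported_speedup):.3f}")
    print(f"Time saved: {time_saved_percent(reported_speedup):.1f}%")
```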
Why it matters
The inclusion of DeepSeek-V3 671B as a benchmark workload is significant on its own: it validates MoE architectures as a mainstream training target and forces hardware vendors to optimize for the sparse-compute patterns these models demand. NVIDIA's clean sweep positions Blackwell as the dominant platform for training the kinds of MoE models now central to frontier AI development.
For cloud providers and AI labs evaluating infrastructure, results like the 2.02-minute DeepSeek-V3 training run at scale translate directly into cost-per-trained-model comparisons. The competitive gap — no other platform submitted across all seven benchmarks — reinforces NVIDIA's position in the AI training market heading into the second half of 2026.
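As a rough illustration of how such a comparison might be set up, the sketch below converts the two reported 8,192-GPU runs into GPU-hours and then into a cost figure. The GPU counts and times come from the results above; the hourly price is a hypothetical placeholder, not a quoted rate from any provider, and the function names are illustrative only.

```python
def gpu_hours(num_gpus: int, wall_clock_minutes: float) -> float:
    """Total GPU-hours consumed by a run: GPUs x minutes / 60."""
    return num_gpus * wall_clock_minutes / 60.0


def run_cost(num_gpus: int, wall_clock_minutes: float,
             price_per_gpu_hour: float) -> float:
    """Cost of a run at a flat per-GPU-hour price (simplifying assumption)."""
    return gpu_hours(num_gpus, wall_clock_minutes) * price_per_gpu_hour


# Reported runs: (GPU count, minutes to quality target).
REPORTED_RUNS = {
    "DeepSeek-V3 671B (CoreWeave)": (8192, 2.02),
    "Llama 3.1 405B (Microsoft Azure)": (8192, 7.07),
}

# HYPOTHETICAL price, chosen only to make the arithmetic concrete.
ASSUMED_PRICE_PER_GPU_HOUR = 1.00

if __name__ == "__main__":
    for name, (gpus, minutes) in REPORTED_RUNS.items():
        hours = gpu_hours(gpus, minutes)
        cost = run_cost(gpus, minutes, ASSUMED_PRICE_PER_GPU_HOUR)
        print(f"{name}: {hours:,.1f} GPU-hours, "
              f"cost at assumed price: {cost:,.2f}")
```

Because the price is a flat multiplier, comparing platforms on the same benchmark reduces to comparing GPU-hours; substituting a real contract rate changes only the final column.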
Corroborating sources
- NVIDIA Blog
https://blogs.nvidia.com/blog/blackwell-mlperf-training-6-0/
“NVIDIA brings together performance, scale and reliability in a single platform engineered through extreme codesign to enable AI model builders to launch frontier models faster, minimize training costs and start generating revenue early.”