NVIDIA unveils RTX Spark superchip, bringing 1 petaflop of local AI performance to slim Windows PCs

May 31, 2026 (published Jun 6, 2026)

On May 31, 2026, NVIDIA unveiled RTX Spark, a purpose-built superchip that combines a Blackwell RTX GPU with a custom 20-core Grace CPU to deliver 1 petaflop of AI performance in slim laptops and compact desktops. Co-developed with Microsoft and MediaTek, RTX Spark marks the company's most direct push to move large-model inference from the data center to the desk.

What's new

RTX Spark's hardware specs are notable for a consumer chip:

GPU: NVIDIA Blackwell RTX with 6,144 CUDA cores and fifth-generation Tensor Cores with FP4 precision
CPU: 20-core NVIDIA Grace, built in partnership with MediaTek
Memory: Up to 128GB unified memory shared between CPU and GPU
AI performance: 1 petaflop, enough to run 120B-parameter LLMs with up to 1 million tokens of context entirely on-device
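
The pairing of a 120B-parameter model with 128GB of unified memory can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a figure from NVIDIA: it counts weight storage only and assumes every parameter is stored at the stated precision. The function name and the list of precisions are illustrative choices, and the KV cache for a long context, activations, and the operating system would all need additional memory that is not modeled here.

```python
def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


UNIFIED_MEMORY_GB = 128  # maximum configuration quoted in the spec list

# Compare a 120B-parameter model at three common storage precisions.
for label, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_footprint_gb(120, bits)
    verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{label}: {gb:.0f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions, only the lower-precision formats leave meaningful headroom, which is consistent with the spec list calling out FP4 support.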

Beyond AI inference, the chip supports rendering 90GB+ 3D scenes, editing 12K 4:2:2 video, generating 4K AI video natively, and playing AAA games at 1440p above 100 fps.

System availability is scheduled for fall 2026 from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI, with Acer and GIGABYTE to follow. Adobe is rebuilding Photoshop and Premiere from the ground up for RTX Spark, claiming 2x faster AI and graphics performance.

Context

RTX Spark sits alongside NVIDIA's DGX Spark, a desktop server announced earlier in 2026 for researchers and developers running workloads at higher scale. Where DGX Spark targets professional AI infrastructure, RTX Spark extends the same on-device inference story to mainstream Windows PCs.

The announcement came at GTC Taipei, NVIDIA's first major event in Taiwan in several years, and was framed as a joint Microsoft-NVIDIA initiative. Satya Nadella appeared on stage and called RTX Spark "a real breakthrough" toward Microsoft's goal of delivering "unmetered intelligence to every home and every desk with Windows."

The chip uses NVLink to fuse the Blackwell GPU and Grace CPU, the same interconnect NVIDIA uses in its data-center Grace Blackwell systems, shrunk down for a 70-watt laptop envelope.

Why it matters

Running 120B-parameter models locally changes the economics of AI deployment in three concrete ways: no per-token API cost, no round-trip latency to the cloud, and no requirement to send data off the device. For enterprise workloads involving sensitive documents or regulated data, local inference is a compliance advantage, not just a performance one.

For developers, the 1M-token context window on a local device opens agentic workflows that would be prohibitively expensive at cloud pricing. For consumers, it means AI that works fully offline.
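
To see why long-context agent loops add up under metered pricing, here is a minimal sketch of the per-token arithmetic. Every number in it is a hypothetical placeholder rather than a quoted price from any provider, and the function name is illustrative. It also ignores prompt caching and batch discounts on the cloud side, and hardware and power costs on the local side.

```python
def cloud_cost_usd(input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Metered API cost for one request, given per-million-token prices."""
    return ((input_tokens / 1e6) * usd_per_m_input
            + (output_tokens / 1e6) * usd_per_m_output)


# All values below are hypothetical placeholders, not quoted prices.
PRICE_IN = 1.00    # USD per million input tokens (assumed)
PRICE_OUT = 4.00   # USD per million output tokens (assumed)
STEPS = 50         # agent turns, each re-reading the full context (assumed)

# One turn: a full 1M-token context in, a short 2,000-token reply out.
per_step = cloud_cost_usd(1_000_000, 2_000, PRICE_IN, PRICE_OUT)
print(f"per step: ${per_step:.3f}, per {STEPS}-step run: ${per_step * STEPS:.2f}")
```

The cost scales linearly with both context size and the number of turns, so a workflow that rereads a very large context on every step is the case where a fixed-cost local device changes the calculation most.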

RTX Spark also signals how aggressively NVIDIA is moving to own the full stack, from the data-center training cluster down to the laptop. If on-device inference reaches the advertised performance when devices ship in the fall, it could reset expectations for what a Windows PC can do.

Corroborating sources

NVIDIA Newsroom (nvidianews.nvidia.com)
https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark
“The PC is being reinvented. For forty years, you launched apps. Click. Type. With RTX Spark and Microsoft Windows, you ask — and the PC does the work.”
