H-company releases Holo3.1, a local-first computer-use agent family with up to 35B parameters

Jun 2, 2026, published Jun 3, 2026

H-company released Holo3.1 on June 2, 2026, a family of computer-use agents in four sizes: 0.8B, 4B, 9B, and 35B-A3B (a 35-billion-parameter Mixture-of-Experts model with roughly 3 billion parameters active per token). The explicit goal is local execution, on hardware ranging from Apple Silicon machines to an NVIDIA DGX Spark. The release is the first in the Holo line to ship quantized checkpoints (FP8, Q4 GGUF, and NVFP4) optimized for local inference, and the company reports an improvement of more than 25% over the prior generation in its own product harness.

What's new

H-company frames the release on the Hugging Face blog as "Holo3.1: Fast & Local Computer Use Agents." The post argues the upgrade focuses on production-grade robustness: "Holo3.1 improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets."

The headline numbers:

Model sizes. "The Holo3.1 family is available in four sizes: Holo3.1-0.8B, Holo3.1-4B, Holo3.1-9B, Holo3.1-35B-A3B."
AndroidWorld benchmark. "On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%."
In-house harness. "Holo3.1 also delivers more than a 25% improvement over Holo3 when evaluated inside our Holotab product harness."
Local-first quantization. "For the first time, we release quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4."
Throughput on DGX Spark. "on DGX Spark, NVFP4 W4A16 delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16."
Step latency. "cutting average step time from 6.8s to 3.3s."
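The two DGX Spark throughput ratios share a numerator, so dividing one by the other gives an implied FP8-versus-BF16 figure that the post does not state directly. The sketch below is illustrative arithmetic only: it assumes both ratios were measured on the same workload, and the function and variable names are hypothetical rather than anything from H-company's tooling.

```python
# Ratios quoted in the post for DGX Spark (total token throughput).
NVFP4_OVER_FP8 = 1.41
NVFP4_OVER_BF16 = 1.74


def implied_fp8_over_bf16(nvfp4_over_fp8: float, nvfp4_over_bf16: float) -> float:
    """Derive FP8/BF16 from the two quoted ratios.

    FP8/BF16 = (NVFP4/BF16) / (NVFP4/FP8).
    Assumes both ratios come from the same benchmark setup.
    """
    return nvfp4_over_bf16 / nvfp4_over_fp8


def normalized_throughput(nvfp4_over_fp8: float, nvfp4_over_bf16: float) -> dict[str, float]:
    """Express all three formats relative to BF16 = 1.0."""
    return {
        "BF16": 1.0,
        "FP8": implied_fp8_over_bf16(nvfp4_over_fp8, nvfp4_over_bf16),
        "NVFP4": nvfp4_over_bf16,
    }


if __name__ == "__main__":
    for fmt, ratio in normalized_throughput(NVFP4_OVER_FP8, NVFP4_OVER_BF16).items():
        print(f"{fmt}: {ratio:.2f}x BF16")
```

Under that assumption, FP8 lands at roughly 1.23× BF16 throughput, which suggests most of the quoted gain comes from the step down to 4-bit weights rather than from FP8 alone.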

On privacy: "execution stays fully private and local, with nothing leaving the user's network."

Context

Computer-use agents, models that take screenshots, plan actions, and emit clicks or keystrokes against a real OS, have been an active frontier for the major labs through 2025 and 2026. Anthropic shipped computer use as a tool for Claude; OpenAI launched Operator; both have raised the bar on benchmark scores for cloud-hosted agents. The friction has consistently been the same: latency to a hosted model, recurring per-action token cost, and the privacy limits that come with sending screen content to a third-party API.

H-company's pitch with Holo3.1 attacks that friction directly. The 0.8B and 4B sizes can run on consumer-grade hardware, the 35B-A3B sparse design fits a single workstation-class accelerator, and the quantization recipes are tuned for local throughput rather than cloud cost.
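A rough way to see why quantized checkpoints matter for local hardware is to estimate weight memory from parameter count and bits per weight. The sketch below is back-of-the-envelope arithmetic, not H-company's published sizing: it treats the model names as exact parameter counts, assumes 16, 8, and 4 bits per weight for BF16, FP8, and the 4-bit formats, and ignores quantization scales, the KV cache, and activations, so real checkpoints will be somewhat larger. It also assumes all experts of the 35B-A3B model stay resident in memory even though only about 3B parameters are active per token.

```python
# Assumed bits per weight for each format (ignores scales and metadata).
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "4-bit (Q4 GGUF / NVFP4)": 4}

# Nominal parameter counts in billions, read off the model names (an assumption).
NOMINAL_PARAMS_B = {
    "Holo3.1-0.8B": 0.8,
    "Holo3.1-4B": 4.0,
    "Holo3.1-9B": 9.0,
    "Holo3.1-35B-A3B": 35.0,  # total parameters, not the ~3B active per token
}


def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def footprint_table() -> dict[str, dict[str, float]]:
    """Estimated weight memory for every model and format combination."""
    return {
        model: {
            fmt: weight_memory_gb(params, bits)
            for fmt, bits in BITS_PER_WEIGHT.items()
        }
        for model, params in NOMINAL_PARAMS_B.items()
    }


if __name__ == "__main__":
    for model, row in footprint_table().items():
        cells = ", ".join(f"{fmt}: {gb:.1f} GB" for fmt, gb in row.items())
        print(f"{model}: {cells}")
```

Under these assumptions the 35B-A3B weights shrink from about 70 GB in BF16 to about 17.5 GB at 4 bits, while the 0.8B model needs well under 1 GB in a 4-bit format.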

Why it matters

The benchmark jump on the flagship size, AndroidWorld 67% to 79.3%, narrows the gap between local computer-use agents and hosted frontier offerings. The step-time cut from 6.8s to 3.3s is the more practically meaningful number: latency is what makes computer-use agents unusable at desktop pace, and roughly halving it changes how an agent feels to drive.
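The improvements quoted in the post mix two kinds of measure, so it can help to separate percentage-point gains from relative gains, and a latency reduction from a speedup. The following sketch recomputes them from the published figures; the helper names are hypothetical, and the steps-per-minute figure assumes steps run back to back with no other overhead.

```python
def score_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain as a fraction)."""
    points = after_pct - before_pct
    return points, points / before_pct


def latency_change(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (speedup factor, fractional reduction in step time)."""
    return before_s / after_s, 1.0 - after_s / before_s


def steps_per_minute(step_time_s: float) -> float:
    """Upper bound on steps per minute if steps run back to back."""
    return 60.0 / step_time_s


if __name__ == "__main__":
    # Figures quoted in the post.
    for label, before, after in [
        ("35B-A3B", 67.0, 79.3),
        ("4B / 9B", 58.0, 72.0),
    ]:
        points, relative = score_gain(before, after)
        print(f"AndroidWorld {label}: +{points:.1f} points ({relative:.1%} relative)")

    speedup, reduction = latency_change(6.8, 3.3)
    print(f"Step time: {speedup:.2f}x faster, {reduction:.1%} shorter")
    print(f"Steps per minute: {steps_per_minute(6.8):.1f} to {steps_per_minute(3.3):.1f}")
```

On these numbers the flagship model gains 12.3 percentage points (about 18% relative), and the step-time cut works out to a little over a 2× speedup.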

Holo3.1 is a niche release in the sense that computer-use is still an emerging surface, but it is a real one. The fact that H-company is willing to publish concrete benchmark numbers, quantized checkpoints, and explicit local-deployment recipes is the substance worth tracking — the open-source half of the computer-use space has been thin on disciplined, comparable benchmarks, and this release adds one.

Corroborating sources

Huggingface.co
https://huggingface.co/blog/Hcompany/holo31
“On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%.”
