FeatureAnthropicVerified

Anthropic ends billing for zero-output refusals and adds advisor tool token caps in June API update

ListenJun 2, 2026published Jun 6, 2026

Anthropic shipped two developer-facing Claude API changes on June 2, 2026: a max_tokens parameter for the advisor tool that caps advisor model output per call, and a billing policy change that eliminates charges for requests where Claude returns stop_reason: "refusal" without generating any visible output.

What's new

Advisor tool max_tokens cap

The advisor tool — which lets a primary model delegate a sub-question to a lighter secondary model — now accepts a max_tokens field on the tool definition. Developers set tools[].max_tokens to limit the advisor's maximum generated length per call. The targeted use case is classification and routing workloads where the advisor's answer is a short judgment, not a full response. By capping output, teams can reduce both latency and output-token cost on those calls. The parameter is documented in the Capping advisor output section of the advisor tool reference.

No billing for zero-output refusals

When a Claude API request triggers a hard refusal — returning stop_reason: "refusal" with no generated content — the request is no longer billed. Previously, such requests were charged even though they produced no output. The change is documented in the updated Streaming refusals guidance.

This pairs with the stop_details field that shipped with Claude Opus 4.8 on May 28: that field returns a category (cyber, bio, or null) and a human-readable explanation on each refusal response, enabling application-level routing of different refusal classes without a beta header.

Context

The advisor tool pattern was introduced as part of Anthropic's multi-agent building blocks. Rather than spinning up a full agent loop, the advisor pattern routes a sub-question — fact-check, format judgment, ambiguity resolution — to a lighter model inline. Without an output cap, that model could generate thousands of tokens even when the calling application only needs a one-word classification. The max_tokens parameter closes that loop.

The billing-on-refusal behavior has been a friction point for developers building safety-sensitive applications. Customer support agents, general-purpose assistants, and content moderation pipelines all encounter non-trivial refusal rates. Under the old model, refusals that produced no output were still line items on the bill. The change removes that unpredictability.

Both features were surfaced in the Claude API changelog update dated June 2, 2026.

Why it matters

Neither change is a capability release — they are production-economics improvements. But they reflect a pattern in how Anthropic has been iterating on the API: reducing the friction and cost unpredictability that accumulates at scale.

For teams running multi-agent pipelines with advisor-tool routing, the max_tokens cap offers a new cost-control lever. A 500-token reduction in average advisor output per call, multiplied across millions of requests, produces meaningful savings. The cap is opt-in and does not change default behavior for teams not using it.

For teams handling broad input sets, the refusal billing change is effectively a cost guarantee: you pay for Claude's output, not for what Claude declines to output. Combined with the stop_details category field, applications can now route cyber-category refusals differently from bio-category ones, with structured data rather than parsing the refusal message, and none of those routing passes add to the bill.

Corroborating sources

Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output.”