
Anthropic removes billing for refusals and adds advisor tool output cap

Jun 2, 2026 (published Jun 6, 2026)

Anthropic shipped two developer-facing improvements to the Claude API on June 2, 2026: a billing policy change that eliminates charges for zero-output refusals, and a new max_tokens parameter on the advisor tool for capping per-call output.

What's new

Refusal billing change:

Requests that return stop_reason: "refusal" with zero generated output are now free
The stop_details field on Opus 4.8 (no beta header required) returns a category field (cyber, bio, or null) and a human-readable explanation to help route different refusal classes

Advisor tool update:

New max_tokens parameter on the advisor tool definition (tools[].max_tokens)
Caps the advisor model's output per call, reducing latency and output token cost
Configured per tool, in the tool definition
Useful for workloads where the advisor's guidance does not require full-length responses
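
As a rough illustration of the per-tool cap, the sketch below adds max_tokens to an advisor tool definition held as a plain Python dict. Only the max_tokens field (tools[].max_tokens) comes from the release note. The helper name with_advisor_cap and the placeholder "type" and "name" values are assumptions made for this example, not values taken from the API reference.

```python
from copy import deepcopy
from typing import Any


def with_advisor_cap(tool_def: dict[str, Any], max_tokens: int) -> dict[str, Any]:
    """Return a copy of an advisor tool definition with an output cap set.

    Hypothetical helper: the release note only names the max_tokens field.
    """
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    capped = deepcopy(tool_def)  # leave the caller's definition untouched
    capped["max_tokens"] = max_tokens
    return capped


# Placeholder definition: these "type" and "name" values are illustrative only.
base_advisor = {"type": "<advisor-tool-type>", "name": "advisor"}

# 512 is an arbitrary example value, not a recommended setting.
tools = [with_advisor_cap(base_advisor, 512)]
```

Because the cap lives in the tool definition, every request that reuses this tools list would carry the same limit. A workload that needs longer guidance in some places could keep a second definition with a higher value.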

Context

The advisor tool launched April 9, 2026, as a way to pair a faster executor model with a higher-intelligence advisor that provides strategic guidance mid-generation. Without output caps, the advisor generates full-length responses even when short guidance would suffice, which costs extra output tokens and adds latency. The max_tokens parameter addresses this.

The refusal billing change addresses a developer pain point: under the prior policy, a request that triggered a content policy refusal before generating any output was billed at standard rates. At scale, particularly for applications processing diverse user input, those charges add up.

Why it matters

The zero-output refusal change has a direct cost impact for applications that encounter high refusal rates as a normal part of operation. Security tooling, content moderation pipelines, or broad consumer-facing apps may see non-trivial reductions in API costs. The stop_details category field adds programmatic routing: applications can now distinguish a cyber refusal from a bio refusal and respond differently.
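
One way such routing might look is sketched below. It works on the parsed response body as a plain dict. The stop_reason value "refusal" and the stop_details category values (cyber, bio, or null) come from the release note. The function name classify_refusal and the route labels are hypothetical and would be replaced by whatever the application actually does.

```python
from typing import Any, Optional


def classify_refusal(response: dict[str, Any]) -> Optional[str]:
    """Return a routing label for a refusal, or None if the response is not one.

    The route labels are hypothetical names chosen for this sketch.
    """
    if response.get("stop_reason") != "refusal":
        return None
    # stop_details may be absent or null, so fall back to an empty dict.
    details = response.get("stop_details") or {}
    category = details.get("category")
    if category == "cyber":
        return "security_review"
    if category == "bio":
        return "safety_escalation"
    # A null or unrecognized category falls through to a generic handler.
    return "generic_refusal"


sample = {"stop_reason": "refusal", "stop_details": {"category": "cyber"}}
print(classify_refusal(sample))  # security_review
```

Treating a missing stop_details the same as a null category keeps the function usable where the field is not returned, at the cost of less specific routing.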

The advisor tool max_tokens parameter is a targeted efficiency improvement. For agentic coding workloads where the advisor's role is to flag errors or suggest approach corrections, a short advisor response is often more useful than a long one — and cheaper to generate. Configuring the cap at the tool-definition level keeps the optimization consistent across all calls that use that tool definition.

Corroborating sources

Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output.”
