Anthropic removes billing for refusals and adds advisor tool output cap
Anthropic shipped two developer-facing improvements to the Claude API on June 2, 2026: a billing policy change that eliminates charges for zero-output refusals, and a new max_tokens parameter on the advisor tool for capping per-call output.
What's new
Refusal billing change:
- Requests that return
stop_reason: "refusal"with zero generated output are now free - The
stop_detailsfield on Opus 4.8 (no beta header required) returns acategoryfield (cyber,bio, ornull) and a human-readableexplanationto help route different refusal classes
Advisor tool update:
- New
max_tokensparameter on the advisor tool definition (tools[].max_tokens) - Caps the advisor model's output per call, reducing latency and output token cost
- Configures per-tool: set in the tool definition
- Useful for workloads where the advisor's guidance does not require full-length responses
Context
The advisor tool launched April 9, 2026, as a way to pair a faster executor model with a higher-intelligence advisor that provides strategic guidance mid-generation. Without output caps, the advisor generates full-length responses even when short guidance would suffice, burning tokens and latency. The max_tokens parameter addresses this.
The refusal billing change corrects a developer pain point: under the prior policy, a request that triggered a content policy refusal before generating any output was billed at standard rates. At scale — particularly for applications processing diverse user input — refusal charges accumulate meaningfully.
Why it matters
The zero-output refusal change has a direct cost impact for applications that encounter high refusal rates as a normal part of operation. Security tooling, content moderation pipelines, or broad consumer-facing apps may see non-trivial reductions in API costs. The stop_details category field adds programmatic routing: applications can now distinguish a cyber refusal from a bio refusal and respond differently.
The advisor tool max_tokens parameter is a targeted efficiency improvement. For agentic coding workloads where the advisor's role is to flag errors or suggest approach corrections, a short advisor response is often more useful than a long one — and cheaper to generate. Configuring the cap at the tool-definition level keeps the optimization consistent across all calls that use that tool definition.
Corroborating sources
- Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output.”