Anthropic Adds Advisor Tool Cost Controls and Drops Billing for Refused Requests
Anthropic on June 2, 2026, shipped two developer-facing API updates: a new cost-control parameter for the advisor tool, and a billing change that eliminates charges for requests the model refuses before generating any output.
What's new
Advisor tool max_tokens parameter
The advisor tool — Anthropic's built-in mechanism for routing a query to a second model for evaluation or augmentation within a single API call — now accepts a max_tokens parameter on the tool definition. Developers set tools[].max_tokens to cap how many output tokens the advisor model can produce per call.
This parameter targets two costs simultaneously: latency and billing. Fewer output tokens from the advisor means faster end-to-end response times. Since output tokens are billed, capping advisor output length reduces cost per call for workloads where a brief advisor check is sufficient — for example, validating for policy violations or checking for factual errors — rather than requiring full-length advisor synthesis.
The parameter is documented under "Capping advisor output" in the advisor tool reference.
No billing for refused requests with no output
When the Claude API returns stop_reason: "refusal" and Claude has generated no output — meaning the request was blocked before any content was produced — that request is no longer billed. Previously, input tokens were charged regardless of whether the model generated a response.
This change applies only to zero-output refusals. Requests where Claude generates partial output before refusing are not affected by this billing update.
Developers can detect and handle refusals using the streaming refusals documentation. The stop_details field, publicly documented since the Opus 4.8 launch on May 28, provides a category and explanation alongside refusal responses to help applications route different refusal types to the appropriate next step.
Context
The advisor tool was introduced to give developers a native second-pass evaluation layer without building a separate API call pipeline. As advisor tool adoption grew in production systems, uncapped response lengths emerged as a latency and cost concern — particularly for high-volume workloads using the advisor for lightweight validation rather than full-length output.
The billing change for zero-output refusals had been a persistent friction point for developers. Applications that serve diverse user inputs sometimes encounter refusals at meaningful rates; charging for those requests added cost without delivering any value.
Why it matters
Both updates target the operational economics of production Claude API deployments. Teams running millions of advisor-enhanced calls per day will see measurable cost reductions from capping advisor output length, especially if they were previously receiving full-length advisor responses for simple validation tasks.
The refusal billing change matters most for safety-filtered applications and systems that operate near the edges of Claude's content policies as part of normal operation. Eliminating that cost line removes a penalty that had no equivalent in how users interact with Claude directly — a meaningful step toward making the API billing model match actual value delivered.
Together, the two updates reflect Anthropic's ongoing effort to make Claude API costs more predictable and proportional as teams scale from prototype to production workloads.
Corroborating sources
- Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output.”