Anthropic adds max_tokens cap to advisor tool and stops billing for empty refusals on Claude API
Anthropic shipped two developer-facing ergonomics changes to the Claude API on June 2, 2026, both posted in the company's official Claude Platform release notes. The advisor tool — the higher-intelligence model that pairs with a faster executor model on long-horizon agentic workloads — now accepts a max_tokens parameter that caps the advisor's per-call output. And on the Messages API, requests that return stop_reason: "refusal" without any generated output are no longer billed.
What's new
On advisor-tool output capping, the release notes state: "The advisor tool now supports a max_tokens parameter to cap the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length advisor responses." Developers set tools[].max_tokens inside the advisor tool definition; the cap is per advisor invocation rather than per session.
On refusal billing, the release notes are equally explicit: "On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output." Refusals that produce partial output before stopping are still billed for the generated tokens; the change targets the specific case where the model declines a request and emits nothing at all. The refusal-handling page is the canonical reference for detecting and routing these stops in application code.
Both changes are GA and require no beta header.
Context
The advisor tool itself shipped in public beta on April 9, 2026 under the advisor-tool-2026-03-01 beta header, pitched as a way to keep long-horizon agentic workloads close to advisor-solo quality while paying for the bulk of token generation at executor-model rates. Adding max_tokens to the advisor was the most-requested ergonomic gap on that primitive: without a per-call cap, advisor responses could blow well past the strategic-guidance scope they were intended for, defeating the cost story.
The refusal-billing change closes a smaller but more visible friction. Anthropic's stop_details object on refusal responses became publicly documented on May 28, 2026 alongside Claude Opus 4.8, returning a category field of cyber, bio, or null and a human-readable explanation. Charging customers for a refused request with zero output had been a long-standing complaint on developer forums; flipping that to free aligns the billing model with the response's actual utility.
Why it matters
Neither change moves a benchmark, but both touch the unit economics of running Claude in production. max_tokens on the advisor tool gives teams running advisor-paired agentic loops a tight knob on the cost variable that was hardest to predict, which matters because advisor calls are priced at the higher-intelligence model's rate. And the refusal-billing change is the kind of trust signal that quietly shifts buying decisions on multi-provider agentic stacks — "we don't charge you for nothing" is a more durable argument than any single capability claim. Together they read as Anthropic continuing to tune the Claude API's pricing surface around the agentic workloads it expects to grow fastest through the rest of 2026.
Corroborating sources
- Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“The advisor tool now supports a `max_tokens` parameter to cap the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length advisor responses.”