Anthropic API: Refusals Without Output Now Free, Advisor Tool Gains Output Cap
Anthropic shipped two API changes on June 2, 2026 that reduce billing costs and improve control for developers running inference at scale.
What's New
No billing for zero-output refusals: On the Claude API, requests that return stop_reason: "refusal" with no model-generated output are no longer billed. Previously, a request that triggered a safety refusal before Claude produced any tokens still incurred an API call cost. Zero-output refusals are now free.
This change works alongside the stop_details field—publicly documented on May 28 alongside Opus 4.8—which returns a category (cyber, bio, or null) and a human-readable explanation, enabling application code to route different refusal classes to the right next step.
Advisor tool max_tokens cap: The advisor tool, which launched in public beta on April 9, 2026, now supports a max_tokens parameter on the tool definition. Setting tools[].max_tokens caps the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length advisor responses.
The advisor tool pairs a fast executor model with a higher-intelligence advisor model for strategic guidance during long-horizon agentic tasks. Many workflows use the advisor for short directional nudges rather than full responses—previously there was no way to constrain that output length without restructuring the request.
Context
The advisor tool launched April 9, 2026 under the advisor-tool-2026-03-01 beta header. The max_tokens addition is the first notable capability extension since launch. Both the no-billing change and the max_tokens parameter are available now with no beta header required.
The refusal billing change complements Anthropic's broader investment in refusal transparency. The stop_details.category field—distinguishing cyber and bio refusals—gives application developers structured data to log, route, or escalate specific refusal types, rather than treating all refusals identically.
Why It Matters
Neither change is a major capability announcement, but both reduce costs in specific use cases. For high-volume production deployments where refusals occur as part of normal guard-rail behavior—content moderation pipelines, safety testing, red-teaming—free zero-output refusals reduce waste with no code changes required.
For teams using the advisor tool in cost-sensitive agentic loops, max_tokens provides the first lever to trade off advisor depth versus cost per call, potentially making the tool practical for higher-frequency use cases where full-length advisor responses were too expensive.
Corroborating sources
- Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output.”