Anthropic adds thinking_tokens breakdown to Messages API, exposing extended thinking costs per call
On May 27, 2026, Anthropic shipped a small but practically useful addition to the Claude API: responses now include a usage.output_tokens_details.thinking_tokens field that reports exactly how many billed output tokens came from extended thinking. The change requires no beta header and applies immediately to all models with extended thinking support.
What's new
The new field surfaces inside the existing usage.output_tokens_details object on every Messages API response:
"usage": {
"output_tokens": 2450,
"output_tokens_details": {
"thinking_tokens": 1200
}
}
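Because the field counts a portion of the billed output tokens, the visible-output count can be derived by subtraction. The following is a minimal sketch that works on the usage object after it has been parsed into a plain Python dict. The helper name `split_output_tokens` and the fallback to zero when the details object is absent are assumptions made for illustration, not part of the API.

```python
from typing import TypedDict


class OutputSplit(TypedDict):
    thinking: int
    visible: int
    thinking_share: float


def split_output_tokens(usage: dict) -> OutputSplit:
    """Split billed output tokens into thinking and visible parts."""
    total = usage.get("output_tokens", 0)
    # Assumption: if the details object is missing, treat the call as
    # having produced no thinking tokens rather than raising an error.
    details = usage.get("output_tokens_details") or {}
    thinking = details.get("thinking_tokens", 0)
    share = thinking / total if total else 0.0
    return {
        "thinking": thinking,
        "visible": total - thinking,
        "thinking_share": share,
    }


# The usage object from the example response above.
usage = {
    "output_tokens": 2450,
    "output_tokens_details": {"thinking_tokens": 1200},
}
print(split_output_tokens(usage))
```

For the example response, this attributes 1200 tokens to thinking and the remaining 1250 to visible output, just under half of the billed output.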
When streaming, the breakdown appears only on the final message_delta event, not on intermediate events, so a client has to wait for the end of the stream to read it. As with non-streaming responses, no beta header is required.
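For streaming clients, one way to pick up the breakdown is to scan the decoded events and keep the value from the last message_delta that carries it. The sketch below assumes the server-sent events have already been decoded into dicts. The function name `thinking_tokens_from_stream` and the hand-written sample events are hypothetical and only mirror the shape described above.

```python
from typing import Iterable, Optional


def thinking_tokens_from_stream(events: Iterable[dict]) -> Optional[int]:
    """Return thinking_tokens from the last message_delta that carries it.

    Returns None if no event in the stream included the breakdown.
    """
    found: Optional[int] = None
    for event in events:
        if event.get("type") != "message_delta":
            continue
        usage = event.get("usage") or {}
        details = usage.get("output_tokens_details") or {}
        if "thinking_tokens" in details:
            found = details["thinking_tokens"]
    return found


# Hypothetical, hand-written events standing in for a decoded stream.
sample_events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 90}}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hi"}},
    {
        "type": "message_delta",
        "delta": {"stop_reason": "end_turn"},
        "usage": {
            "output_tokens": 2450,
            "output_tokens_details": {"thinking_tokens": 1200},
        },
    },
    {"type": "message_stop"},
]
print(thinking_tokens_from_stream(sample_events))
```

Returning None instead of 0 when the field never appears lets a caller distinguish "no thinking was used" from "the breakdown was not reported".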
Context
Extended thinking on Claude Opus models generates reasoning blocks that are billed at the same per-token rate as visible output. On Opus 4.8 with the effort: "high" default, thinking token consumption can represent a large fraction of total output cost — in some agentic workloads, the majority.
Until this change, developers wanting to understand their thinking token usage had to parse the thinking blocks in the response body directly. That worked but added instrumentation overhead, and was not available at the billing-summary level. The new field makes per-call cost attribution automatic.
Why it matters
For teams running Claude at scale with extended thinking enabled — particularly in long-horizon agentic pipelines, Claude Code integrations, or advisor tool deployments — thinking token consumption is often the dominant line item. Accurate per-call breakdown makes it straightforward to:
- Profile which prompts or agent steps are driving thinking-heavy responses
- Tune the effort parameter or thinking.budget_tokens against observed cost rather than guesswork
- Build cost dashboards without requiring content parsing
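The profiling and dashboard uses listed above could start from something as simple as the following sketch, which groups logged usage objects by a caller-supplied step label and ranks the steps by thinking tokens. The `thinking_profile` function, the step labels, and the sample numbers are all hypothetical. Only the `output_tokens` and `output_tokens_details.thinking_tokens` keys come from the API response described in this article.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int, float]  # (step, output_tokens, thinking_tokens, share)


def thinking_profile(calls: Iterable[Tuple[str, dict]]) -> List[Row]:
    """Aggregate (step_label, usage) pairs into per-step token totals.

    Rows are sorted so the most thinking-heavy steps come first.
    """
    totals = defaultdict(lambda: {"output": 0, "thinking": 0})
    for step, usage in calls:
        details = usage.get("output_tokens_details") or {}
        totals[step]["output"] += usage.get("output_tokens", 0)
        totals[step]["thinking"] += details.get("thinking_tokens", 0)

    rows: List[Row] = []
    for step, t in totals.items():
        share = t["thinking"] / t["output"] if t["output"] else 0.0
        rows.append((step, t["output"], t["thinking"], share))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical log of three calls made by two agent steps.
logged_calls = [
    ("plan", {"output_tokens": 2450, "output_tokens_details": {"thinking_tokens": 1200}}),
    ("summarize", {"output_tokens": 300, "output_tokens_details": {"thinking_tokens": 0}}),
    ("plan", {"output_tokens": 1000, "output_tokens_details": {"thinking_tokens": 800}}),
]
for step, output, thinking, share in thinking_profile(logged_calls):
    print(f"{step}: {thinking}/{output} output tokens were thinking ({share:.0%})")
```

Since thinking tokens are billed at the same per-token rate as visible output, the share column doubles as the fraction of output cost attributable to thinking for each step.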
It is a small change, but it fills a gap that developers running extended-thinking workloads have been working around manually. With Opus 4.8 now the default for Claude Code and the effort parameter defaulting to high, visibility into thinking token usage is more relevant than ever.
Corroborating sources
- Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“The Messages API response now includes `usage.output_tokens_details.thinking_tokens`, reporting how many of the billed output tokens were extended thinking. When streaming, the breakdown appears only on the final `message_delta` event. No beta header is required.”