
Anthropic adds thinking_tokens breakdown to Messages API, exposing extended thinking costs per call

May 27, 2026 · Published Jun 6, 2026

On May 27, 2026, Anthropic shipped a small but practically useful addition to the Claude API: responses now include a usage.output_tokens_details.thinking_tokens field that reports exactly how many billed output tokens came from extended thinking. The change requires no beta header and applies immediately to all models with extended thinking support.

What's new

The new field surfaces inside the existing usage.output_tokens_details object on every Messages API response:

"usage": {
  "output_tokens": 2450,
  "output_tokens_details": {
    "thinking_tokens": 1200
  }
}
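A minimal sketch of reading the breakdown from a non-streaming response body. It treats the body as a plain dict decoded from JSON rather than an SDK object. The helper name `split_output_tokens` is hypothetical. Treating a missing `output_tokens_details` object or `thinking_tokens` field as zero (for example, on a call without extended thinking) is an assumption, not documented behavior. Because the field reports how many of the billed output tokens were thinking, the visible remainder is simply the difference.

```python
import json

def split_output_tokens(response: dict) -> tuple[int, int]:
    """Return (thinking_tokens, visible_tokens) from a Messages API response dict.

    Assumption: a missing output_tokens_details object or thinking_tokens
    field is treated as zero thinking tokens.
    """
    usage = response["usage"]
    output_tokens = usage["output_tokens"]
    details = usage.get("output_tokens_details") or {}
    thinking = details.get("thinking_tokens", 0)
    # thinking_tokens is a subset of the billed output_tokens,
    # so the rest is visible output.
    return thinking, output_tokens - thinking

body = json.loads(
    '{"usage": {"output_tokens": 2450, '
    '"output_tokens_details": {"thinking_tokens": 1200}}}'
)
thinking, visible = split_output_tokens(body)
print(thinking, visible)  # 1200 1250
```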

When streaming, the breakdown appears only on the final message_delta event, not on intermediate events. No beta header is required in either mode; the field is part of standard API responses.
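Since the breakdown arrives only on the final message_delta event, a streaming consumer can skip usage on earlier events and read it once at the end. The sketch below assumes events have already been decoded into dicts with a `type` key, and that the final `message_delta` carries a top-level `usage` object shaped like the non-streaming one. `thinking_tokens_from_stream` is a hypothetical helper, and the sample events are abbreviated illustrations, not captured API output.

```python
from typing import Iterable, Optional

def thinking_tokens_from_stream(events: Iterable[dict]) -> Optional[int]:
    """Return thinking_tokens from the last message_delta event that reports it, else None."""
    result = None
    for event in events:
        if event.get("type") != "message_delta":
            continue  # the breakdown is not reported on intermediate events
        details = event.get("usage", {}).get("output_tokens_details")
        if details is not None and "thinking_tokens" in details:
            result = details["thinking_tokens"]  # keep the latest value seen
    return result

# Abbreviated, illustrative event sequence (payloads trimmed to the relevant keys).
events = [
    {"type": "message_start"},
    {"type": "content_block_delta"},
    {"type": "message_delta",
     "usage": {"output_tokens": 2450,
               "output_tokens_details": {"thinking_tokens": 1200}}},
    {"type": "message_stop"},
]
print(thinking_tokens_from_stream(events))  # 1200
```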

Context

Extended thinking on Claude Opus models generates reasoning blocks that are billed at the same per-token rate as visible output. With Opus 4.8 and its default effort setting of "high", thinking tokens can account for a large fraction of total output cost, and in some agentic workloads the majority.
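Because thinking and visible output share one per-token rate, the thinking share of a call's output cost reduces to a token ratio and does not depend on the price. Using the figures from the example response earlier in this article, roughly half of that call's output cost is thinking:

```latex
\[
\text{thinking share}
  = \frac{\texttt{thinking\_tokens}}{\texttt{output\_tokens}}
  = \frac{1200}{2450} \approx 0.49,
\qquad
\text{visible output tokens} = 2450 - 1200 = 1250.
\]
```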

Until this change, developers who wanted to understand their thinking token usage had to parse the thinking blocks in the response body themselves. That approach worked but added instrumentation overhead, and the figure was not available at the billing-summary level. The new field makes per-call cost attribution automatic.
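Per-call attribution then becomes a matter of multiplying each part by the output rate. A minimal sketch, assuming the usage dict shape shown above. `attribute_output_cost`, `OutputCost`, and the `usd_per_million_output` parameter are hypothetical names. The 10.0 value below is a placeholder, not an actual Anthropic price; substitute the current rate for your model.

```python
from dataclasses import dataclass

@dataclass
class OutputCost:
    thinking_usd: float
    visible_usd: float

def attribute_output_cost(usage: dict, usd_per_million_output: float) -> OutputCost:
    """Split one call's output cost into thinking and visible parts.

    usd_per_million_output is caller-supplied. Both parts use the same
    rate because thinking tokens are billed like visible output.
    """
    output_tokens = usage["output_tokens"]
    # Assumption: a missing breakdown means no thinking tokens.
    thinking = (usage.get("output_tokens_details") or {}).get("thinking_tokens", 0)
    rate = usd_per_million_output / 1_000_000
    return OutputCost(
        thinking_usd=thinking * rate,
        visible_usd=(output_tokens - thinking) * rate,
    )

usage = {"output_tokens": 2450, "output_tokens_details": {"thinking_tokens": 1200}}
cost = attribute_output_cost(usage, usd_per_million_output=10.0)  # placeholder price
print(f"{cost.thinking_usd:.4f} {cost.visible_usd:.4f}")  # 0.0120 0.0125
```

Input tokens are deliberately left out; this only splits the output side of the bill.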

Why it matters

For teams running Claude at scale with extended thinking enabled — particularly in long-horizon agentic pipelines, Claude Code integrations, or advisor tool deployments — thinking token consumption is often the dominant line item. Accurate per-call breakdown makes it straightforward to:

Profile which prompts or agent steps are driving thinking-heavy responses
Tune the effort parameter or thinking.budget_tokens against observed cost rather than guesswork
Build cost dashboards without requiring content parsing
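All three uses come down to grouping the field by some label. A minimal sketch, assuming you already log each call's usage dict alongside a label of your choosing: an agent step name for profiling, or an effort or budget setting for tuning. `thinking_by_key` is a hypothetical helper, and the labels and token counts in `log` are made-up illustrative values, not measurements.

```python
from collections import defaultdict

def thinking_by_key(calls):
    """Aggregate thinking and total output tokens per key, heaviest thinking first.

    calls: iterable of (key, usage) pairs, where key is any caller-chosen label
    and usage is the response's usage dict.
    """
    totals = defaultdict(lambda: {"thinking": 0, "output": 0, "calls": 0})
    for key, usage in calls:
        details = usage.get("output_tokens_details") or {}
        row = totals[key]
        row["thinking"] += details.get("thinking_tokens", 0)
        row["output"] += usage["output_tokens"]
        row["calls"] += 1
    # Sort so the most thinking-heavy labels come first.
    return sorted(totals.items(), key=lambda kv: kv[1]["thinking"], reverse=True)

# Illustrative log entries (hypothetical step labels and numbers).
log = [
    ("plan", {"output_tokens": 3000, "output_tokens_details": {"thinking_tokens": 2500}}),
    ("summarize", {"output_tokens": 800, "output_tokens_details": {"thinking_tokens": 100}}),
    ("plan", {"output_tokens": 2000, "output_tokens_details": {"thinking_tokens": 1500}}),
]
for key, row in thinking_by_key(log):
    print(key, row["calls"], row["thinking"], row["output"])
# plan 2 4000 5000
# summarize 1 100 800
```

The same rows can feed a dashboard directly, since no response content has to be inspected.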

It is a small change, but it fills a gap that developers running extended-thinking workloads have been working around manually. With Opus 4.8 now the default model for Claude Code and the effort parameter defaulting to high, thinking tokens are likely to make up a larger share of many teams' output costs, which makes per-call visibility into them more useful.

Corroborating sources

Platform.claude
https://platform.claude.com/docs/en/release-notes/api
“The Messages API response now includes `usage.output_tokens_details.thinking_tokens`, reporting how many of the billed output tokens were extended thinking. When streaming, the breakdown appears only on the final `message_delta` event. No beta header is required.”
