Anthropic adds reasoning_extraction refusal category to Claude Fable 5, blocking API attempts to reverse-engineer model outputs
Anthropic has added a new refusal category to Claude Fable 5 that specifically blocks API requests aimed at reverse-engineering or duplicating the model's outputs. The change, documented in the Claude Platform API changelog on June 9, 2026, introduces reasoning_extraction as a recognized value in the stop_details.category field. It marks the first time Anthropic has implemented a dedicated technical control against model output extraction at the API layer.
What's new
When a Fable 5 request is blocked by this mechanism, the Messages API returns stop_reason: "refusal" and stop_details.category: "reasoning_extraction". The changelog describes this category as covering requests blocked under Anthropic's Terms of Service restrictions on "reverse engineering or duplicating model outputs."
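A minimal Python sketch of detecting this refusal follows. It assumes the Messages API response has already been decoded from JSON into a dict that uses the stop_reason and stop_details.category field names quoted in the changelog. The helper name refusal_category and the sample payloads are hypothetical and trimmed to the fields used here. No SDK call is shown.

```python
from typing import Any, Optional


def refusal_category(response: dict[str, Any]) -> Optional[str]:
    """Return stop_details.category for a refused response, else None.

    Assumes `response` is a decoded Messages API body using the field
    names quoted in the changelog (hypothetical helper, not an SDK call).
    """
    if response.get("stop_reason") != "refusal":
        return None
    # stop_details may be absent or null; treat that as "category unknown".
    details = response.get("stop_details") or {}
    return details.get("category")


# Hypothetical decoded response bodies, trimmed to the fields used above.
blocked = {
    "stop_reason": "refusal",
    "stop_details": {"category": "reasoning_extraction"},
}
normal = {"stop_reason": "end_turn"}

print(refusal_category(blocked))  # reasoning_extraction
print(refusal_category(normal))   # None
```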
Key implementation details:
- No beta header is required; the behavior is active by default on Claude Fable 5
- The existing cyber and bio refusal categories remain unchanged; those route certain requests to Claude Opus 4.8 rather than blocking outright
- A separate opt-in fallbacks parameter, currently in beta, lets API users re-run a refused Fable 5 request on another model, billed at that model's rates
- Requests refused before any output is generated are not billed
The reasoning_extraction block appears to be a hard refusal with no built-in routing fallback, unlike the cyber and bio categories.
Context
Claude Fable 5 launched on June 9, 2026, as Anthropic's most capable publicly released model. It runs safety classifiers continuously during response generation, whereas earlier Claude models ran classification only at the request gate. That architecture makes it technically possible to intercept and block a response mid-generation.
The reasoning_extraction category appears to target a specific class of research practice: systematically probing model outputs to infer internal reasoning, reconstruct training data distributions, or build derivative models. Academic and commercial research labs have used such techniques to study closed models. Anthropic's Terms of Service have long prohibited reverse engineering; this change enforces that prohibition technically at the API level rather than relying solely on contractual terms.
Why it matters
Adding a structured reasoning_extraction refusal category matters on two levels. Practically, it gives API developers a machine-readable signal to catch and route: instead of a generic refusal, an application can tell that a request was blocked specifically for extraction-like behavior and decide whether to rephrase, retry, or escalate.
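One way an application might act on that signal is a small policy table keyed by refusal category. This is a hypothetical sketch: the action names, the policy itself, and the assumption that "cyber" and "bio" are the literal category strings are all illustrative. The "retry" action only marks where an application could re-run the request (for example through the opt-in fallbacks parameter, which this sketch does not call).

```python
from typing import Any, Optional

# Hypothetical application policy mapping refusal categories to actions.
# Only "reasoning_extraction" is quoted in the changelog; "cyber" and "bio"
# are assumed literal strings for the existing categories.
POLICY: dict[str, str] = {
    "reasoning_extraction": "escalate",  # treat as a hard refusal: human review
    "cyber": "retry",                    # candidate for re-running elsewhere
    "bio": "retry",
}
# Action for refusals whose category is missing or unrecognized.
DEFAULT_REFUSAL_ACTION = "rephrase"


def route(response: dict[str, Any], policy: dict[str, str] = POLICY) -> str:
    """Return "accept" for non-refusals, else the policy action for the category."""
    if response.get("stop_reason") != "refusal":
        return "accept"
    category: Optional[str] = (response.get("stop_details") or {}).get("category")
    if category is None:
        return DEFAULT_REFUSAL_ACTION
    return policy.get(category, DEFAULT_REFUSAL_ACTION)


if __name__ == "__main__":
    blocked = {
        "stop_reason": "refusal",
        "stop_details": {"category": "reasoning_extraction"},
    }
    print(route(blocked))                      # escalate
    print(route({"stop_reason": "end_turn"}))  # accept
    print(route({"stop_reason": "refusal"}))   # rephrase
```

Escalating reasoning_extraction instead of retrying automatically is a design choice that mirrors the reading of it as a hard refusal with no built-in routing fallback.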
For the research community, the broader question is where the classifier draws its boundaries. Standard interpretability research, fine-tuning probes, and adversarial red-teaming workflows could overlap with behaviors the classifier identifies as extraction attempts. Anthropic has not published the classifier logic, so the effective scope of the block will likely become clearer through developer reports over the coming weeks.
It also sets a precedent: Fable 5 is the first widely released Claude model where model IP enforcement operates as real-time API infrastructure rather than post-hoc policy enforcement.
Corroborating sources
- Claude Platform API release notes
https://platform.claude.com/docs/en/release-notes/api
“The stop_details.category field on refusal responses now includes "reasoning_extraction" on Claude Fable 5, returned when a request is blocked under Anthropic's Terms of Service restrictions on reverse engineering or duplicating model outputs.”