Reasoning effort: when it matters

What reasoning effort means

Modern frontier models can think before they answer. Instead of producing the response token by token in one pass, a reasoning model first works through the problem internally, then writes its final answer. Reasoning effort is the dial that controls how much of that internal thinking the model does. Turn it up and the model spends more internal tokens deliberating, which tends to help on hard problems. Turn it down, or off, and the model answers faster and cheaper.

This is one of the most useful and most misused controls in the current generation of models. Used well, it gets you flagship quality on the hard calls and fast, cheap answers on the easy ones. Used carelessly, it quietly inflates both latency and cost on tasks that never needed it.

How the providers expose it

The control looks a little different on each provider, but the idea is the same.

OpenAI exposes an explicit effort setting on its reasoning models. GPT-5.4 and GPT-5.5 in our dataset support effort levels of none, low, medium, high, and xhigh. That gives you a clean five-step dial from a fast non-reasoning response up to maximum deliberation. GPT-5.5 Pro is positioned as the highest reasoning tier in the family.

Anthropic frames it as thinking. Claude Opus 4.8 supports adaptive thinking, where the model scales how much it thinks with the difficulty of the request. Claude Sonnet 4.6 supports both extended and adaptive thinking. Claude Haiku 4.5 supports extended thinking but not the adaptive variant.

Google ships a thinking mode on the Gemini 3 family. Gemini 3 Flash, Gemini 3.1 Pro, and Gemini 3.5 Flash all support thinking mode.

xAI lists reasoning support on Grok 4.3 and Grok 4.20.

The exact parameter names and allowed values live in each provider's documentation, and each ModelDex model page records whether that model supports reasoning at all.
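As a rough illustration, the support described above can be written down as a small lookup table. This is only a sketch: the dictionary name, the mode labels ("effort", "adaptive", "extended", "thinking", "reasoning"), and the helper function are hypothetical and are not any provider's API. The table records only what this guide states, so a missing entry means "not stated here", not "unsupported".

```python
# Hypothetical lookup of the reasoning support described in this guide.
# Mode labels are illustrative names, not provider parameter names.
OPENAI_EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh")

GUIDE_REASONING_MODES = {
    "GPT-5.4": {"effort"},
    "GPT-5.5": {"effort"},
    "Claude Opus 4.8": {"adaptive"},
    "Claude Sonnet 4.6": {"extended", "adaptive"},
    "Claude Haiku 4.5": {"extended"},
    "Gemini 3 Flash": {"thinking"},
    "Gemini 3.1 Pro": {"thinking"},
    "Gemini 3.5 Flash": {"thinking"},
    "Grok 4.3": {"reasoning"},
    "Grok 4.20": {"reasoning"},
}


def guide_lists(model: str, mode: str) -> bool:
    """Return True if this guide states that `model` supports `mode`."""
    return mode in GUIDE_REASONING_MODES.get(model, set())


if __name__ == "__main__":
    print(guide_lists("Claude Sonnet 4.6", "adaptive"))  # listed in the guide
    print(guide_lists("Claude Haiku 4.5", "adaptive"))   # not listed
```

A table like this goes stale quickly, so in practice it would be refreshed from the provider docs or the model page rather than hard-coded.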

When higher effort earns its cost

Turn effort up when the failure mode is being wrong, not being slow. The clearest wins are multi-step reasoning, hard mathematics, complex code generation and debugging, careful planning across many tool calls, and any task where a subtle error propagates into an expensive mistake downstream. On these, the extra internal thinking often changes a wrong answer into a right one, which is worth real money and real latency.

When higher effort is wasted

Turn effort down, or off, when the task is simple or well-bounded. Classification, extraction, formatting, short rewrites, routing, and straightforward question answering rarely benefit from deep deliberation. On these tasks, high effort buys you slower responses and a larger bill for an answer that was already correct at low effort. The none and low settings exist precisely so you stop paying for thinking you do not need.
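One way to act on the two lists above is a small routing function that maps a task category to an effort level. The sketch below is an assumption-heavy illustration: the task category names and the function are hypothetical, and the effort strings are borrowed from the OpenAI levels named earlier, so other providers would need their own mapping.

```python
# Hypothetical task categories drawn from the examples in this guide.
HARD_TASKS = {
    "multi_step_reasoning",
    "hard_math",
    "complex_code",
    "debugging",
    "tool_planning",
}
SIMPLE_TASKS = {
    "classification",
    "extraction",
    "formatting",
    "short_rewrite",
    "routing",
    "simple_qa",
}


def pick_effort(task_type: str, default: str = "low") -> str:
    """Choose an effort level for a task category (illustrative policy only)."""
    if task_type in HARD_TASKS:
        return "high"   # being wrong is the expensive failure mode
    if task_type in SIMPLE_TASKS:
        return "none"   # deep deliberation rarely helps here
    return default      # unknown tasks fall back to a modest setting


if __name__ == "__main__":
    for task in ("classification", "hard_math", "contract_summary"):
        print(task, "->", pick_effort(task))
```

Keeping the fallback low means a new, unclassified task does not silently run at high effort.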

The hidden cost: effort is billed as output

Here is the part teams miss. The internal thinking a reasoning model does is generated text, and it is billed at the output rate. So raising reasoning effort raises your output token count, and output is the expensive side of the bill on every provider. The per-token price does not change, but the number of output tokens does, sometimes substantially. A high-effort answer can cost several times as much as a low-effort answer on the same model for the same question.

This is why effort and model tier interact. Running maximum effort on an already expensive model is the costliest possible combination. Running maximum effort on GPT-5.5 Pro, at $180 per million output tokens, is the most expensive single configuration in our entire dataset. Reserve it for the rare problem that genuinely needs it.
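The arithmetic is simple enough to sketch. The example below uses the $180 per million output tokens quoted above for GPT-5.5 Pro. The token counts are made up for illustration and are not measurements, and the function name is hypothetical.

```python
def output_cost_usd(visible_tokens: int, reasoning_tokens: int,
                    price_per_million_usd: float) -> float:
    """Cost of one response when reasoning tokens are billed as output."""
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * price_per_million_usd / 1_000_000


PRICE_PER_MILLION = 180.0  # output price quoted in this guide for GPT-5.5 Pro

# Hypothetical token counts: same visible answer, different amounts of thinking.
low_effort = output_cost_usd(visible_tokens=500, reasoning_tokens=200,
                             price_per_million_usd=PRICE_PER_MILLION)
high_effort = output_cost_usd(visible_tokens=500, reasoning_tokens=4000,
                              price_per_million_usd=PRICE_PER_MILLION)

if __name__ == "__main__":
    print(f"low effort:  ${low_effort:.3f} per call")
    print(f"high effort: ${high_effort:.3f} per call")
    print(f"ratio:       {high_effort / low_effort:.1f}x")
```

With these assumed counts the visible answer is identical in both cases, and the whole difference in cost comes from the reasoning tokens.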

A practical effort policy

Default to a low or moderate setting for everyday work. Reserve high effort for the specific calls where correctness is hard and expensive to get wrong, and confirm with your own evaluations that the higher setting actually improves the result, rather than assuming it does. Pair the effort dial with the model tier: easy task on a cheap model at low effort, hard task on a strong model at high effort. Most of the value comes from not running high effort everywhere by default.
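The "confirm with your own evaluations" step can be as simple as comparing pass rates at two settings on the same test cases. The sketch below assumes you already have per-case pass or fail results for each setting. The function names and the 5 point minimum gain are hypothetical choices, not a recommendation from any provider.

```python
def accuracy(results: list[bool]) -> float:
    """Fraction of evaluation cases that passed; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def should_raise_effort(low_results: list[bool], high_results: list[bool],
                        min_gain: float = 0.05) -> bool:
    """Adopt the higher setting only if it beats the lower one by min_gain."""
    return accuracy(high_results) - accuracy(low_results) >= min_gain


if __name__ == "__main__":
    # Hypothetical results on the same four evaluation cases.
    low = [True, False, True, False]
    high = [True, True, True, False]
    print(should_raise_effort(low, high))  # higher effort helped on this set
    print(should_raise_effort(low, low))   # no gain, so keep the cheaper setting
```

A real evaluation would need enough cases for the difference to be meaningful, and the threshold should reflect how much the extra latency and cost are worth for that task.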

Where the facts come from

The reasoning support recorded for each model in this guide is verified on its ModelDex page and traced to the provider's documentation. The exact effort parameters and their behavior are defined by each provider and can change, so the provider docs and the live model page are the current source of truth.