Fundamentals · Updated May 30, 2026

Context windows, explained

What a context window actually is

A context window is the maximum amount of text, measured in tokens, that a model can consider at once. It is the model's working memory for a single request. Everything has to fit inside it: your system instructions, the conversation history, any documents or code you paste in, the tool definitions, and the answer the model is about to write. When people say a model has a one million token context window, they mean the input and the output together have to fit under that ceiling.

A token is a chunk of text, usually a few characters. As a rough planning rule, a token is about three quarters of a word in English, so one million tokens is roughly 750,000 words. That is large enough to hold a substantial codebase or a stack of long documents in a single call.
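The three-quarters rule can be turned into a quick estimator. This is a minimal sketch for planning only: the function names are our own, and real tokenizers vary by model and by language, so treat the results as ballpark figures rather than billing-grade counts.

```python
# Planning heuristic from the text: one token is about 0.75 English words.
# Real tokenizers differ by model and language, so this is an estimate only.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens a text of `word_count` English words needs."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Estimate how many English words fit in `token_count` tokens."""
    return round(token_count * WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_words(1_000_000))  # words that fit in a one million token window
    print(estimate_tokens(3_000))     # tokens needed for a 3,000 word document
```

For an exact count, use the tokenizer or token-counting endpoint your provider documents for the specific model.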

How big are today's windows

Across the models ModelDex tracks, the large models cluster around one million tokens. Claude Opus 4.8, Claude Opus 4.7, and Claude Sonnet 4.6 each carry a one million token window. The Gemini 3 family is precise about it: Gemini 3 Flash, Gemini 3.1 Pro, and Gemini 3.5 Flash each list 1,048,576 tokens, which is exactly two to the twentieth power. The GPT-5.5, GPT-5.5 Pro, and GPT-5.4 models list 1,050,000 tokens. Grok 4.3 and Grok 4.20 list one million.

The smaller and cheaper models give you less, by design. Claude Haiku 4.5 carries 200,000 tokens. GPT-5.4 mini and GPT-5.4 nano carry 400,000. Grok Build 0.1 carries 256,000. These are still large windows in absolute terms, but if your task routinely sends very long inputs, the difference matters.

Input window versus output limit

Here is a distinction that trips people up. The context window is not the same as the maximum output. A model can read far more than it is allowed to write in a single response. Claude Opus 4.8 reads up to one million tokens but caps a single response at 128,000 output tokens. The Gemini 3 family reads about one million but caps output at 65,536 tokens. The GPT-5 family models we track cap output at 128,000 tokens. So a one million token window does not mean a one million token answer. It means a very large prompt and a still substantial, but smaller, reply, and that reply still counts toward the window.
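The two limits can be checked separately before a request goes out. The sketch below assumes both rules described in this article apply: the reply must stay under the output cap, and the prompt plus the reply must fit in the window. The class and function names are hypothetical, and the figures are the ones quoted above for Claude Opus 4.8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int  # total tokens the model can consider in one request
    max_output: int      # cap on tokens in a single response


def check_request(limits: ModelLimits, prompt_tokens: int, output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if output_tokens > limits.max_output:
        problems.append("requested output exceeds the per-response cap")
    if prompt_tokens + output_tokens > limits.context_window:
        problems.append("prompt plus output exceeds the context window")
    return problems


# Figures quoted in the text for Claude Opus 4.8.
opus = ModelLimits(context_window=1_000_000, max_output=128_000)

if __name__ == "__main__":
    print(check_request(opus, prompt_tokens=900_000, output_tokens=50_000))
    print(check_request(opus, prompt_tokens=100_000, output_tokens=200_000))
```

Keeping the two checks apart makes it obvious which limit a request actually hit: a short prompt can still fail on the output cap alone.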

Bigger is not automatically better

A large window is capacity, not a strategy. Three things are worth knowing before you fill it.

Cost scales with what you actually send. You are billed per token of input, so a prompt that fills a one million token window costs roughly a hundred times more than a ten thousand token prompt on the same model. A big window does not cost more when unused. It costs more when you use it.
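The hundredfold figure is plain arithmetic. In the sketch below the price is a made-up placeholder, not a rate from any provider, and a single flat rate per input token is assumed; the ratio is the point, not the dollar amounts.

```python
def input_cost_usd(prompt_tokens: int, price_per_million_usd: float) -> float:
    """Input cost for one request at a flat per-token rate."""
    return prompt_tokens / 1_000_000 * price_per_million_usd


# Hypothetical price, chosen only to make the arithmetic visible.
PRICE_PER_MILLION_USD = 3.00

small = input_cost_usd(10_000, PRICE_PER_MILLION_USD)
large = input_cost_usd(1_000_000, PRICE_PER_MILLION_USD)

if __name__ == "__main__":
    print(f"10k-token prompt: ${small:.2f}")
    print(f"1M-token prompt:  ${large:.2f}")
    print(f"ratio: {large / small:.0f}x")
```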

Some providers price long prompts differently. In our dataset, Gemini 2.5 Pro and Gemini 3.1 Pro list their headline input prices specifically for prompts at or under 200,000 tokens, which signals that very long prompts can be billed under a different rate tier. Always check the provider's pricing for long prompts before you design a workload around them.
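To see why a rate tier matters, here is a minimal sketch of one possible tiering rule. Everything in it is an assumption for illustration: the two prices are invented, and so is the rule that the whole prompt is billed at the higher rate once it crosses the threshold. Only the 200,000 token threshold comes from the listings mentioned above. Check the provider's pricing page for the real rule and rates.

```python
def tiered_input_cost_usd(
    prompt_tokens: int,
    threshold_tokens: int,
    base_price_per_million_usd: float,
    long_price_per_million_usd: float,
) -> float:
    """Input cost under an assumed tiering rule.

    Assumption: once the prompt exceeds the threshold, the entire prompt
    is billed at the long-prompt rate. Real providers may tier differently.
    """
    if prompt_tokens <= threshold_tokens:
        price = base_price_per_million_usd
    else:
        price = long_price_per_million_usd
    return prompt_tokens / 1_000_000 * price


# Hypothetical rates; only the 200,000 token threshold comes from the text.
THRESHOLD = 200_000
BASE_PRICE = 2.00
LONG_PRICE = 4.00

if __name__ == "__main__":
    for tokens in (200_000, 400_000):
        cost = tiered_input_cost_usd(tokens, THRESHOLD, BASE_PRICE, LONG_PRICE)
        print(f"{tokens:>7} tokens: ${cost:.2f}")
```

Under these invented numbers, doubling the prompt from 200,000 to 400,000 tokens quadruples the input cost rather than doubling it, which is the kind of step a workload budget needs to anticipate.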

Retrieval often beats stuffing. Just because you can paste an entire knowledge base into the prompt does not mean you should. Sending only the relevant passages, selected by search or retrieval, is usually cheaper, faster, and produces sharper answers than filling the window and hoping the model finds the needle.
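The idea of sending only the relevant passages can be sketched in a few lines. This is a toy stand-in, not a real retrieval system: it ranks passages by how many words they share with the query, where a production setup would more likely use a search index or embeddings. The function names and the token estimate (the three-quarters-of-a-word rule from earlier) are assumptions for illustration.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 0.75 English words per token."""
    return round(len(text.split()) / 0.75)


def word_set(text: str) -> set[str]:
    """Lowercased words with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def select_passages(query: str, passages: list[str], token_budget: int) -> list[str]:
    """Pick the passages sharing the most words with the query, within a token budget."""
    query_words = word_set(query)
    scored = [(len(query_words & word_set(p)), p) for p in passages]
    # Highest overlap first (ties keep their original order); drop zero-overlap passages.
    ranked = [p for score, p in sorted(scored, key=lambda sp: sp[0], reverse=True) if score > 0]

    selected: list[str] = []
    used = 0
    for passage in ranked:
        cost = estimate_tokens(passage)
        if used + cost <= token_budget:
            selected.append(passage)
            used += cost
    return selected


if __name__ == "__main__":
    docs = [
        "The context window limits total tokens.",
        "Bananas are yellow.",
        "Output tokens are capped separately from the context window.",
    ]
    for passage in select_passages("context window tokens", docs, token_budget=100):
        print(passage)
```

The token budget is the useful part of the design: it lets you decide up front how much of the window retrieval is allowed to spend, instead of discovering the size of the prompt after the fact.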

How to size your window

Estimate the largest single input your task will ever send: the longest document, the biggest code file set, the longest conversation you will keep in history. Add room for the system prompt, the tool definitions, and the answer. If that total sits comfortably under a model's window, the window is not your constraint and you can choose on price and capability instead. If it does not, you have two options: move to a larger window model, or restructure the task so each call carries less.
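The sizing steps above can be sketched as a small calculation. The window sizes are the ones quoted in this article; the function names, the example token counts, and the 80 percent headroom used to stand in for "comfortably under" are our own assumptions.

```python
# Context windows quoted in this article, in tokens.
WINDOWS = {
    "Claude Opus 4.8": 1_000_000,
    "Gemini 3.1 Pro": 1_048_576,
    "GPT-5.5": 1_050_000,
    "GPT-5.4 mini": 400_000,
    "Grok Build 0.1": 256_000,
    "Claude Haiku 4.5": 200_000,
}


def required_tokens(largest_input: int, system_prompt: int, tool_definitions: int, answer: int) -> int:
    """Total tokens one worst-case request has to fit in the window."""
    return largest_input + system_prompt + tool_definitions + answer


def models_that_fit(required: int, windows: dict[str, int], headroom: float = 0.8) -> list[str]:
    """Models whose window holds `required` tokens with slack to spare.

    Assumption: "comfortably under" means using at most `headroom` of the window.
    """
    return [name for name, window in windows.items() if required <= window * headroom]


if __name__ == "__main__":
    # Hypothetical workload: a 150,000 token document plus overhead and an answer.
    need = required_tokens(largest_input=150_000, system_prompt=2_000,
                           tool_definitions=3_000, answer=8_000)
    print(need, models_that_fit(need, WINDOWS))
```

In this example the 163,000 token total would technically fit in a 200,000 token window, but not comfortably, so the sketch leaves that model out. Adjust the headroom to match how much variation you expect in your inputs.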

Where the numbers come from

Every context window and output limit cited here is a verified figure on the corresponding ModelDex model page, traced to the provider's own documentation. Providers do occasionally raise or adjust these limits, and our dataset tracks their docs, so the figure on the live model page is always the one to trust.