
OpenAI extends default prompt cache retention to 24 hours for most API customers

May 29, 2026 (published Jun 6, 2026)

OpenAI updated its default prompt caching behavior on May 29, 2026, changing the default prompt_cache_retention setting from in_memory to 24h for all organizations that do not have Zero Data Retention (ZDR) enabled. Cached prompt prefixes now persist across API calls within a 24-hour window by default, with no configuration change required from developers.

What's new

The change affects the default value of prompt_cache_retention, a parameter that controls how long a cached prompt prefix remains available for re-use:

Before: in_memory — cache lived only within the current session or compute context; a new API call from a different instance or after a gap would miss the cache
After: 24h — cache persists across calls for up to 24 hours, regardless of where the request originates
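The difference between the two settings can be illustrated with a toy model. The sketch below is not OpenAI's implementation: the class name ToyPrefixCache, the instance labels, and the five-minute in_memory lifetime are all hypothetical, and it assumes a hit does not extend an entry's lifetime. Only the 24-hour figure comes from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical lifetimes in seconds. Only the 24-hour value is taken from the
# article; the in_memory value is a placeholder chosen for illustration.
LIFETIME_SECONDS = {"in_memory": 5 * 60, "24h": 24 * 60 * 60}


@dataclass
class ToyPrefixCache:
    """Toy model of the two retention settings (assumption, not the real system)."""

    retention: str
    entries: dict = field(default_factory=dict)

    def lookup(self, prefix: str, instance: str, now: float) -> bool:
        """Return True on a hit; on a miss, store the prefix with the current time."""
        # in_memory entries are scoped to the serving instance; 24h entries are shared.
        scope = instance if self.retention == "in_memory" else "shared"
        key = (scope, prefix)
        stored_at = self.entries.get(key)
        hit = stored_at is not None and now - stored_at <= LIFETIME_SECONDS[self.retention]
        if not hit:
            self.entries[key] = now
        return hit


def replay(retention: str) -> list:
    """Replay three calls: a first call, a second instance, then a one-hour gap."""
    cache = ToyPrefixCache(retention)
    calls = [("instance-a", 0), ("instance-b", 60), ("instance-a", 3600)]
    return [cache.lookup("SYSTEM PROMPT", instance, t) for instance, t in calls]


print("in_memory:", replay("in_memory"))
print("24h:      ", replay("24h"))
```

In this toy model, the in_memory replay misses on all three calls, while the 24h replay misses only on the first call and hits on the other two.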

The change is automatic for eligible organizations. Developers who want to opt back into shorter retention can set prompt_cache_retention: in_memory explicitly. Organizations with Zero Data Retention enabled are unaffected — their setting remains in_memory to meet data handling requirements.
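As a rough sketch of what opting back in might look like, the snippet below builds a request body with prompt_cache_retention set explicitly. The helper build_request_body, the model name, and the "model" and "input" field names are assumptions made for illustration; only the prompt_cache_retention parameter and its two values come from the text above. No network call is made.

```python
import json

# The two values named in the article.
ALLOWED_RETENTION = {"in_memory", "24h"}


def build_request_body(model: str, prompt: str, retention: str = None) -> dict:
    """Build a request body; omit retention to accept the account default.

    Hypothetical helper: every field name except prompt_cache_retention
    is an assumption, not a confirmed API shape.
    """
    body = {"model": model, "input": prompt}
    if retention is not None:
        if retention not in ALLOWED_RETENTION:
            raise ValueError(f"unsupported retention value: {retention!r}")
        body["prompt_cache_retention"] = retention
    return body


# Explicitly request the shorter retention instead of relying on the default.
body = build_request_body("example-model", "Summarize the attached report.", "in_memory")
print(json.dumps(body, indent=2))
```

Leaving the retention argument out keeps the field off the request entirely, so the organization's default applies.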

OpenAI bills cached input tokens at a reduced rate compared with uncached input tokens. That discount applies only when a request actually hits the cache, which under the old default generally required calls to land on the same server or within the same session window.
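The arithmetic behind that discount can be sketched as follows. The prices here are hypothetical placeholders, not OpenAI's actual rates, and the function name input_cost is invented for this example.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               price_per_million: float, cached_price_per_million: float) -> float:
    """Input-token cost when cached_tokens of the prompt are billed at the cached rate."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    total = uncached_tokens * price_per_million + cached_tokens * cached_price_per_million
    return total / 1_000_000


# Hypothetical prices per million input tokens (assumptions for illustration only).
FULL_PRICE = 2.00
CACHED_PRICE = 0.50

# A 10,000-token prompt whose first 8,000 tokens form a reusable prefix.
miss_cost = input_cost(10_000, 0, FULL_PRICE, CACHED_PRICE)      # cache miss
hit_cost = input_cost(10_000, 8_000, FULL_PRICE, CACHED_PRICE)   # cache hit on the prefix

print(f"miss: ${miss_cost:.4f}  hit: ${hit_cost:.4f}")
```

Under these assumed prices, a hit on the 8,000-token prefix cuts the input cost of the call by more than half; the saving disappears entirely on a miss.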

Context

OpenAI introduced prompt caching in 2024 as a way to reduce costs and latency for API calls that repeat large prompt prefixes — system instructions, document context, tool definitions, or few-shot examples. A cached prefix avoids re-processing the same tokens on each request, reducing both input token charges and time-to-first-token.
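Because caching works on repeated prefixes, prompts benefit most when the stable material comes first and the per-request material comes last. The sketch below shows that ordering; the prompt text, the build_prompt helper, and the ExampleCo scenario are hypothetical, and it compares characters for simplicity, whereas a real cache would operate on tokens.

```python
import os

# Stable content shared by every request: instructions, tool notes, an example.
STATIC_PREFIX = "\n".join([
    "You are a support assistant for ExampleCo.",
    "Tool: lookup_order(order_id) returns the order status.",
    "Example: Q: Where is order 1? A: It shipped on Monday.",
])


def build_prompt(question: str) -> str:
    """Place the stable prefix first and the variable question last."""
    return f"{STATIC_PREFIX}\n\nUser question: {question}"


prompts = [
    build_prompt("Where is order 42?"),
    build_prompt("Can I change my delivery address?"),
]

# os.path.commonprefix compares strings character by character.
shared = os.path.commonprefix(prompts)
print(f"{len(shared)} of {len(prompts[0])} characters are shared")
```

Putting the question before the static material would reduce the shared prefix to almost nothing, which is why the ordering matters.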

The in_memory default was conservative: safe for ZDR customers, simple to reason about, but it meant that production workloads distributed across multiple API endpoints or making calls with gaps between them frequently missed the cache and paid full input token costs.

The shift to a 24h default suggests that OpenAI is confident in its cache infrastructure and expects most production usage patterns to benefit from longer persistence.

Why it matters

For developers running production AI applications (agents, analysis pipelines, RAG systems, coding tools), the practical effect is lower input token costs and lower latency for workloads that reuse the same large prefix across many calls. A system prompt or document set first sent in the morning can keep producing cache hits through the workday, as long as the prefix stays identical, without any manual cache management.

Most users do not need to change any code. Teams that were manually pinning or refreshing cache keys to maximize hits can simplify their implementations; teams that had given up on caching because of the session-scoped limitation should see more cache hits right away.

The one group that should review its settings is organizations with strict data residency or retention policies that are not already on ZDR: the 24h default means prompt prefix data is retained server-side for up to a day, which some compliance requirements may not permit.

Corroborating sources

Developers.openai
https://developers.openai.com/api/docs/changelog
“For organizations without ZDR enabled, prompt_cache_retention now defaults to 24h instead of in_memory, enabling extended prompt caching by default.”
