Google · Gemini 2.5

Gemini 2.5 Flash-Lite

Current / Stale · Jul 26, 2026

The cheapest and fastest model in the Gemini 2.5 family: a lightweight, low-latency variant optimized for high-volume, cost-sensitive workloads. Keeps the 1M-token context window and multimodal input but trades peak accuracy for throughput and price.

Profile normalized against the 88-model field

Context window · 1.05M of 10M · 10%

Max output · 66K of 384K · 17%

Output speed · 320 tok/s · 100%

Affordability · $0.40 / Mtok out · 100%

Capability breadth · 9 of 11 · 82%
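
The three "X of Y" rows above appear to be a simple share of the field maximum, rounded to a whole percent. The following is a minimal sketch of that arithmetic, not the page's actual scoring code. It assumes "1.05M" means 1,048,576 tokens and "66K" means 66,000 tokens; the function name is hypothetical. The speed and affordability rows are not covered, since the page does not state how they are normalized.

```python
def share_of_field_max(value: float, field_max: float) -> int:
    """Return value as a whole-number percentage of the field maximum."""
    return round(100 * value / field_max)


# Assumption: "1.05M" is 1,048,576 tokens and the field maximum is 10,000,000.
context_pct = share_of_field_max(1_048_576, 10_000_000)  # 10
# Assumption: "66K" is 66,000 tokens and the field maximum is 384,000.
output_pct = share_of_field_max(66_000, 384_000)          # 17
# 9 of the 11 tracked capability switches are enabled.
breadth_pct = share_of_field_max(9, 11)                   # 82

print(context_pct, output_pct, breadth_pct)
```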

Capability switches · 9 of 11

Reasoning mode

Tool / function use

Streaming

JSON mode

Structured outputs

Prompt caching

Fine-tuning

Web search

Code execution

Vision input

Audio input

Specifications

Every value carries a primary source and a verification date.

Capacity

Context window

1.05M

Max output

66K

Pricing

Input $/Mtok

$0.10 / Mtok

Cached input $/Mtok

$0.01 / Mtok (text/image/video)

Output $/Mtok

$0.40 / Mtok
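
To make the listed rates concrete, here is a rough cost estimator. It is a sketch under stated assumptions: cached input tokens are billed at the cached rate instead of the standard input rate, and any cache storage fees or per-modality price differences are ignored. The function and constant names are hypothetical.

```python
# Rates copied from the pricing rows above, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.10
CACHED_INPUT_USD_PER_MTOK = 0.01
OUTPUT_USD_PER_MTOK = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate request cost in USD.

    cached_input_tokens is the portion of input_tokens served from cache
    (assumption: those tokens are billed only at the cached rate).
    """
    fresh_input_tokens = input_tokens - cached_input_tokens
    if fresh_input_tokens < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    total = (fresh_input_tokens * INPUT_USD_PER_MTOK
             + cached_input_tokens * CACHED_INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# One million input tokens plus one million output tokens, nothing cached.
print(f"${estimate_cost_usd(1_000_000, 1_000_000):.2f}")
```

With these rates, output tokens cost four times as much as uncached input tokens, so long generations tend to dominate the bill.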

Performance

Output speed

320 tok/s

Time to first token

210 ms
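
The two performance figures can be combined into a back-of-the-envelope response time: time to first token plus output length divided by output speed. This is only a sketch that assumes both figures hold steady for a given request, which real traffic will not guarantee. The function name is hypothetical.

```python
# Figures copied from the performance rows above.
TIME_TO_FIRST_TOKEN_S = 0.210
OUTPUT_TOKENS_PER_S = 320.0


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive a full streamed response."""
    return TIME_TO_FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_S


# A 320-token reply: 0.21 s to first token plus 1.0 s of generation.
print(f"{estimate_response_seconds(320):.2f} s")
```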

Capabilities

Input modalities

Audio, image, PDF, text, video

Output modalities

Text

Reasoning mode

Yes

Tool / function use

Yes

Streaming

Yes

JSON mode

Yes

Structured outputs

Yes

Prompt caching

Yes

Web search

Yes

Vision input

Yes

Audio input

Yes

API

API model ID

gemini-2.5-flash-lite
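
As an illustration of where the model ID goes, the sketch below builds (but does not send) a text-only request. The endpoint path, the `x-goog-api-key` header, and the request body shape are assumptions based on the public Gemini API REST interface, not details given on this page. `build_request` and the placeholder key are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "gemini-2.5-flash-lite"
# Assumption: the public Gemini API base URL and version prefix.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for a single text prompt without sending it."""
    url = f"{BASE_URL}/models/{MODEL_ID}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},
        method="POST",
    )


request = build_request("Summarize this in one sentence.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without credentials or network access.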

Batch API

Yes

General

Knowledge cutoff

January 2025

Benchmarks

Sourced evaluation scores, each verified against its primary source.

GPQA Diamond

64.5% · Verified

SWE-bench Verified

41.3% · Verified

AIME 2025

68.9% · Verified

MMLU-Pro

76.2% · Verified

LiveCodeBench

52.4% · Verified
