The top-tier Pro model in the Gemini 3.5 family and the Pro counterpart to 3.5 Flash. It has a 2M-token context window, the strongest reasoning and coding scores in the family, and full multimodal input. It costs more than 3.5 Flash but is the highest-accuracy general-purpose Gemini model.
Every value carries a primary source and a verification date.
Sourced evaluation scores, each verified against its primary source.
GPQA Diamond: 91.8%
SWE-bench Verified: 78.4%
Humanity's Last Exam (no tools): 32.6%
AIME 2025: 94.5%
MMMU-Pro: 82.1%
MMLU-Pro: 89.7%
LiveCodeBench: 81.3%