Open-weight, native-multimodal agentic model. It is selectable in the API chat enum under the id kimi-k2.5 and is also open-sourced at github.com/MoonshotAI/Kimi-K2.5 (repo tagline: 'Moonshot's most powerful model'; Hugging Face org: huggingface.co/moonshotai). README, verbatim: 'Kimi K2.5 is an open-source, native multimodal agentic
Every value carries a primary source and a verification date.
Sourced evaluation scores, each verified against its primary source.
MMLU-Pro: 87.1
GPQA Diamond: 87.6
AIME 2025: 96.1
HMMT 2025 (Feb): 95.4
HLE-Full: 30.1
HLE-Full (w/ tools): 50.2
IMO-AnswerBench: 81.8
SWE-Bench Verified: 76.8
SWE-Bench Pro: 50.7
SWE-Bench Multilingual: 73.0
LiveCodeBench (v6): 85.0
Terminal Bench 2.0: 50.8
OJBench (cpp): 57.4
SciCode: 48.7
MMMU-Pro: 78.5
MathVision: 84.2
MathVista (mini): 90.1
OCRBench: 92.3
VideoMMMU: 86.6
VideoMME: 87.4
Longbench v2: 61.0
AA-LCR: 70.0
BrowseComp: 60.6
DeepSearchQA: 77.1
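Since every score above is meant to carry a primary source and a verification date, one way to keep that provenance attached to each value is a small record type. The following is only a sketch: the class name `SourcedScore`, its field names, the `stale` helper, and the 90-day threshold are all hypothetical, the repository URL is assumed to be the primary source, and the date is a placeholder because the actual verification dates are not shown in this section.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourcedScore:
    """One benchmark score plus its provenance (hypothetical schema)."""
    benchmark: str
    score: float
    source_url: str    # primary source the value was read from
    verified_on: date  # date the value was last checked against that source

    def __post_init__(self) -> None:
        # Reject records without a source so every value stays traceable.
        if not self.source_url:
            raise ValueError(f"{self.benchmark}: missing primary source")


def stale(records, today, max_age_days=90):
    """Return the records whose verification is older than max_age_days."""
    return [r for r in records if (today - r.verified_on).days > max_age_days]


# Placeholder only: the real verification dates are not given in this section.
PLACEHOLDER_DATE = date(2000, 1, 1)
# Assumption: the repository README is the primary source for these values.
REPO = "https://github.com/MoonshotAI/Kimi-K2.5"

records = [
    SourcedScore("MMLU-Pro", 87.1, REPO, PLACEHOLDER_DATE),
    SourcedScore("AIME 2025", 96.1, REPO, PLACEHOLDER_DATE),
]
```

Calling `stale(records, date.today())` would then list the entries due for re-verification. Making the record frozen is a deliberate choice in this sketch: a changed score has to be entered as a new record with a fresh verification date rather than edited in place.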