Flagship natively-multimodal MoE (17B active / 128 experts / 400B total). Confirmed current on the Llama API models doc (llama.developer.meta.com/docs/models): "Model ID: `Llama-4-Maverick-17B-128E-Instruct-FP8`" — "An industry-leading natively multimodal model that offers image and text understandi
Every value carries a primary source and a verification date.
Sourced evaluation scores, each verified against its primary source.
MMLU-Pro
<td> MMLU Pro</td> <td>0</td> <td>macro_avg/acc</td> <td> 68.9 </td> <td> 73.4 </td> <td>74.3</td> <td>80.5</td>
GPQA Diamond
<td> GPQA Diamond</td> <td>0</td> <td>accuracy</td> <td> 50.5 </td> <td> 49.0 </td> <td>57.2</td> <td>69.8</td>
LiveCodeBench
<td> LiveCodeBench (10/01/2024-02/01/2025) </td> <td>0</td> <td>pass@1</td> <td> 33.3 </td> <td> 27.7 </td> <td>32.8</td> <td>43.4</td>
MMMU
<td>MMMU</td> <td>0</td> <td>accuracy</td> <td colspan="2"> No multimodal support </td> <td>69.4</td> <td>73.4</td>
MMMU Pro
<td>MMMU Pro^</td> <td>0</td> <td>accuracy</td> <td colspan="2"> </td> <td>52.2</td> <td>59.6</td>
MathVista
<td>MathVista</td> <td>0</td> <td>accuracy</td> <td colspan="2"> </td> <td>70.7</td> <td>73.7</td>
ChartQA
<td>ChartQA</td> <td>0</td> <td>relaxed_accuracy</td> <td colspan="2"> </td> <td>88.8</td> <td>90.0</td>
DocVQA (test)
<td>DocVQA (test)</td> <td>0</td> <td>anls</td> <td colspan="2"> </td> <td>94.4</td> <td>94.4</td>
MGSM
<td> MGSM</td> <td>0</td> <td>average/em</td> <td> 91.1 </td> <td> 91.6</td> <td>90.6</td> <td>92.3</td>
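The quoted rows above are raw `<td>` fragments, so a value can be re-checked mechanically instead of by eye. The following is a minimal sketch, assuming only Python's standard-library `html.parser`; the names `CellCollector` and `row_cells` are hypothetical helpers introduced for illustration and are not part of this page or of any Llama tooling. It splits a fragment into its cell texts and does not assume which model each numeric column belongs to, since the column headers are not quoted here.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # Start a new, empty cell whenever a <td> opens.
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace between cells; keep only text inside a <td>.
        if self._in_cell:
            self.cells[-1] += data


def row_cells(fragment):
    """Return the whitespace-stripped cell texts of one quoted table row."""
    parser = CellCollector()
    parser.feed(fragment)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# The MMLU Pro row exactly as quoted above.
row = ('<td> MMLU Pro</td> <td>0</td> <td>macro_avg/acc</td> '
       '<td> 68.9 </td> <td> 73.4 </td> <td>74.3</td> <td>80.5</td>')
print(row_cells(row))
```

Rows with a `colspan="2"` cell, such as the MMMU row, yield one fewer entry, so any check that indexes from the left should account for that; indexing from the end of the list is the safer choice under this sketch.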
LMArena (experimental chat version, ELO)
Llama 4 Maverick offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on