Smaller natively-multimodal MoE sibling (17B active / 16 experts / 109B total), fits on a single H100. Confirmed current on llama.developer.meta.com/docs/models: "Model ID: `Llama-4-Scout-17B-16E-Instruct-FP8`" — "A class-leading natively multimodal model that offers superior text and visual intelli
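As a rough check on the single-H100 claim, the sketch below estimates weight-only memory for the 109B total parameters at a few precisions. The 80 GB capacity figure and the bytes-per-parameter values are assumptions for illustration, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate for a 109B-parameter model.
TOTAL_PARAMS = 109e9      # total parameters, taken from the description above
H100_MEMORY_GB = 80       # assumption: the 80 GB H100 variant

# Assumption: plain storage cost per parameter, with no quantization scale overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


def fits_on_one_gpu(total_params: float, bytes_per_param: float,
                    gpu_memory_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(total_params, bytes_per_param) <= gpu_memory_gb


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{name}: {size:.1f} GB, fits on one GPU: {fits_on_one_gpu(TOTAL_PARAMS, nbytes)}")
```

Under these assumptions only the 4-bit case fits on one 80 GB card, so the single-GPU claim presumably depends on quantization more aggressive than FP8. Note that all 109B parameters must be resident even though only 17B are active per token.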
Every value carries a primary source and a verification date.
Sourced evaluation scores, each verified against its primary source.
| Category | Benchmark | # Shots | Metric | Llama 3.3 70B | Llama 3.1 405B | Llama 4 Scout | Llama 4 Maverick |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reasoning & Knowledge | MMLU Pro | 0 | macro\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 |
| | GPQA Diamond | 0 | accuracy | 50.5 | 49.0 | 57.2 | 69.8 |
| Coding | LiveCodeBench (10/01/2024-02/01/2025) | 0 | pass@1 | 33.3 | 27.7 | 32.8 | 43.4 |
| Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 |
| | MMMU Pro^ | 0 | accuracy | | | 52.2 | 59.6 |
| | MathVista | 0 | accuracy | | | 70.7 | 73.7 |
| Image Understanding | ChartQA | 0 | relaxed\_accuracy | | | 88.8 | 90.0 |
| | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 |
| Multilingual | MGSM | 0 | average/em | 91.1 | 91.6 | 90.6 | 92.3 |

Pre-trained:

| Category | Benchmark | # Shots | Metric | Llama 3.1 70B | Llama 3.1 405B | Llama 4 Scout | Llama 4 Maverick |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reasoning & Knowledge | MMLU | 5 | macro\_avg/acc\_char | 79.3 | 85.2 | 79.6 | 85.5 |
| | MMLU-Pro | 5 | macro\_avg/em | 53.8 | 61.6 | 58.2 | 62.9 |
| | MATH | 4 | em\_maj1@1 | 41.6 | 53.5 | 50.3 | 61.2 |
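Rows like the ones above can be turned into structured records for comparison. The following is a minimal sketch, not the page's own tooling: `parse_row` and `scout_gap` are hypothetical helpers, and the column order (category, benchmark, shots, metric, then four score columns ending with Llama 4 Scout and Llama 4 Maverick) is an assumption about the table layout.

```python
from typing import Optional


def to_float(cell: str) -> Optional[float]:
    """Convert a score cell to float; blank or textual cells become None."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_row(row: str) -> dict:
    """Split one pipe-delimited row into a record, undoing markdown '\\_' escapes."""
    cells = [c.strip().replace("\\_", "_") for c in row.strip().strip("|").split("|")]
    return {
        "category": cells[0],
        "benchmark": cells[1],
        "shots": int(cells[2]),
        "metric": cells[3],
        # Assumption: the last two score columns are Llama 4 Scout and Llama 4 Maverick.
        "scores": [to_float(c) for c in cells[4:]],
    }


def scout_gap(record: dict) -> float:
    """Maverick score minus Scout score, rounded to one decimal place."""
    scout, maverick = record["scores"][-2], record["scores"][-1]
    return round(maverick - scout, 1)


ROWS = [
    r"| Reasoning & Knowledge | MMLU Pro | 0 | macro\_avg/acc | 68.9 | 73.4 | 74.3 | 80.5 |",
    r"| Image Reasoning | MMMU | 0 | accuracy | No multimodal support | | 69.4 | 73.4 |",
    r"| | DocVQA (test) | 0 | anls | | | 94.4 | 94.4 |",
]

for row in ROWS:
    record = parse_row(row)
    print(record["benchmark"], record["metric"], scout_gap(record))
```

Blank cells and the "No multimodal support" note are mapped to `None` rather than zero, so a missing score is never mistaken for a measured one.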