The largest open-weight Qwen3 MoE flagship (235B total / 22B active parameters), released under Apache 2.0. From the QwenLM/Qwen3 GitHub README: '2025.07.21: We released the updated version of Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring significant enhancements over the previous version and supporting …'
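The "235B total / 22B active" split is what makes this a sparse mixture-of-experts model: each token passes through only a subset of the experts, so per-token compute follows the active count while weight memory follows the total. The back-of-envelope Python sketch below assumes the common rule of thumb of about 2 FLOPs per active parameter per token and 2-byte (BF16) weights. Both figures are approximations and do not come from the Qwen README, and the helper names are illustrative.

```python
# Back-of-envelope MoE arithmetic for Qwen3-235B-A22B.
# Assumptions (not from the Qwen README): ~2 FLOPs per active parameter per
# token for a forward pass, and 2 bytes per weight (BF16). Attention over the
# context, the router, and the KV cache are ignored.

TOTAL_PARAMS_B = 235.0   # total parameters, billions
ACTIVE_PARAMS_B = 22.0   # parameters activated per token, billions
BYTES_PER_PARAM = 2      # BF16 (assumed)


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all weights that participate in one token's forward pass."""
    return active_b / total_b


def forward_gflops_per_token(active_b: float) -> float:
    """Approximate forward-pass compute per token: 2 FLOPs per active weight."""
    return 2.0 * active_b  # billions of weights x 2 FLOPs = GFLOPs


def weight_memory_gb(total_b: float, bytes_per_param: int) -> float:
    """All experts must be resident, so memory scales with the TOTAL count."""
    return total_b * bytes_per_param  # billions of weights x bytes = GB


print(f"active fraction : {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
print(f"forward compute : ~{forward_gflops_per_token(ACTIVE_PARAMS_B):.0f} GFLOPs/token")
print(f"BF16 weights    : ~{weight_memory_gb(TOTAL_PARAMS_B, BYTES_PER_PARAM):.0f} GB")
```

The script reports an active fraction of about 9.4%, roughly 44 GFLOPs per token, and about 470 GB of BF16 weights. Under the same rule of thumb, a dense 235B model would need about ten times more compute per token for the same weight memory.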
Every value carries a primary source and a verification date.
Sourced evaluation scores, each verified against its primary source.
Bold marks the highest score in each row; `-` means no score was reported.

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Claude Opus 4 Non-thinking | Kimi K2 | Qwen3-235B-A22B Non-thinking | Qwen3-235B-A22B-Instruct-2507 |
|---|---|---|---|---|---|---|
| MMLU-Pro | 81.2 | 79.8 | **86.6** | 81.1 | 75.2 | 83.0 |
| MMLU-Redux | 90.4 | 91.3 | **94.2** | 92.7 | 89.2 | 93.1 |
| GPQA | 68.4 | 66.9 | 74.9 | 75.1 | 62.9 | **77.5** |
| SuperGPQA | 57.3 | 51.0 | 56.5 | 57.2 | 48.2 | **62.6** |
| SimpleQA | 27.2 | 40.3 | 22.8 | 31.0 | 12.2 | **54.3** |
| AIME25 | 46.6 | 26.7 | 33.9 | 49.5 | 24.7 | **70.3** |
| HMMT25 | 27.5 | 7.9 | 15.9 | 38.8 | 10.0 | **55.4** |
| ARC-AGI | 9.0 | 8.8 | 30.3 | 13.3 | 4.3 | **41.8** |
| ZebraLogic | 83.4 | 52.6 | - | 89.0 | 37.7 | **95.0** |
| LiveBench 20241125 | 66.9 | 63.7 | 74.6 | **76.4** | 62.5 | 75.4 |
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 44.6 | 48.9 | 32.9 | **51.8** |
| MultiPL-E | 82.2 | 82.7 | **88.5** | 85.7 | 79.3 | 87.9 |
| Aider-Polyglot | 55.1 | 45.3 | **70.7** | 59.0 | 59.6 | 57.3 |
| IFEval | 82.3 | 83.9 | 87.4 | **89.8** | 83.2 | 88.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | **79.2** |
| Creative Writing v3 | 81.6 | 84.9 | 83.8 | **88.1** | 80.4 | 87.5 |
| WritingBench | 74.5 | 75.5 | 79.2 | **86.2** | 77.0 | 85.2 |
| BFCL-v3 | 64.7 | 66.5 | 60.1 | 65.2 | 68.0 | **70.9** |
| TAU2-Retail | 71.1 | 66.7# | **75.5** | 70.6 | 64.9 | 74.6 |
| MultiIF | 66.5 | 70.4 | - | 76.2 | 70.2 | **77.5** |
| MMLU-ProX | 75.8 | 76.2 | - | 74.5 | 73.2 | **79.4** |
| PolyMATH | 32.2 | 25.5 | 30.0 | 44.8 | 27.0 | **50.2** |
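The rows can be parsed directly to sanity-check the table, for example to confirm that each bolded value really is the row maximum and to count how often each column leads. The minimal Python sketch below assumes the pipe-separated layout shown, treats `-` as "not reported", and treats trailing `*`/`#` as footnote markers. The helpers `parse_cell`, `parse_row`, `bold_is_row_max`, and `wins_per_column` are hypothetical names chosen for illustration, and columns are referred to by 0-based index from left to right.

```python
from typing import Dict, List, Optional, Tuple

# A few rows copied verbatim from the benchmark table (markdown pipe syntax).
ROWS = """\
| MMLU-Pro | 81.2 | 79.8 | **86.6** | 81.1 | 75.2 | 83.0 |
| ZebraLogic | 83.4 | 52.6 | - | 89.0 | 37.7 | **95.0** |
| TAU2-Retail | 71.1 | 66.7# | **75.5** | 70.6 | 64.9 | 74.6 |
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | **79.2** |
"""

Cell = Tuple[Optional[float], bool]  # (score or None, is_bold)


def parse_cell(cell: str) -> Cell:
    """Return (score, is_bold); '-' means no reported score, '#' is a footnote mark."""
    cell = cell.strip()
    is_bold = cell.startswith("**") and cell.endswith("**")
    value = cell.strip("*").rstrip("#").strip()
    return (None if value == "-" else float(value)), is_bold


def parse_row(line: str) -> Tuple[str, List[Cell]]:
    """Split one markdown row into its benchmark name and parsed score cells."""
    cells = line.strip().strip("|").split("|")
    return cells[0].strip(), [parse_cell(c) for c in cells[1:]]


def bold_is_row_max(line: str) -> bool:
    """True if exactly one cell is bold and it holds the highest reported score."""
    _, cells = parse_row(line)
    scores = [s for s, _ in cells if s is not None]
    bolded = [s for s, bold in cells if bold]
    return len(bolded) == 1 and bolded[0] == max(scores)


def wins_per_column(lines: List[str]) -> Dict[int, int]:
    """Count bolded (best) cells per 0-based column index."""
    wins: Dict[int, int] = {}
    for line in lines:
        _, cells = parse_row(line)
        for i, (_, bold) in enumerate(cells):
            if bold:
                wins[i] = wins.get(i, 0) + 1
    return wins


lines = ROWS.splitlines()
for line in lines:
    print(parse_row(line)[0], "->", bold_is_row_max(line))
print(wins_per_column(lines))
```

For these four rows every check prints `True`, and the win counts come out as `{2: 2, 5: 2}`: the third and sixth score columns each lead twice. Feeding in every row of the table extends the same check to the full comparison.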