Table of combined model scores, reported as mean (± standard deviation), across several capabilities.
| Model | Total | R-Syn-1 | R-Syn-Max | R-Sem | S-Syn-1 | S-Syn-Max | S-Sem-R | S-Sem-W-1 | S-Sem-W-Max |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-3.5-397B | 0.977 (±0.133) | 0.941 (±0.215) | 0.994 (±0.015) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.940 (±0.211) | 0.940 (±0.211) |
| Claude Opus 4.6 | 0.896 (±0.270) | 0.934 (±0.234) | 0.996 (±0.013) | 0.657 (±0.415) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.747 (±0.362) | 0.837 (±0.311) |
| Claude Sonnet 4.6 | 0.844 (±0.321) | 0.934 (±0.234) | 0.996 (±0.013) | 0.282 (±0.300) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.747 (±0.362) | 0.797 (±0.338) |
| Gemini 3 Flash Preview | 0.942 (±0.200) | 0.961 (±0.168) | 0.995 (±0.014) | 0.973 (±0.061) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.760 (±0.367) | 0.850 (±0.312) |
| GPT5.2-chat | 0.955 (±0.178) | 0.962 (±0.168) | 0.995 (±0.014) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.807 (±0.332) | 0.877 (±0.276) |
| GPT5.4 2026/03 | 0.908 (±0.253) | 0.934 (±0.234) | 0.996 (±0.013) | 0.998 (±0.020) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.647 (±0.388) | 0.687 (±0.381) |
| Claude 3.5 Haiku | 0.925 (±0.232) | 0.937 (±0.203) | 0.984 (±0.029) | 0.779 (±0.395) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.817 (±0.325) | 0.887 (±0.266) |
| Claude 3.5 Sonnet | 0.933 (±0.222) | 0.950 (±0.175) | 0.990 (±0.022) | 0.832 (±0.370) | 0.980 (±0.125) | 1.000 (±0.000) | 1.000 (±0.000) | 0.857 (±0.295) | 0.857 (±0.295) |
| Deepseek-Coder-33B | 0.694 (±0.396) | 0.773 (±0.366) | 0.882 (±0.269) | 0.263 (±0.297) | 0.943 (±0.221) | 0.984 (±0.112) | 0.313 (±0.309) | 0.689 (±0.379) | 0.703 (±0.374) |
| Deepseek-R1 | 0.931 (±0.226) | 0.955 (±0.174) | 0.991 (±0.020) | 0.992 (±0.089) | 0.935 (±0.247) | 1.000 (±0.000) | 1.000 (±0.000) | 0.746 (±0.382) | 0.832 (±0.316) |
| Deepseek-Chat-v3 | 0.848 (±0.326) | 0.843 (±0.347) | 0.991 (±0.020) | 0.591 (±0.466) | 0.957 (±0.202) | 0.997 (±0.050) | 0.923 (±0.214) | 0.702 (±0.384) | 0.782 (±0.348) |
| Gemini 1.5 Flash | 0.907 (±0.263) | 0.920 (±0.242) | 0.983 (±0.028) | 0.878 (±0.325) | 0.865 (±0.324) | 0.910 (±0.272) | 1.000 (±0.000) | 0.850 (±0.304) | 0.850 (±0.304) |
| Gemini 1.5 Pro | 0.896 (±0.282) | 0.887 (±0.291) | 0.966 (±0.127) | 0.796 (±0.399) | 0.845 (±0.339) | 0.905 (±0.286) | 1.000 (±0.000) | 0.883 (±0.276) | 0.883 (±0.276) |
| Gemini 2.0 Flash Exp | 0.895 (±0.260) | 0.986 (±0.025) | 0.988 (±0.024) | 0.931 (±0.197) | 0.994 (±0.079) | 1.000 (±0.000) | 1.000 (±0.000) | 0.604 (±0.394) | 0.657 (±0.387) |
| Llama-3.1-70B | 0.860 (±0.311) | 0.908 (±0.234) | 0.973 (±0.034) | 0.559 (±0.484) | 0.997 (±0.050) | 0.997 (±0.050) | 1.000 (±0.000) | 0.694 (±0.381) | 0.754 (±0.361) |
| Llama-3.1-8B | 0.530 (±0.452) | 0.779 (±0.375) | 0.915 (±0.228) | 0.462 (±0.421) | 0.401 (±0.475) | 0.521 (±0.477) | 0.535 (±0.377) | 0.273 (±0.401) | 0.355 (±0.425) |
| Llama-3.2-1B | 0.123 (±0.276) | 0.250 (±0.366) | 0.411 (±0.409) | 0.159 (±0.254) | 0.026 (±0.143) | 0.079 (±0.260) | 0.021 (±0.070) | 0.010 (±0.050) | 0.027 (±0.073) |
| Llama-3.2-3B | 0.335 (±0.416) | 0.402 (±0.452) | 0.773 (±0.332) | 0.344 (±0.397) | 0.196 (±0.374) | 0.322 (±0.444) | 0.308 (±0.373) | 0.120 (±0.256) | 0.212 (±0.308) |
| Llama-3.3-70B | 0.853 (±0.318) | 0.975 (±0.032) | 0.978 (±0.029) | 0.595 (±0.487) | 0.985 (±0.122) | 1.000 (±0.000) | 1.000 (±0.000) | 0.617 (±0.398) | 0.671 (±0.385) |
| Llama-3.0-70B | 0.847 (±0.324) | 0.961 (±0.114) | 0.974 (±0.033) | 0.523 (±0.480) | 0.955 (±0.208) | 0.990 (±0.099) | 1.000 (±0.000) | 0.645 (±0.405) | 0.731 (±0.367) |
| Llama-3.0-8B | 0.434 (±0.435) | 0.586 (±0.426) | 0.632 (±0.416) | 0.219 (±0.290) | 0.271 (±0.445) | 0.425 (±0.488) | 0.615 (±0.337) | 0.281 (±0.397) | 0.445 (±0.417) |
| Llama-4-Maverick | 0.859 (±0.305) | 0.870 (±0.241) | 0.974 (±0.033) | 0.655 (±0.465) | 0.960 (±0.196) | 1.000 (±0.000) | 0.910 (±0.244) | 0.687 (±0.381) | 0.815 (±0.327) |
| GPT3.5 2024/01 | 0.800 (±0.356) | 0.975 (±0.126) | 0.995 (±0.014) | 0.411 (±0.442) | 0.944 (±0.230) | 1.000 (±0.000) | 0.696 (±0.374) | 0.674 (±0.387) | 0.707 (±0.376) |
| GPT4o 2024/11 | 0.902 (±0.244) | 0.937 (±0.212) | 0.986 (±0.024) | 0.726 (±0.377) | 1.000 (±0.000) | 1.000 (±0.000) | 0.881 (±0.183) | 0.817 (±0.325) | 0.867 (±0.286) |
| GPT4o-mini 2024/07 | 0.827 (±0.333) | 0.919 (±0.232) | 0.983 (±0.030) | 0.384 (±0.415) | 0.921 (±0.246) | 0.960 (±0.174) | 0.962 (±0.089) | 0.709 (±0.385) | 0.777 (±0.349) |
| GPTo1-mini 2024/09 | 0.911 (±0.251) | 0.835 (±0.351) | 0.992 (±0.018) | 0.994 (±0.031) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.697 (±0.379) | 0.767 (±0.354) |
| GPTo1-pre 2024/09 | 0.889 (±0.268) | 0.911 (±0.256) | 0.992 (±0.020) | 0.658 (±0.373) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.742 (±0.365) | 0.812 (±0.329) |
| OpenCoder-8B | 0.557 (±0.454) | 0.746 (±0.405) | 0.817 (±0.354) | 0.167 (±0.285) | 0.622 (±0.482) | 0.737 (±0.437) | 0.400 (±0.422) | 0.459 (±0.422) | 0.509 (±0.417) |
| Phi-3.5-mini | 0.481 (±0.432) | 0.608 (±0.412) | 0.639 (±0.390) | 0.176 (±0.297) | 0.637 (±0.466) | 0.683 (±0.450) | 0.450 (±0.381) | 0.309 (±0.367) | 0.350 (±0.371) |
| Phi-3.5-MoE | 0.738 (±0.359) | 0.831 (±0.296) | 0.841 (±0.287) | 0.517 (±0.421) | 0.808 (±0.391) | 0.932 (±0.238) | 0.688 (±0.185) | 0.637 (±0.394) | 0.648 (±0.389) |
| Phi-3.0-medium-128k | 0.561 (±0.439) | 0.838 (±0.318) | 0.886 (±0.257) | 0.248 (±0.364) | 0.547 (±0.475) | 0.603 (±0.466) | 0.625 (±0.316) | 0.360 (±0.412) | 0.385 (±0.417) |
| Phi-3.0-mini-128k | 0.431 (±0.415) | 0.582 (±0.424) | 0.660 (±0.388) | 0.263 (±0.333) | 0.486 (±0.479) | 0.549 (±0.480) | 0.428 (±0.336) | 0.231 (±0.291) | 0.245 (±0.292) |
| Phi-3.0-small-128k | 0.374 (±0.413) | 0.346 (±0.394) | 0.432 (±0.385) | 0.284 (±0.364) | 0.366 (±0.427) | 0.394 (±0.429) | 0.593 (±0.487) | 0.278 (±0.352) | 0.300 (±0.356) |
| Qwen-2.0-0.5B | 0.038 (±0.138) | 0.068 (±0.159) | 0.076 (±0.171) | 0.085 (±0.205) | 0.005 (±0.071) | 0.012 (±0.111) | 0.040 (±0.136) | 0.006 (±0.072) | 0.010 (±0.080) |
| Qwen-2.0-1.5B | 0.189 (±0.332) | 0.126 (±0.294) | 0.145 (±0.314) | 0.222 (±0.349) | 0.293 (±0.448) | 0.351 (±0.465) | 0.154 (±0.196) | 0.105 (±0.177) | 0.115 (±0.182) |
| Qwen-2.5-0.5B | 0.097 (±0.244) | 0.053 (±0.153) | 0.101 (±0.235) | 0.083 (±0.199) | 0.157 (±0.360) | 0.185 (±0.384) | 0.071 (±0.179) | 0.061 (±0.128) | 0.064 (±0.129) |
| Qwen-2.5-14B | 0.763 (±0.390) | 0.781 (±0.393) | 0.922 (±0.245) | 0.331 (±0.432) | 0.897 (±0.303) | 0.910 (±0.286) | 0.933 (±0.240) | 0.658 (±0.378) | 0.671 (±0.374) |
| Qwen-2.5-1.5B | 0.362 (±0.428) | 0.470 (±0.459) | 0.584 (±0.452) | 0.266 (±0.339) | 0.494 (±0.485) | 0.527 (±0.482) | 0.127 (±0.268) | 0.186 (±0.272) | 0.244 (±0.326) |
| Qwen-2.5-32B | 0.826 (±0.341) | 0.979 (±0.030) | 0.982 (±0.028) | 0.603 (±0.471) | 0.992 (±0.080) | 1.000 (±0.000) | 0.800 (±0.400) | 0.603 (±0.391) | 0.651 (±0.388) |
| Qwen-2.5-3B | 0.603 (±0.441) | 0.718 (±0.410) | 0.857 (±0.292) | 0.374 (±0.434) | 0.733 (±0.431) | 0.803 (±0.384) | 0.453 (±0.451) | 0.407 (±0.395) | 0.479 (±0.394) |
| Qwen-2.5-72B | 0.877 (±0.300) | 0.871 (±0.317) | 0.987 (±0.025) | 0.614 (±0.471) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.731 (±0.369) | 0.811 (±0.329) |
| Qwen-2.0-57B-A14B | 0.673 (±0.424) | 0.741 (±0.399) | 0.932 (±0.173) | 0.222 (±0.370) | 0.860 (±0.345) | 0.895 (±0.307) | 0.630 (±0.438) | 0.510 (±0.399) | 0.599 (±0.393) |
| Qwen-2.5-7B | 0.739 (±0.394) | 0.966 (±0.139) | 0.973 (±0.119) | 0.329 (±0.411) | 0.917 (±0.258) | 0.976 (±0.136) | 0.586 (±0.459) | 0.565 (±0.397) | 0.603 (±0.391) |
| Qwen-2.5-Coder-32B | 0.881 (±0.297) | 0.937 (±0.219) | 0.991 (±0.017) | 0.478 (±0.476) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 0.814 (±0.325) | 0.830 (±0.314) |
| Qwen-2.0-72B | 0.818 (±0.338) | 0.964 (±0.040) | 0.971 (±0.038) | 0.339 (±0.426) | 0.950 (±0.199) | 1.000 (±0.000) | 1.000 (±0.000) | 0.630 (±0.380) | 0.688 (±0.365) |
| Qwen-2.0-7B | 0.551 (±0.452) | 0.566 (±0.452) | 0.739 (±0.405) | 0.232 (±0.305) | 0.799 (±0.397) | 0.836 (±0.365) | 0.573 (±0.476) | 0.298 (±0.359) | 0.369 (±0.389) |
| Qwen-3-235B | 0.919 (±0.246) | 0.912 (±0.260) | 0.993 (±0.017) | 0.980 (±0.139) | 0.960 (±0.196) | 0.993 (±0.086) | 1.000 (±0.000) | 0.704 (±0.396) | 0.813 (±0.337) |
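
Each cell in the table above has the form `mean (±std)`. As a minimal sketch of how such cells could be parsed and ranked offline, the snippet below reads three rows copied from the table (first three score columns only) and sorts the models by their total mean. The helper names `parse_cell` and `parse_rows` are hypothetical and not part of the benchmark tooling.

```python
import re

# Three rows copied from the table above, truncated to the
# total, R-Syn-1 and R-Syn-Max columns for brevity.
ROWS = """\
| Qwen-3.5-397B | 0.977 (±0.133) | 0.941 (±0.215) | 0.994 (±0.015) |
| Claude Opus 4.6 | 0.896 (±0.270) | 0.934 (±0.234) | 0.996 (±0.013) |
| GPT5.2-chat | 0.955 (±0.178) | 0.962 (±0.168) | 0.995 (±0.014) |
"""

# Matches a cell such as "0.977 (±0.133)".
CELL = re.compile(r"^\s*([0-9.]+)\s*\(±\s*([0-9.]+)\)\s*$")


def parse_cell(cell):
    """Return (mean, std) for one 'mean (±std)' cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def parse_rows(text):
    """Map each model name to its list of (mean, std) pairs."""
    result = {}
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        model, cells = parts[0], parts[1:]
        result[model] = [parse_cell(c) for c in cells]
    return result


scores = parse_rows(ROWS)

# Rank models by the mean of the first score column (the total).
ranking = sorted(scores, key=lambda m: scores[m][0][0], reverse=True)
print(ranking)
```

The same approach extends to the full table by pasting in all rows; the regular expression assumes every score cell follows the `mean (±std)` pattern used here.
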
The following table breaks the results down by benchmark task, giving each model's average main score per task. The scoring metric for each task is given in parentheses in the column header.
| Model | RdfConnectionExplainStatic-jsonld (listTrimF1) | RdfConnectionExplainStatic-nt (listTrimF1) | RdfConnectionExplainStatic-turtle (listTrimF1) | RdfConnectionExplainStatic-xml (listTrimF1) | RdfFriendCount-jsonld-1 (f1) | RdfFriendCount-jsonld-2 (f1) | RdfFriendCount-nt-1 (f1) | RdfFriendCount-nt-2 (f1) | RdfFriendCount-turtle-1 (f1) | RdfFriendCount-turtle-2 (f1) | RdfFriendCount-xml-1 (f1) | RdfFriendCount-xml-2 (f1) | RdfSyntaxFixList-jsonld (max_combined) | RdfSyntaxFixList-nt (max_combined) | RdfSyntaxFixList-turtle (max_combined) | Sparql2AnswerListOrga-jsonld (combinedF1) | Sparql2AnswerListOrga-turtle (combinedF1) | SparqlSyntaxFixingListLcQuad (max_combined) | Text2AnswerListOrga-jsonld (combinedF1) | Text2AnswerListOrga-turtle (combinedF1) | Text2SparqlExecEvalListBeastiary-turtle-schema (max_combined) | Text2SparqlExecEvalListBeastiary-turtle-subgraph (max_combined) | Text2SparqlExecEvalListBeastiary-turtle-subschema (max_combined) | Text2SparqlExecEvalListCoypuMini (max_combined) | Text2SparqlExecEvalListOrgaNumerical (max_combined) | Text2SparqlExecEvalListOrganizational (max_combined) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Haiku | 1.000 | 0.903 | 0.889 | 0.995 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.979 | 0.997 | 0.975 | 1.000 | 1.000 | 1.000 | 0.938 | 0.850 | 0.373 | 0.947 | 0.307 | 0.600 | 1.000 | 1.000 |
| Claude 3.5 Sonnet | 1.000 | 0.972 | 1.000 | 1.000 | 1.000 | 1.000 | 0.100 | 0.300 | 0.950 | 1.000 | 0.550 | 1.000 | 0.989 | 0.997 | 0.984 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.411 | 0.947 | 0.427 | 0.520 | 0.960 | 1.000 |
| Claude Opus 4.6 | 0.958 | 1.000 | 1.000 | 1.000 | 0.095 | 0.095 | 0.219 | 0.204 | 1.000 | 1.000 | 0.149 | 0.103 | 0.989 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.787 | 0.947 | 0.667 | 0.600 | 0.960 | 0.840 |
| Claude Sonnet 4.6 | 0.456 | 0.548 | 0.456 | 0.893 | 0.092 | 0.092 | 0.052 | 0.048 | 0.091 | 0.091 | 0.092 | 0.092 | 0.989 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.709 | 0.947 | 0.747 | 0.600 | 0.840 | 0.800 |
| Deepseek-Chat-v3 | 0.989 | 0.856 | 0.896 | 0.980 | 0.293 | 0.396 | 0.250 | 0.309 | 0.544 | 0.394 | 0.514 | 0.534 | 0.989 | 0.993 | 0.991 | 0.993 | 0.922 | 1.000 | 0.905 | 0.713 | 0.456 | 0.947 | 0.408 | 0.520 | 0.680 | 0.980 |
| Deepseek-Coder-33B | 0.529 | 0.241 | 0.438 | 0.447 | 0.204 | 0.334 | 0.111 | 0.172 | 0.068 | 0.083 | 0.349 | 0.382 | 0.976 | 0.829 | 0.841 | 0.224 | 0.313 | 0.968 | 0.228 | 0.221 | 0.337 | 0.915 | 0.399 | 0.575 | 0.571 | 0.752 |
| Deepseek-R1 | 0.960 | 0.997 | 1.000 | 0.980 | 1.000 | 1.000 | 1.000 | 1.000 | 0.980 | 1.000 | 1.000 | 1.000 | 0.989 | 0.997 | 0.987 | 0.993 | 1.000 | 1.000 | 0.970 | 0.985 | 0.443 | 0.952 | 0.349 | 0.520 | 0.872 | 0.984 |
| GPT3.5 2024/01 | 0.901 | 0.801 | 0.848 | 0.813 | 0.200 | 0.550 | 0.000 | 0.000 | 0.000 | 0.000 | 0.550 | 0.900 | 0.989 | 0.998 | 0.997 | 0.500 | 0.696 | 1.000 | 0.506 | 0.850 | 0.387 | 0.947 | 0.261 | 0.520 | 0.600 | 0.760 |
| GPT4o 2024/11 | 0.850 | 0.598 | 0.867 | 0.845 | 1.000 | 1.000 | 0.000 | 0.100 | 1.000 | 1.000 | 0.000 | 1.000 | 0.989 | 0.997 | 0.972 | 0.950 | 0.881 | 1.000 | 0.950 | 0.950 | 0.307 | 0.947 | 0.307 | 0.520 | 1.000 | 1.000 |
| GPT4o-mini 2024/07 | 0.833 | 0.723 | 0.828 | 0.753 | 0.100 | 0.000 | 0.000 | 0.150 | 0.200 | 0.250 | 1.000 | 1.000 | 0.979 | 0.995 | 0.977 | 0.758 | 0.962 | 0.920 | 0.775 | 0.950 | 0.328 | 0.947 | 0.255 | 0.520 | 0.640 | 1.000 |
| GPT5.2-chat | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 0.997 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.587 | 0.947 | 0.627 | 0.560 | 1.000 | 1.000 |
| GPT5.4 2026/03 | 1.000 | 0.975 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.587 | 0.947 | 0.507 | 0.520 | 0.680 | 0.600 |
| GPTo1-mini 2024/09 | 0.983 | 0.975 | 1.000 | 0.983 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 0.997 | 0.990 | 0.975 | 1.000 | 1.000 | 1.000 | 0.975 | 0.387 | 0.947 | 0.347 | 0.520 | 0.680 | 0.920 |
| GPTo1-pre 2024/09 | 0.914 | 0.827 | 0.925 | 0.936 | 0.502 | 0.498 | 0.363 | 0.488 | 0.532 | 0.592 | 0.612 | 0.780 | 0.989 | 0.995 | 0.992 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.467 | 0.888 | 0.347 | 0.720 | 0.760 | 0.880 |
| Gemini 1.5 Flash | 1.000 | 0.976 | 1.000 | 1.000 | 0.000 | 1.000 | 0.950 | 0.850 | 1.000 | 1.000 | 0.050 | 0.950 | 0.973 | 1.000 | 0.977 | 1.000 | 1.000 | 0.820 | 1.000 | 1.000 | 0.547 | 0.960 | 0.234 | 0.520 | 0.920 | 1.000 |
| Gemini 1.5 Pro | 1.000 | 0.964 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.919 | 1.000 | 0.978 | 1.000 | 1.000 | 0.810 | 1.000 | 1.000 | 0.467 | 0.973 | 0.305 | 0.560 | 1.000 | 1.000 |
| Gemini 2.0 Flash Exp | 1.000 | 0.642 | 1.000 | 0.914 | 1.000 | 1.000 | 0.900 | 0.850 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 1.000 | 0.975 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.307 | 0.947 | 0.280 | 0.520 | 0.320 | 0.840 |
| Gemini 3 Flash Preview | 1.000 | 0.858 | 1.000 | 0.875 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 0.996 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.467 | 1.000 | 0.461 | 0.680 | 0.840 | 0.880 |
| Llama-3.0-70B | 0.990 | 0.770 | 0.997 | 0.970 | 0.500 | 0.980 | 0.020 | 0.000 | 0.000 | 0.000 | 0.060 | 0.660 | 0.987 | 1.000 | 0.936 | 1.000 | 1.000 | 0.980 | 1.000 | 1.000 | 0.352 | 0.947 | 0.233 | 0.537 | 0.568 | 0.872 |
| Llama-3.0-8B | 0.353 | 0.556 | 0.700 | 0.585 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.723 | 0.427 | 0.744 | 0.663 | 0.615 | 0.076 | 0.660 | 0.770 | 0.217 | 0.594 | 0.157 | 0.511 | 0.276 | 0.399 |
| Llama-3.1-70B | 0.990 | 0.834 | 0.993 | 0.988 | 0.060 | 0.060 | 0.860 | 0.780 | 0.020 | 0.000 | 1.000 | 0.980 | 0.979 | 1.000 | 0.938 | 0.902 | 1.000 | 1.000 | 1.000 | 0.960 | 0.417 | 0.947 | 0.261 | 0.516 | 0.808 | 0.744 |
| Llama-3.1-8B | 0.756 | 0.593 | 0.694 | 0.617 | 0.240 | 0.340 | 0.640 | 0.740 | 0.000 | 0.000 | 0.440 | 0.780 | 0.979 | 0.900 | 0.864 | 0.638 | 0.535 | 0.476 | 0.464 | 0.518 | 0.229 | 0.561 | 0.156 | 0.393 | 0.243 | 0.224 |
| Llama-3.2-1B | 0.382 | 0.142 | 0.371 | 0.442 | 0.040 | 0.040 | 0.060 | 0.000 | 0.054 | 0.059 | 0.060 | 0.020 | 0.644 | 0.410 | 0.180 | 0.016 | 0.021 | 0.032 | 0.054 | 0.051 | 0.004 | 0.039 | 0.025 | 0.016 | 0.040 | 0.012 |
| Llama-3.2-3B | 0.705 | 0.762 | 0.685 | 0.732 | 0.020 | 0.020 | 0.020 | 0.040 | 0.180 | 0.280 | 0.280 | 0.380 | 0.891 | 0.885 | 0.542 | 0.308 | 0.308 | 0.094 | 0.217 | 0.205 | 0.095 | 0.210 | 0.069 | 0.178 | 0.116 | 0.346 |
| Llama-3.3-70B | 1.000 | 0.951 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.987 | 0.995 | 0.953 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.451 | 0.947 | 0.392 | 0.536 | 0.520 | 0.680 |
| Llama-4-Maverick | 1.000 | 0.911 | 1.000 | 0.997 | 1.000 | 1.000 | 0.184 | 0.202 | 0.141 | 0.111 | 0.600 | 0.600 | 0.970 | 0.993 | 0.960 | 0.865 | 0.910 | 1.000 | 0.790 | 0.729 | 0.401 | 0.947 | 0.237 | 0.696 | 0.824 | 0.792 |
| OpenCoder-8B | 0.214 | 0.062 | 0.177 | 0.202 | 0.169 | 0.321 | 0.086 | 0.050 | 0.193 | 0.199 | 0.036 | 0.082 | 0.914 | 0.683 | 0.852 | 0.267 | 0.400 | 0.668 | 0.307 | 0.340 | 0.313 | 0.710 | 0.282 | 0.221 | 0.440 | 0.667 |
| Phi-3.0-medium-128k | 0.320 | 0.652 | 0.645 | 0.766 | 0.000 | 0.025 | 0.024 | 0.047 | 0.000 | 0.000 | 0.000 | 0.000 | 0.904 | 0.843 | 0.911 | 0.520 | 0.625 | 0.556 | 0.472 | 0.581 | 0.279 | 0.710 | 0.204 | 0.082 | 0.167 | 0.579 |
| Phi-3.0-mini-128k | 0.635 | 0.416 | 0.709 | 0.665 | 0.040 | 0.140 | 0.000 | 0.000 | 0.020 | 0.000 | 0.000 | 0.000 | 0.899 | 0.503 | 0.579 | 0.505 | 0.428 | 0.367 | 0.642 | 0.557 | 0.205 | 0.365 | 0.220 | 0.037 | 0.188 | 0.392 |
| Phi-3.0-small-128k | 0.837 | 0.618 | 0.645 | 0.731 | 0.000 | 0.000 | 0.000 | 0.000 | 0.008 | 0.000 | 0.000 | 0.000 | 0.353 | 0.405 | 0.537 | 0.487 | 0.593 | 0.136 | 0.626 | 0.551 | 0.276 | 0.530 | 0.290 | 0.227 | 0.152 | 0.291 |
| Phi-3.5-MoE | 0.580 | 0.529 | 0.661 | 0.835 | 0.903 | 0.820 | 0.466 | 0.312 | 0.050 | 0.015 | 0.020 | 0.000 | 0.987 | 0.622 | 0.915 | 0.685 | 0.688 | 0.887 | 0.752 | 0.702 | 0.361 | 0.867 | 0.313 | 0.524 | 0.348 | 0.855 |
| Phi-3.5-mini | 0.027 | 0.301 | 0.525 | 0.768 | 0.120 | 0.000 | 0.020 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.935 | 0.392 | 0.588 | 0.456 | 0.450 | 0.576 | 0.405 | 0.435 | 0.202 | 0.609 | 0.277 | 0.227 | 0.244 | 0.320 |
| Qwen-2.0-0.5B | 0.148 | 0.040 | 0.079 | 0.119 | 0.117 | 0.146 | 0.040 | 0.000 | 0.102 | 0.055 | 0.060 | 0.160 | 0.035 | 0.161 | 0.031 | 0.050 | 0.040 | 0.000 | 0.041 | 0.054 | 0.016 | 0.037 | 0.020 | 0.000 | 0.004 | 0.000 |
| Qwen-2.0-1.5B | 0.134 | 0.248 | 0.390 | 0.440 | 0.453 | 0.393 | 0.000 | 0.000 | 0.100 | 0.060 | 0.000 | 0.020 | 0.301 | 0.039 | 0.094 | 0.152 | 0.154 | 0.256 | 0.192 | 0.159 | 0.067 | 0.199 | 0.121 | 0.067 | 0.101 | 0.092 |
| Qwen-2.0-57B-A14B | 0.710 | 0.017 | 0.727 | 0.769 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.987 | 0.919 | 0.889 | 0.395 | 0.630 | 0.800 | 0.830 | 1.000 | 0.328 | 0.698 | 0.192 | 0.528 | 0.360 | 0.808 |
| Qwen-2.0-72B | 0.798 | 0.862 | 0.745 | 0.987 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.340 | 0.980 | 0.974 | 0.990 | 0.950 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.391 | 0.866 | 0.428 | 0.493 | 0.395 | 1.000 |
| Qwen-2.0-7B | 0.631 | 0.512 | 0.564 | 0.591 | 0.000 | 0.020 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.749 | 0.737 | 0.730 | 0.485 | 0.573 | 0.928 | 0.790 | 0.917 | 0.324 | 0.550 | 0.309 | 0.320 | 0.175 | 0.432 |
| Qwen-2.5-0.5B | 0.165 | 0.109 | 0.243 | 0.307 | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.100 | 0.420 | 0.097 | 0.060 | 0.146 | 0.047 | 0.071 | 0.088 | 0.088 | 0.053 | 0.012 | 0.053 | 0.079 | 0.037 | 0.072 | 0.096 |
| Qwen-2.5-1.5B | 0.592 | 0.646 | 0.601 | 0.565 | 0.020 | 0.020 | 0.100 | 0.120 | 0.000 | 0.000 | 0.540 | 0.620 | 0.784 | 0.467 | 0.502 | 0.153 | 0.127 | 0.424 | 0.140 | 0.153 | 0.217 | 0.272 | 0.214 | 0.211 | 0.185 | 0.311 |
| Qwen-2.5-14B | 0.887 | 0.662 | 1.000 | 0.763 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.962 | 0.824 | 0.980 | 1.000 | 0.933 | 0.820 | 0.960 | 1.000 | 0.333 | 0.866 | 0.315 | 0.520 | 0.584 | 0.712 |
| Qwen-2.5-32B | 1.000 | 0.326 | 1.000 | 0.982 | 0.000 | 0.000 | 1.000 | 1.000 | 0.340 | 0.380 | 0.000 | 0.000 | 0.987 | 0.993 | 0.966 | 1.000 | 0.800 | 1.000 | 1.000 | 0.960 | 0.307 | 0.947 | 0.307 | 0.520 | 0.568 | 0.568 |
| Qwen-2.5-3B | 0.904 | 0.892 | 0.889 | 0.735 | 0.000 | 0.000 | 0.160 | 0.160 | 0.000 | 0.000 | 0.040 | 0.820 | 0.986 | 0.865 | 0.721 | 0.433 | 0.453 | 0.691 | 0.677 | 0.787 | 0.327 | 0.649 | 0.241 | 0.494 | 0.224 | 0.548 |
| Qwen-2.5-72B | 1.000 | 0.797 | 1.000 | 0.966 | 0.120 | 0.260 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.987 | 0.998 | 0.976 | 1.000 | 1.000 | 1.000 | 0.800 | 1.000 | 0.333 | 0.947 | 0.333 | 0.520 | 0.840 | 0.936 |
| Qwen-2.5-7B | 0.887 | 0.719 | 0.769 | 0.891 | 0.020 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.962 | 1.000 | 0.957 | 0.360 | 0.586 | 0.952 | 0.520 | 0.800 | 0.288 | 0.915 | 0.353 | 0.520 | 0.376 | 0.600 |
| Qwen-2.5-Coder-32B | 1.000 | 0.580 | 1.000 | 0.998 | 0.020 | 0.020 | 0.100 | 0.120 | 0.420 | 0.520 | 0.000 | 0.000 | 0.987 | 0.996 | 0.990 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.333 | 0.934 | 0.401 | 0.531 | 0.872 | 0.984 |
| Qwen-3-235B | 0.979 | 0.951 | 1.000 | 0.930 | 1.000 | 1.000 | 0.980 | 1.000 | 0.980 | 0.980 | 1.000 | 1.000 | 0.989 | 0.997 | 0.994 | 1.000 | 1.000 | 1.000 | 0.983 | 1.000 | 0.383 | 0.968 | 0.404 | 0.536 | 0.836 | 0.912 |
| Qwen-3.5-397B | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.989 | 0.997 | 0.995 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.613 | 1.000 | 0.440 | 0.840 | 0.920 | 1.000 |
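
Several models score very differently on the same task depending on the RDF serialization. The sketch below illustrates one way to quantify this for the `RdfFriendCount` columns, using values copied from three rows of the table above. The "spread" (highest minus lowest per-format mean) is an illustrative statistic chosen for this example, not a metric defined by the benchmark, and the function name `format_spread` is hypothetical.

```python
from statistics import mean

# RdfFriendCount f1 scores copied from the table above,
# stored per format as (score of the "-1" column, score of the "-2" column).
FRIEND_COUNT = {
    "Claude 3.5 Haiku": {
        "jsonld": (0.000, 0.000), "nt": (1.000, 1.000),
        "turtle": (1.000, 1.000), "xml": (1.000, 1.000),
    },
    "GPT4o 2024/11": {
        "jsonld": (1.000, 1.000), "nt": (0.000, 0.100),
        "turtle": (1.000, 1.000), "xml": (0.000, 1.000),
    },
    "Qwen-3.5-397B": {
        "jsonld": (1.000, 1.000), "nt": (1.000, 1.000),
        "turtle": (1.000, 1.000), "xml": (1.000, 1.000),
    },
}


def format_spread(per_format):
    """Return (per-format means, max mean minus min mean)."""
    means = {fmt: mean(pair) for fmt, pair in per_format.items()}
    return means, max(means.values()) - min(means.values())


spreads = {}
for model, per_format in FRIEND_COUNT.items():
    means, spread = format_spread(per_format)
    spreads[model] = spread
    print(f"{model}: means={means} spread={spread:.3f}")
```

A spread near 0 means the model handles all four serializations about equally well on this task; a spread near 1 means it succeeds on some formats and fails almost completely on others.
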
The plots below show the capability compass for each model. The dimensions are explained briefly at the top of the page.

All data is available as CSV and HTML files: