LLM Benchmark Results

This page shows the results of LLM-KG-Bench runs on various models and RDF-related benchmark tasks. Below is an overview based on the combined scores, the main task scores, and capability compass plots for each model. Links to download the underlying data as CSV and HTML files are at the bottom of the page.

Combined Scores Overview

Combined scores per model, given as mean (± standard deviation), across several capabilities.

Model total R-Syn-1 R-Syn-Max R-Sem S-Syn-1 S-Syn-Max S-Sem-R S-Sem-W-1 S-Sem-W-max
Qwen-3.5-397B 0.977 (±0.133) 0.941 (±0.215) 0.994 (±0.015) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.940 (±0.211) 0.940 (±0.211)
Claude Opus 4.6 0.896 (±0.270) 0.934 (±0.234) 0.996 (±0.013) 0.657 (±0.415) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.747 (±0.362) 0.837 (±0.311)
Claude Sonnet 4.6 0.844 (±0.321) 0.934 (±0.234) 0.996 (±0.013) 0.282 (±0.300) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.747 (±0.362) 0.797 (±0.338)
Gemini 3 Flash Preview 0.942 (±0.200) 0.961 (±0.168) 0.995 (±0.014) 0.973 (±0.061) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.760 (±0.367) 0.850 (±0.312)
GPT5.2-chat 0.955 (±0.178) 0.962 (±0.168) 0.995 (±0.014) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.807 (±0.332) 0.877 (±0.276)
GPT5.4 2026/03 0.908 (±0.253) 0.934 (±0.234) 0.996 (±0.013) 0.998 (±0.020) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.647 (±0.388) 0.687 (±0.381)
Claude 3.5 Haiku 0.925 (±0.232) 0.937 (±0.203) 0.984 (±0.029) 0.779 (±0.395) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.817 (±0.325) 0.887 (±0.266)
Claude 3.5 Sonnet 0.933 (±0.222) 0.950 (±0.175) 0.990 (±0.022) 0.832 (±0.370) 0.980 (±0.125) 1.000 (±0.000) 1.000 (±0.000) 0.857 (±0.295) 0.857 (±0.295)
Deepseek-Coder-33B 0.694 (±0.396) 0.773 (±0.366) 0.882 (±0.269) 0.263 (±0.297) 0.943 (±0.221) 0.984 (±0.112) 0.313 (±0.309) 0.689 (±0.379) 0.703 (±0.374)
Deepseek-R1 0.931 (±0.226) 0.955 (±0.174) 0.991 (±0.020) 0.992 (±0.089) 0.935 (±0.247) 1.000 (±0.000) 1.000 (±0.000) 0.746 (±0.382) 0.832 (±0.316)
Deepseek-Chat-v3 0.848 (±0.326) 0.843 (±0.347) 0.991 (±0.020) 0.591 (±0.466) 0.957 (±0.202) 0.997 (±0.050) 0.923 (±0.214) 0.702 (±0.384) 0.782 (±0.348)
Gemini 1.5 Flash 0.907 (±0.263) 0.920 (±0.242) 0.983 (±0.028) 0.878 (±0.325) 0.865 (±0.324) 0.910 (±0.272) 1.000 (±0.000) 0.850 (±0.304) 0.850 (±0.304)
Gemini 1.5 Pro 0.896 (±0.282) 0.887 (±0.291) 0.966 (±0.127) 0.796 (±0.399) 0.845 (±0.339) 0.905 (±0.286) 1.000 (±0.000) 0.883 (±0.276) 0.883 (±0.276)
Gemini 2.0 Flash Exp 0.895 (±0.260) 0.986 (±0.025) 0.988 (±0.024) 0.931 (±0.197) 0.994 (±0.079) 1.000 (±0.000) 1.000 (±0.000) 0.604 (±0.394) 0.657 (±0.387)
Llama-3.1-70B 0.860 (±0.311) 0.908 (±0.234) 0.973 (±0.034) 0.559 (±0.484) 0.997 (±0.050) 0.997 (±0.050) 1.000 (±0.000) 0.694 (±0.381) 0.754 (±0.361)
Llama-3.1-8B 0.530 (±0.452) 0.779 (±0.375) 0.915 (±0.228) 0.462 (±0.421) 0.401 (±0.475) 0.521 (±0.477) 0.535 (±0.377) 0.273 (±0.401) 0.355 (±0.425)
Llama-3.2-1B 0.123 (±0.276) 0.250 (±0.366) 0.411 (±0.409) 0.159 (±0.254) 0.026 (±0.143) 0.079 (±0.260) 0.021 (±0.070) 0.010 (±0.050) 0.027 (±0.073)
Llama-3.2-3B 0.335 (±0.416) 0.402 (±0.452) 0.773 (±0.332) 0.344 (±0.397) 0.196 (±0.374) 0.322 (±0.444) 0.308 (±0.373) 0.120 (±0.256) 0.212 (±0.308)
Llama-3.3-70B 0.853 (±0.318) 0.975 (±0.032) 0.978 (±0.029) 0.595 (±0.487) 0.985 (±0.122) 1.000 (±0.000) 1.000 (±0.000) 0.617 (±0.398) 0.671 (±0.385)
Llama-3.0-70B 0.847 (±0.324) 0.961 (±0.114) 0.974 (±0.033) 0.523 (±0.480) 0.955 (±0.208) 0.990 (±0.099) 1.000 (±0.000) 0.645 (±0.405) 0.731 (±0.367)
Llama-3.0-8B 0.434 (±0.435) 0.586 (±0.426) 0.632 (±0.416) 0.219 (±0.290) 0.271 (±0.445) 0.425 (±0.488) 0.615 (±0.337) 0.281 (±0.397) 0.445 (±0.417)
Llama-4-Maverick 0.859 (±0.305) 0.870 (±0.241) 0.974 (±0.033) 0.655 (±0.465) 0.960 (±0.196) 1.000 (±0.000) 0.910 (±0.244) 0.687 (±0.381) 0.815 (±0.327)
GPT3.5 2024/01 0.800 (±0.356) 0.975 (±0.126) 0.995 (±0.014) 0.411 (±0.442) 0.944 (±0.230) 1.000 (±0.000) 0.696 (±0.374) 0.674 (±0.387) 0.707 (±0.376)
GPT4o 2024/11 0.902 (±0.244) 0.937 (±0.212) 0.986 (±0.024) 0.726 (±0.377) 1.000 (±0.000) 1.000 (±0.000) 0.881 (±0.183) 0.817 (±0.325) 0.867 (±0.286)
GPT4o-mini 2024/07 0.827 (±0.333) 0.919 (±0.232) 0.983 (±0.030) 0.384 (±0.415) 0.921 (±0.246) 0.960 (±0.174) 0.962 (±0.089) 0.709 (±0.385) 0.777 (±0.349)
GPTo1-mini 2024/09 0.911 (±0.251) 0.835 (±0.351) 0.992 (±0.018) 0.994 (±0.031) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.697 (±0.379) 0.767 (±0.354)
GPTo1-pre 2024/09 0.889 (±0.268) 0.911 (±0.256) 0.992 (±0.020) 0.658 (±0.373) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.742 (±0.365) 0.812 (±0.329)
OpenCoder-8B 0.557 (±0.454) 0.746 (±0.405) 0.817 (±0.354) 0.167 (±0.285) 0.622 (±0.482) 0.737 (±0.437) 0.400 (±0.422) 0.459 (±0.422) 0.509 (±0.417)
Phi-3.5-mini 0.481 (±0.432) 0.608 (±0.412) 0.639 (±0.390) 0.176 (±0.297) 0.637 (±0.466) 0.683 (±0.450) 0.450 (±0.381) 0.309 (±0.367) 0.350 (±0.371)
Phi-3.5-MoE 0.738 (±0.359) 0.831 (±0.296) 0.841 (±0.287) 0.517 (±0.421) 0.808 (±0.391) 0.932 (±0.238) 0.688 (±0.185) 0.637 (±0.394) 0.648 (±0.389)
Phi-3.0-medium-128k 0.561 (±0.439) 0.838 (±0.318) 0.886 (±0.257) 0.248 (±0.364) 0.547 (±0.475) 0.603 (±0.466) 0.625 (±0.316) 0.360 (±0.412) 0.385 (±0.417)
Phi-3.0-mini-128k 0.431 (±0.415) 0.582 (±0.424) 0.660 (±0.388) 0.263 (±0.333) 0.486 (±0.479) 0.549 (±0.480) 0.428 (±0.336) 0.231 (±0.291) 0.245 (±0.292)
Phi-3.0-small-128k 0.374 (±0.413) 0.346 (±0.394) 0.432 (±0.385) 0.284 (±0.364) 0.366 (±0.427) 0.394 (±0.429) 0.593 (±0.487) 0.278 (±0.352) 0.300 (±0.356)
Qwen-2.0-0.5B 0.038 (±0.138) 0.068 (±0.159) 0.076 (±0.171) 0.085 (±0.205) 0.005 (±0.071) 0.012 (±0.111) 0.040 (±0.136) 0.006 (±0.072) 0.010 (±0.080)
Qwen-2.0-1.5B 0.189 (±0.332) 0.126 (±0.294) 0.145 (±0.314) 0.222 (±0.349) 0.293 (±0.448) 0.351 (±0.465) 0.154 (±0.196) 0.105 (±0.177) 0.115 (±0.182)
Qwen-2.5-0.5B 0.097 (±0.244) 0.053 (±0.153) 0.101 (±0.235) 0.083 (±0.199) 0.157 (±0.360) 0.185 (±0.384) 0.071 (±0.179) 0.061 (±0.128) 0.064 (±0.129)
Qwen-2.5-14B 0.763 (±0.390) 0.781 (±0.393) 0.922 (±0.245) 0.331 (±0.432) 0.897 (±0.303) 0.910 (±0.286) 0.933 (±0.240) 0.658 (±0.378) 0.671 (±0.374)
Qwen-2.5-1.5B 0.362 (±0.428) 0.470 (±0.459) 0.584 (±0.452) 0.266 (±0.339) 0.494 (±0.485) 0.527 (±0.482) 0.127 (±0.268) 0.186 (±0.272) 0.244 (±0.326)
Qwen-2.5-32B 0.826 (±0.341) 0.979 (±0.030) 0.982 (±0.028) 0.603 (±0.471) 0.992 (±0.080) 1.000 (±0.000) 0.800 (±0.400) 0.603 (±0.391) 0.651 (±0.388)
Qwen-2.5-3B 0.603 (±0.441) 0.718 (±0.410) 0.857 (±0.292) 0.374 (±0.434) 0.733 (±0.431) 0.803 (±0.384) 0.453 (±0.451) 0.407 (±0.395) 0.479 (±0.394)
Qwen-2.5-72B 0.877 (±0.300) 0.871 (±0.317) 0.987 (±0.025) 0.614 (±0.471) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.731 (±0.369) 0.811 (±0.329)
Qwen-2.0-57B-A14B 0.673 (±0.424) 0.741 (±0.399) 0.932 (±0.173) 0.222 (±0.370) 0.860 (±0.345) 0.895 (±0.307) 0.630 (±0.438) 0.510 (±0.399) 0.599 (±0.393)
Qwen-2.5-7B 0.739 (±0.394) 0.966 (±0.139) 0.973 (±0.119) 0.329 (±0.411) 0.917 (±0.258) 0.976 (±0.136) 0.586 (±0.459) 0.565 (±0.397) 0.603 (±0.391)
Qwen-2.5-Coder-32B 0.881 (±0.297) 0.937 (±0.219) 0.991 (±0.017) 0.478 (±0.476) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.814 (±0.325) 0.830 (±0.314)
Qwen-2.0-72B 0.818 (±0.338) 0.964 (±0.040) 0.971 (±0.038) 0.339 (±0.426) 0.950 (±0.199) 1.000 (±0.000) 1.000 (±0.000) 0.630 (±0.380) 0.688 (±0.365)
Qwen-2.0-7B 0.551 (±0.452) 0.566 (±0.452) 0.739 (±0.405) 0.232 (±0.305) 0.799 (±0.397) 0.836 (±0.365) 0.573 (±0.476) 0.298 (±0.359) 0.369 (±0.389)
Qwen-3-235B 0.919 (±0.246) 0.912 (±0.260) 0.993 (±0.017) 0.980 (±0.139) 0.960 (±0.196) 0.993 (±0.086) 1.000 (±0.000) 0.704 (±0.396) 0.813 (±0.337)
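
Because model names may contain spaces and digits (for example "GPT5.4 2026/03"), splitting the rows of the table above on whitespace is not enough. The following is a minimal sketch, not part of LLM-KG-Bench, that assumes each cell has the exact form "mean (±std)" and that the rows have been copied as plain text; the names `COLUMNS`, `CELL`, and `parse_combined_row` are hypothetical helpers introduced for illustration.

```python
import re

# Column names copied from the header row of the combined scores table.
COLUMNS = ["total", "R-Syn-1", "R-Syn-Max", "R-Sem", "S-Syn-1",
           "S-Syn-Max", "S-Sem-R", "S-Sem-W-1", "S-Sem-W-max"]

# One "mean (±std)" cell, e.g. "0.977 (±0.133)" (assumed cell format).
CELL = re.compile(r"(\d+\.\d+) \(±(\d+\.\d+)\)")


def parse_combined_row(line):
    """Split one row into (model name, {column: (mean, std)})."""
    cells = list(CELL.finditer(line))
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, found {len(cells)}")
    # Everything before the first score cell is the model name.
    model = line[:cells[0].start()].strip()
    scores = {col: (float(m.group(1)), float(m.group(2)))
              for col, m in zip(COLUMNS, cells)}
    return model, scores


# Three rows copied verbatim from the table above.
rows = [
    "Claude Opus 4.6 0.896 (±0.270) 0.934 (±0.234) 0.996 (±0.013) 0.657 (±0.415) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.747 (±0.362) 0.837 (±0.311)",
    "GPT5.4 2026/03 0.908 (±0.253) 0.934 (±0.234) 0.996 (±0.013) 0.998 (±0.020) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.647 (±0.388) 0.687 (±0.381)",
    "Qwen-3.5-397B 0.977 (±0.133) 0.941 (±0.215) 0.994 (±0.015) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 1.000 (±0.000) 0.940 (±0.211) 0.940 (±0.211)",
]

parsed = [parse_combined_row(row) for row in rows]

# Rank models by the mean of the "total" column, best first.
ranking = sorted(parsed, key=lambda item: item[1]["total"][0], reverse=True)
for model, scores in ranking:
    mean, std = scores["total"]
    print(f"{model}: {mean:.3f} (±{std:.3f})")
```

Sorting on a different column only requires changing the key, for example `item[1]["S-Sem-W-1"][0]`.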

Main Scores for all Benchmark Tasks

This table breaks the results down by task: each cell is a model's average main score on one benchmark task, with the scoring metric given in parentheses in the column header.

Model RdfConnectionExplainStatic-jsonld (listTrimF1) RdfConnectionExplainStatic-nt (listTrimF1) RdfConnectionExplainStatic-turtle (listTrimF1) RdfConnectionExplainStatic-xml (listTrimF1) RdfFriendCount-jsonld-1 (f1) RdfFriendCount-jsonld-2 (f1) RdfFriendCount-nt-1 (f1) RdfFriendCount-nt-2 (f1) RdfFriendCount-turtle-1 (f1) RdfFriendCount-turtle-2 (f1) RdfFriendCount-xml-1 (f1) RdfFriendCount-xml-2 (f1) RdfSyntaxFixList-jsonld (max_combined) RdfSyntaxFixList-nt (max_combined) RdfSyntaxFixList-turtle (max_combined) Sparql2AnswerListOrga-jsonld (combinedF1) Sparql2AnswerListOrga-turtle (combinedF1) SparqlSyntaxFixingListLcQuad (max_combined) Text2AnswerListOrga-jsonld (combinedF1) Text2AnswerListOrga-turtle (combinedF1) Text2SparqlExecEvalListBeastiary-turtle-schema (max_combined) Text2SparqlExecEvalListBeastiary-turtle-subgraph (max_combined) Text2SparqlExecEvalListBeastiary-turtle-subschema (max_combined) Text2SparqlExecEvalListCoypuMini (max_combined) Text2SparqlExecEvalListOrgaNumerical (max_combined) Text2SparqlExecEvalListOrganizational (max_combined)
Claude 3.5 Haiku 1.000 0.903 0.889 0.995 0.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.979 0.997 0.975 1.000 1.000 1.000 0.938 0.850 0.373 0.947 0.307 0.600 1.000 1.000
Claude 3.5 Sonnet 1.000 0.972 1.000 1.000 1.000 1.000 0.100 0.300 0.950 1.000 0.550 1.000 0.989 0.997 0.984 1.000 1.000 1.000 1.000 1.000 0.411 0.947 0.427 0.520 0.960 1.000
Claude Opus 4.6 0.958 1.000 1.000 1.000 0.095 0.095 0.219 0.204 1.000 1.000 0.149 0.103 0.989 0.999 1.000 1.000 1.000 1.000 1.000 1.000 0.787 0.947 0.667 0.600 0.960 0.840
Claude Sonnet 4.6 0.456 0.548 0.456 0.893 0.092 0.092 0.052 0.048 0.091 0.091 0.092 0.092 0.989 0.997 1.000 1.000 1.000 1.000 1.000 1.000 0.709 0.947 0.747 0.600 0.840 0.800
Deepseek-Chat-v3 0.989 0.856 0.896 0.980 0.293 0.396 0.250 0.309 0.544 0.394 0.514 0.534 0.989 0.993 0.991 0.993 0.922 1.000 0.905 0.713 0.456 0.947 0.408 0.520 0.680 0.980
Deepseek-Coder-33B 0.529 0.241 0.438 0.447 0.204 0.334 0.111 0.172 0.068 0.083 0.349 0.382 0.976 0.829 0.841 0.224 0.313 0.968 0.228 0.221 0.337 0.915 0.399 0.575 0.571 0.752
Deepseek-R1 0.960 0.997 1.000 0.980 1.000 1.000 1.000 1.000 0.980 1.000 1.000 1.000 0.989 0.997 0.987 0.993 1.000 1.000 0.970 0.985 0.443 0.952 0.349 0.520 0.872 0.984
GPT3.5 2024/01 0.901 0.801 0.848 0.813 0.200 0.550 0.000 0.000 0.000 0.000 0.550 0.900 0.989 0.998 0.997 0.500 0.696 1.000 0.506 0.850 0.387 0.947 0.261 0.520 0.600 0.760
GPT4o 2024/11 0.850 0.598 0.867 0.845 1.000 1.000 0.000 0.100 1.000 1.000 0.000 1.000 0.989 0.997 0.972 0.950 0.881 1.000 0.950 0.950 0.307 0.947 0.307 0.520 1.000 1.000
GPT4o-mini 2024/07 0.833 0.723 0.828 0.753 0.100 0.000 0.000 0.150 0.200 0.250 1.000 1.000 0.979 0.995 0.977 0.758 0.962 0.920 0.775 0.950 0.328 0.947 0.255 0.520 0.640 1.000
GPT5.2-chat 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.989 0.997 0.999 1.000 1.000 1.000 1.000 1.000 0.587 0.947 0.627 0.560 1.000 1.000
GPT5.4 2026/03 1.000 0.975 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.989 0.997 1.000 1.000 1.000 1.000 1.000 1.000 0.587 0.947 0.507 0.520 0.680 0.600
GPTo1-mini 2024/09 0.983 0.975 1.000 0.983 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.989 0.997 0.990 0.975 1.000 1.000 1.000 0.975 0.387 0.947 0.347 0.520 0.680 0.920
GPTo1-pre 2024/09 0.914 0.827 0.925 0.936 0.502 0.498 0.363 0.488 0.532 0.592 0.612 0.780 0.989 0.995 0.992 1.000 1.000 1.000 1.000 1.000 0.467 0.888 0.347 0.720 0.760 0.880
Gemini 1.5 Flash 1.000 0.976 1.000 1.000 0.000 1.000 0.950 0.850 1.000 1.000 0.050 0.950 0.973 1.000 0.977 1.000 1.000 0.820 1.000 1.000 0.547 0.960 0.234 0.520 0.920 1.000
Gemini 1.5 Pro 1.000 0.964 1.000 1.000 1.000 1.000 0.000 0.000 1.000 1.000 1.000 1.000 0.919 1.000 0.978 1.000 1.000 0.810 1.000 1.000 0.467 0.973 0.305 0.560 1.000 1.000
Gemini 2.0 Flash Exp 1.000 0.642 1.000 0.914 1.000 1.000 0.900 0.850 1.000 1.000 1.000 1.000 0.989 1.000 0.975 1.000 1.000 1.000 1.000 1.000 0.307 0.947 0.280 0.520 0.320 0.840
Gemini 3 Flash Preview 1.000 0.858 1.000 0.875 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.989 0.996 0.998 1.000 1.000 1.000 1.000 1.000 0.467 1.000 0.461 0.680 0.840 0.880
Llama-3.0-70B 0.990 0.770 0.997 0.970 0.500 0.980 0.020 0.000 0.000 0.000 0.060 0.660 0.987 1.000 0.936 1.000 1.000 0.980 1.000 1.000 0.352 0.947 0.233 0.537 0.568 0.872
Llama-3.0-8B 0.353 0.556 0.700 0.585 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.723 0.427 0.744 0.663 0.615 0.076 0.660 0.770 0.217 0.594 0.157 0.511 0.276 0.399
Llama-3.1-70B 0.990 0.834 0.993 0.988 0.060 0.060 0.860 0.780 0.020 0.000 1.000 0.980 0.979 1.000 0.938 0.902 1.000 1.000 1.000 0.960 0.417 0.947 0.261 0.516 0.808 0.744
Llama-3.1-8B 0.756 0.593 0.694 0.617 0.240 0.340 0.640 0.740 0.000 0.000 0.440 0.780 0.979 0.900 0.864 0.638 0.535 0.476 0.464 0.518 0.229 0.561 0.156 0.393 0.243 0.224
Llama-3.2-1B 0.382 0.142 0.371 0.442 0.040 0.040 0.060 0.000 0.054 0.059 0.060 0.020 0.644 0.410 0.180 0.016 0.021 0.032 0.054 0.051 0.004 0.039 0.025 0.016 0.040 0.012
Llama-3.2-3B 0.705 0.762 0.685 0.732 0.020 0.020 0.020 0.040 0.180 0.280 0.280 0.380 0.891 0.885 0.542 0.308 0.308 0.094 0.217 0.205 0.095 0.210 0.069 0.178 0.116 0.346
Llama-3.3-70B 1.000 0.951 1.000 1.000 1.000 1.000 0.000 0.000 0.000 0.000 1.000 1.000 0.987 0.995 0.953 1.000 1.000 1.000 1.000 1.000 0.451 0.947 0.392 0.536 0.520 0.680
Llama-4-Maverick 1.000 0.911 1.000 0.997 1.000 1.000 0.184 0.202 0.141 0.111 0.600 0.600 0.970 0.993 0.960 0.865 0.910 1.000 0.790 0.729 0.401 0.947 0.237 0.696 0.824 0.792
OpenCoder-8B 0.214 0.062 0.177 0.202 0.169 0.321 0.086 0.050 0.193 0.199 0.036 0.082 0.914 0.683 0.852 0.267 0.400 0.668 0.307 0.340 0.313 0.710 0.282 0.221 0.440 0.667
Phi-3.0-medium-128k 0.320 0.652 0.645 0.766 0.000 0.025 0.024 0.047 0.000 0.000 0.000 0.000 0.904 0.843 0.911 0.520 0.625 0.556 0.472 0.581 0.279 0.710 0.204 0.082 0.167 0.579
Phi-3.0-mini-128k 0.635 0.416 0.709 0.665 0.040 0.140 0.000 0.000 0.020 0.000 0.000 0.000 0.899 0.503 0.579 0.505 0.428 0.367 0.642 0.557 0.205 0.365 0.220 0.037 0.188 0.392
Phi-3.0-small-128k 0.837 0.618 0.645 0.731 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.000 0.353 0.405 0.537 0.487 0.593 0.136 0.626 0.551 0.276 0.530 0.290 0.227 0.152 0.291
Phi-3.5-MoE 0.580 0.529 0.661 0.835 0.903 0.820 0.466 0.312 0.050 0.015 0.020 0.000 0.987 0.622 0.915 0.685 0.688 0.887 0.752 0.702 0.361 0.867 0.313 0.524 0.348 0.855
Phi-3.5-mini 0.027 0.301 0.525 0.768 0.120 0.000 0.020 0.000 0.000 0.000 1.000 1.000 0.935 0.392 0.588 0.456 0.450 0.576 0.405 0.435 0.202 0.609 0.277 0.227 0.244 0.320
Qwen-2.0-0.5B 0.148 0.040 0.079 0.119 0.117 0.146 0.040 0.000 0.102 0.055 0.060 0.160 0.035 0.161 0.031 0.050 0.040 0.000 0.041 0.054 0.016 0.037 0.020 0.000 0.004 0.000
Qwen-2.0-1.5B 0.134 0.248 0.390 0.440 0.453 0.393 0.000 0.000 0.100 0.060 0.000 0.020 0.301 0.039 0.094 0.152 0.154 0.256 0.192 0.159 0.067 0.199 0.121 0.067 0.101 0.092
Qwen-2.0-57B-A14B 0.710 0.017 0.727 0.769 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.987 0.919 0.889 0.395 0.630 0.800 0.830 1.000 0.328 0.698 0.192 0.528 0.360 0.808
Qwen-2.0-72B 0.798 0.862 0.745 0.987 0.000 0.000 0.000 0.000 0.000 0.000 0.340 0.980 0.974 0.990 0.950 1.000 1.000 1.000 1.000 1.000 0.391 0.866 0.428 0.493 0.395 1.000
Qwen-2.0-7B 0.631 0.512 0.564 0.591 0.000 0.020 0.000 0.000 0.000 0.000 0.000 0.000 0.749 0.737 0.730 0.485 0.573 0.928 0.790 0.917 0.324 0.550 0.309 0.320 0.175 0.432
Qwen-2.5-0.5B 0.165 0.109 0.243 0.307 0.004 0.000 0.000 0.000 0.000 0.000 0.100 0.420 0.097 0.060 0.146 0.047 0.071 0.088 0.088 0.053 0.012 0.053 0.079 0.037 0.072 0.096
Qwen-2.5-1.5B 0.592 0.646 0.601 0.565 0.020 0.020 0.100 0.120 0.000 0.000 0.540 0.620 0.784 0.467 0.502 0.153 0.127 0.424 0.140 0.153 0.217 0.272 0.214 0.211 0.185 0.311
Qwen-2.5-14B 0.887 0.662 1.000 0.763 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.962 0.824 0.980 1.000 0.933 0.820 0.960 1.000 0.333 0.866 0.315 0.520 0.584 0.712
Qwen-2.5-32B 1.000 0.326 1.000 0.982 0.000 0.000 1.000 1.000 0.340 0.380 0.000 0.000 0.987 0.993 0.966 1.000 0.800 1.000 1.000 0.960 0.307 0.947 0.307 0.520 0.568 0.568
Qwen-2.5-3B 0.904 0.892 0.889 0.735 0.000 0.000 0.160 0.160 0.000 0.000 0.040 0.820 0.986 0.865 0.721 0.433 0.453 0.691 0.677 0.787 0.327 0.649 0.241 0.494 0.224 0.548
Qwen-2.5-72B 1.000 0.797 1.000 0.966 0.120 0.260 1.000 1.000 0.000 0.000 1.000 1.000 0.987 0.998 0.976 1.000 1.000 1.000 0.800 1.000 0.333 0.947 0.333 0.520 0.840 0.936
Qwen-2.5-7B 0.887 0.719 0.769 0.891 0.020 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.962 1.000 0.957 0.360 0.586 0.952 0.520 0.800 0.288 0.915 0.353 0.520 0.376 0.600
Qwen-2.5-Coder-32B 1.000 0.580 1.000 0.998 0.020 0.020 0.100 0.120 0.420 0.520 0.000 0.000 0.987 0.996 0.990 1.000 1.000 1.000 1.000 1.000 0.333 0.934 0.401 0.531 0.872 0.984
Qwen-3-235B 0.979 0.951 1.000 0.930 1.000 1.000 0.980 1.000 0.980 0.980 1.000 1.000 0.989 0.997 0.994 1.000 1.000 1.000 0.983 1.000 0.383 0.968 0.404 0.536 0.836 0.912
Qwen-3.5-397B 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.989 0.997 0.995 1.000 1.000 1.000 1.000 1.000 0.613 1.000 0.440 0.840 0.920 1.000
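
The rows of the table above can be read programmatically by splitting from the right, since the scores are always the last fields and only the model name contains extra spaces. The sketch below is an illustration under that assumption, not tooling shipped with the benchmark; `parse_header` and `parse_score_row` are hypothetical helper names, and the sample data is an excerpt covering only the first four columns.

```python
import re

# "Task (metric)" pairs as they appear in the header line.
HEADER_CELL = re.compile(r"(\S+) \(([^)]+)\)")


def parse_header(header):
    """Return a list of (task, metric) tuples from the header line."""
    return HEADER_CELL.findall(header)


def parse_score_row(line, n_tasks):
    """Split a row into (model name, list of scores)."""
    # Split from the right so that spaces inside the model name are kept.
    parts = line.rsplit(maxsplit=n_tasks)
    if len(parts) != n_tasks + 1:
        raise ValueError("row has fewer fields than the header")
    return parts[0], [float(x) for x in parts[1:]]


# Excerpt: only the first four columns of the table above.
header = ("Model RdfConnectionExplainStatic-jsonld (listTrimF1) "
          "RdfConnectionExplainStatic-nt (listTrimF1) "
          "RdfConnectionExplainStatic-turtle (listTrimF1) "
          "RdfConnectionExplainStatic-xml (listTrimF1)")
rows = [
    "Claude 3.5 Haiku 1.000 0.903 0.889 0.995",
    "GPT4o 2024/11 0.850 0.598 0.867 0.845",
]

tasks = parse_header(header)
table = dict(parse_score_row(row, len(tasks)) for row in rows)

# Look up one task column by name for every model.
for model, scores in table.items():
    by_task = dict(zip((task for task, _ in tasks), scores))
    print(model, by_task["RdfConnectionExplainStatic-nt"])
```

Applied to the full table, the same functions work unchanged as long as the complete header line and complete rows are passed in.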

Capability Compass Plots

Click on any of the plots below to view a larger version showing the capability compass for each model. The dimensions are explained briefly at the top of the page.

Detailed Data and File Downloads

All data is available as CSV and HTML files: