LLM Benchmark Results

Combined Scores Overview

Interactive table of combined model scores, given as mean (± standard deviation), across several capabilities. Click a column header to sort, or use the search box to filter.

R-Syn-1: RDF-Syntax-Capability on first answer

R-Syn-max: RDF-Syntax-Capability on best answer

R-Sem: RDF-Semantics-Capability based on tasks RdfConnectionExplainStatic and RdfFriendCount

S-Syn-1: SPARQL-Syntax-Capability on first answer

S-Syn-max: SPARQL-Syntax-Capability on best answer

S-Sem-R: SPARQL-Semantic-Read-Capability

S-Sem-W-1: SPARQL-Semantic-Write-Capability on first answer

S-Sem-W-max: SPARQL-Semantic-Write-Capability on best answer
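
As a rough illustration of how the "-1" (first answer) and "-max" (best answer) variants above differ, the following Python sketch assumes that each task run yields an ordered list of answer scores in [0, 1]. The first-answer variant takes the first score of each run, and the best-answer variant takes the maximum. The function names, the sample data, and the use of the population standard deviation are assumptions made for illustration, not the benchmark's actual implementation.

```python
from statistics import mean, pstdev


def first_and_best(runs):
    """Aggregate per-run answer scores (hypothetical data layout).

    runs: list of runs; each run is a non-empty list of scores in [0, 1],
    ordered from the first answer to the last.
    Returns ((mean_first, std_first), (mean_best, std_best)).
    """
    first = [run[0] for run in runs]    # score of the first answer only
    best = [max(run) for run in runs]   # score of the best answer in the run
    # Assumption: population standard deviation; the page does not say which is used.
    return (mean(first), pstdev(first)), (mean(best), pstdev(best))


def fmt(m, s):
    """Format a score in the 'mean (±std)' style used in the table."""
    return f"{m:.3f} (±{s:.3f})"


# Made-up scores for four runs of one task.
runs = [[0.0, 1.0], [1.0], [0.5, 0.5, 1.0], [1.0, 1.0]]
(first_m, first_s), (best_m, best_s) = first_and_best(runs)
print("first answer:", fmt(first_m, first_s))
print("best answer: ", fmt(best_m, best_s))
```

Under these assumptions the best-answer score can never be lower than the first-answer score for the same runs, which matches the pattern in the table, where each "-max" column is at least as high as its "-1" counterpart.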

Model	total	R-Syn-1	R-Syn-max	R-Sem	S-Syn-1	S-Syn-max	S-Sem-R	S-Sem-W-1	S-Sem-W-max
Qwen-3.5-397B	0.977 (±0.133)	0.941 (±0.215)	0.994 (±0.015)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.940 (±0.211)	0.940 (±0.211)
Claude Opus 4.6	0.896 (±0.270)	0.934 (±0.234)	0.996 (±0.013)	0.657 (±0.415)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.747 (±0.362)	0.837 (±0.311)
Claude Sonnet 4.6	0.844 (±0.321)	0.934 (±0.234)	0.996 (±0.013)	0.282 (±0.300)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.747 (±0.362)	0.797 (±0.338)
Gemini 3 Flash Preview	0.942 (±0.200)	0.961 (±0.168)	0.995 (±0.014)	0.973 (±0.061)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.760 (±0.367)	0.850 (±0.312)
GPT5.2-chat	0.955 (±0.178)	0.962 (±0.168)	0.995 (±0.014)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.807 (±0.332)	0.877 (±0.276)
GPT5.4 2026/03	0.908 (±0.253)	0.934 (±0.234)	0.996 (±0.013)	0.998 (±0.020)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.647 (±0.388)	0.687 (±0.381)
Claude 3.5 Haiku	0.925 (±0.232)	0.937 (±0.203)	0.984 (±0.029)	0.779 (±0.395)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.817 (±0.325)	0.887 (±0.266)
Claude 3.5 Sonnet	0.933 (±0.222)	0.950 (±0.175)	0.990 (±0.022)	0.832 (±0.370)	0.980 (±0.125)	1.000 (±0.000)	1.000 (±0.000)	0.857 (±0.295)	0.857 (±0.295)
Deepseek-Coder-33B	0.694 (±0.396)	0.773 (±0.366)	0.882 (±0.269)	0.263 (±0.297)	0.943 (±0.221)	0.984 (±0.112)	0.313 (±0.309)	0.689 (±0.379)	0.703 (±0.374)
Deepseek-R1	0.931 (±0.226)	0.955 (±0.174)	0.991 (±0.020)	0.992 (±0.089)	0.935 (±0.247)	1.000 (±0.000)	1.000 (±0.000)	0.746 (±0.382)	0.832 (±0.316)
Deepseek-Chat-v3	0.848 (±0.326)	0.843 (±0.347)	0.991 (±0.020)	0.591 (±0.466)	0.957 (±0.202)	0.997 (±0.050)	0.923 (±0.214)	0.702 (±0.384)	0.782 (±0.348)
Gemini 1.5 Flash	0.907 (±0.263)	0.920 (±0.242)	0.983 (±0.028)	0.878 (±0.325)	0.865 (±0.324)	0.910 (±0.272)	1.000 (±0.000)	0.850 (±0.304)	0.850 (±0.304)
Gemini 1.5 Pro	0.896 (±0.282)	0.887 (±0.291)	0.966 (±0.127)	0.796 (±0.399)	0.845 (±0.339)	0.905 (±0.286)	1.000 (±0.000)	0.883 (±0.276)	0.883 (±0.276)
Gemini 2.0 Flash Exp	0.895 (±0.260)	0.986 (±0.025)	0.988 (±0.024)	0.931 (±0.197)	0.994 (±0.079)	1.000 (±0.000)	1.000 (±0.000)	0.604 (±0.394)	0.657 (±0.387)
Llama-3.1-70B	0.860 (±0.311)	0.908 (±0.234)	0.973 (±0.034)	0.559 (±0.484)	0.997 (±0.050)	0.997 (±0.050)	1.000 (±0.000)	0.694 (±0.381)	0.754 (±0.361)
Llama-3.1-8B	0.530 (±0.452)	0.779 (±0.375)	0.915 (±0.228)	0.462 (±0.421)	0.401 (±0.475)	0.521 (±0.477)	0.535 (±0.377)	0.273 (±0.401)	0.355 (±0.425)
Llama-3.2-1B	0.123 (±0.276)	0.250 (±0.366)	0.411 (±0.409)	0.159 (±0.254)	0.026 (±0.143)	0.079 (±0.260)	0.021 (±0.070)	0.010 (±0.050)	0.027 (±0.073)
Llama-3.2-3B	0.335 (±0.416)	0.402 (±0.452)	0.773 (±0.332)	0.344 (±0.397)	0.196 (±0.374)	0.322 (±0.444)	0.308 (±0.373)	0.120 (±0.256)	0.212 (±0.308)
Llama-3.3-70B	0.853 (±0.318)	0.975 (±0.032)	0.978 (±0.029)	0.595 (±0.487)	0.985 (±0.122)	1.000 (±0.000)	1.000 (±0.000)	0.617 (±0.398)	0.671 (±0.385)
Llama-3.0-70B	0.847 (±0.324)	0.961 (±0.114)	0.974 (±0.033)	0.523 (±0.480)	0.955 (±0.208)	0.990 (±0.099)	1.000 (±0.000)	0.645 (±0.405)	0.731 (±0.367)
Llama-3.0-8B	0.434 (±0.435)	0.586 (±0.426)	0.632 (±0.416)	0.219 (±0.290)	0.271 (±0.445)	0.425 (±0.488)	0.615 (±0.337)	0.281 (±0.397)	0.445 (±0.417)
Llama-4-Maverick	0.859 (±0.305)	0.870 (±0.241)	0.974 (±0.033)	0.655 (±0.465)	0.960 (±0.196)	1.000 (±0.000)	0.910 (±0.244)	0.687 (±0.381)	0.815 (±0.327)
GPT3.5 2024/01	0.800 (±0.356)	0.975 (±0.126)	0.995 (±0.014)	0.411 (±0.442)	0.944 (±0.230)	1.000 (±0.000)	0.696 (±0.374)	0.674 (±0.387)	0.707 (±0.376)
GPT4o 2024/11	0.902 (±0.244)	0.937 (±0.212)	0.986 (±0.024)	0.726 (±0.377)	1.000 (±0.000)	1.000 (±0.000)	0.881 (±0.183)	0.817 (±0.325)	0.867 (±0.286)
GPT4o-mini 2024/07	0.827 (±0.333)	0.919 (±0.232)	0.983 (±0.030)	0.384 (±0.415)	0.921 (±0.246)	0.960 (±0.174)	0.962 (±0.089)	0.709 (±0.385)	0.777 (±0.349)
GPTo1-mini 2024/09	0.911 (±0.251)	0.835 (±0.351)	0.992 (±0.018)	0.994 (±0.031)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.697 (±0.379)	0.767 (±0.354)
GPTo1-pre 2024/09	0.889 (±0.268)	0.911 (±0.256)	0.992 (±0.020)	0.658 (±0.373)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.742 (±0.365)	0.812 (±0.329)
OpenCoder-8B	0.557 (±0.454)	0.746 (±0.405)	0.817 (±0.354)	0.167 (±0.285)	0.622 (±0.482)	0.737 (±0.437)	0.400 (±0.422)	0.459 (±0.422)	0.509 (±0.417)
Phi-3.5-mini	0.481 (±0.432)	0.608 (±0.412)	0.639 (±0.390)	0.176 (±0.297)	0.637 (±0.466)	0.683 (±0.450)	0.450 (±0.381)	0.309 (±0.367)	0.350 (±0.371)
Phi-3.5-MoE	0.738 (±0.359)	0.831 (±0.296)	0.841 (±0.287)	0.517 (±0.421)	0.808 (±0.391)	0.932 (±0.238)	0.688 (±0.185)	0.637 (±0.394)	0.648 (±0.389)
Phi-3.0-medium-128k	0.561 (±0.439)	0.838 (±0.318)	0.886 (±0.257)	0.248 (±0.364)	0.547 (±0.475)	0.603 (±0.466)	0.625 (±0.316)	0.360 (±0.412)	0.385 (±0.417)
Phi-3.0-mini-128k	0.431 (±0.415)	0.582 (±0.424)	0.660 (±0.388)	0.263 (±0.333)	0.486 (±0.479)	0.549 (±0.480)	0.428 (±0.336)	0.231 (±0.291)	0.245 (±0.292)
Phi-3.0-small-128k	0.374 (±0.413)	0.346 (±0.394)	0.432 (±0.385)	0.284 (±0.364)	0.366 (±0.427)	0.394 (±0.429)	0.593 (±0.487)	0.278 (±0.352)	0.300 (±0.356)
Qwen-2.0-0.5B	0.038 (±0.138)	0.068 (±0.159)	0.076 (±0.171)	0.085 (±0.205)	0.005 (±0.071)	0.012 (±0.111)	0.040 (±0.136)	0.006 (±0.072)	0.010 (±0.080)
Qwen-2.0-1.5B	0.189 (±0.332)	0.126 (±0.294)	0.145 (±0.314)	0.222 (±0.349)	0.293 (±0.448)	0.351 (±0.465)	0.154 (±0.196)	0.105 (±0.177)	0.115 (±0.182)
Qwen-2.5-0.5B	0.097 (±0.244)	0.053 (±0.153)	0.101 (±0.235)	0.083 (±0.199)	0.157 (±0.360)	0.185 (±0.384)	0.071 (±0.179)	0.061 (±0.128)	0.064 (±0.129)
Qwen-2.5-14B	0.763 (±0.390)	0.781 (±0.393)	0.922 (±0.245)	0.331 (±0.432)	0.897 (±0.303)	0.910 (±0.286)	0.933 (±0.240)	0.658 (±0.378)	0.671 (±0.374)
Qwen-2.5-1.5B	0.362 (±0.428)	0.470 (±0.459)	0.584 (±0.452)	0.266 (±0.339)	0.494 (±0.485)	0.527 (±0.482)	0.127 (±0.268)	0.186 (±0.272)	0.244 (±0.326)
Qwen-2.5-32B	0.826 (±0.341)	0.979 (±0.030)	0.982 (±0.028)	0.603 (±0.471)	0.992 (±0.080)	1.000 (±0.000)	0.800 (±0.400)	0.603 (±0.391)	0.651 (±0.388)
Qwen-2.5-3B	0.603 (±0.441)	0.718 (±0.410)	0.857 (±0.292)	0.374 (±0.434)	0.733 (±0.431)	0.803 (±0.384)	0.453 (±0.451)	0.407 (±0.395)	0.479 (±0.394)
Qwen-2.5-72B	0.877 (±0.300)	0.871 (±0.317)	0.987 (±0.025)	0.614 (±0.471)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.731 (±0.369)	0.811 (±0.329)
Qwen-2.0-57B-A14B	0.673 (±0.424)	0.741 (±0.399)	0.932 (±0.173)	0.222 (±0.370)	0.860 (±0.345)	0.895 (±0.307)	0.630 (±0.438)	0.510 (±0.399)	0.599 (±0.393)
Qwen-2.5-7B	0.739 (±0.394)	0.966 (±0.139)	0.973 (±0.119)	0.329 (±0.411)	0.917 (±0.258)	0.976 (±0.136)	0.586 (±0.459)	0.565 (±0.397)	0.603 (±0.391)
Qwen-2.5-Coder-32B	0.881 (±0.297)	0.937 (±0.219)	0.991 (±0.017)	0.478 (±0.476)	1.000 (±0.000)	1.000 (±0.000)	1.000 (±0.000)	0.814 (±0.325)	0.830 (±0.314)
Qwen-2.0-72B	0.818 (±0.338)	0.964 (±0.040)	0.971 (±0.038)	0.339 (±0.426)	0.950 (±0.199)	1.000 (±0.000)	1.000 (±0.000)	0.630 (±0.380)	0.688 (±0.365)
Qwen-2.0-7B	0.551 (±0.452)	0.566 (±0.452)	0.739 (±0.405)	0.232 (±0.305)	0.799 (±0.397)	0.836 (±0.365)	0.573 (±0.476)	0.298 (±0.359)	0.369 (±0.389)
Qwen-3-235B	0.919 (±0.246)	0.912 (±0.260)	0.993 (±0.017)	0.980 (±0.139)	0.960 (±0.196)	0.993 (±0.086)	1.000 (±0.000)	0.704 (±0.396)	0.813 (±0.337)
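
For readers working with a plain-text copy of the table above, the sketch below shows one possible way to parse the tab-separated rows and their "mean (±std)" cells and to rank models by a chosen column. It uses only three rows and two score columns copied from the table; the helper names (`parse_cell`, `load_scores`, `rank_by`) are hypothetical and not part of the benchmark tooling.

```python
import csv
import io
import re

# Excerpt of the combined-scores table (tab-separated, values copied from above).
EXCERPT = (
    "Model\ttotal\tR-Sem\n"
    "Claude Opus 4.6\t0.896 (±0.270)\t0.657 (±0.415)\n"
    "Qwen-3.5-397B\t0.977 (±0.133)\t1.000 (±0.000)\n"
    "GPT5.2-chat\t0.955 (±0.178)\t1.000 (±0.000)\n"
)

# Matches cells such as "0.977 (±0.133)".
CELL = re.compile(r"^\s*([0-9.]+)\s*\(±\s*([0-9.]+)\)\s*$")


def parse_cell(text):
    """Split a 'mean (±std)' cell into a (mean, std) tuple of floats."""
    match = CELL.match(text)
    if match is None:
        raise ValueError(f"unexpected cell format: {text!r}")
    return float(match.group(1)), float(match.group(2))


def load_scores(tsv_text):
    """Return {model: {column: (mean, std)}} from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    table = {}
    for row in reader:
        model = row.pop("Model")
        table[model] = {col: parse_cell(val) for col, val in row.items()}
    return table


def rank_by(table, column):
    """List model names sorted by the mean of `column`, highest first."""
    return sorted(table, key=lambda m: table[m][column][0], reverse=True)


scores = load_scores(EXCERPT)
print(rank_by(scores, "total"))
```

Sorting on the mean alone ignores the standard deviation; with spreads as large as those in several columns, small differences in rank should be read with caution.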

Main Scores for all Benchmark Tasks

This table breaks down the average main score per benchmark task for each model. The metric used for each task is given in parentheses in the column header.

Model	RdfConnectionExplainStatic-jsonld (listTrimF1)	RdfConnectionExplainStatic-nt (listTrimF1)	RdfConnectionExplainStatic-turtle (listTrimF1)	RdfConnectionExplainStatic-xml (listTrimF1)	RdfFriendCount-jsonld-1 (f1)	RdfFriendCount-jsonld-2 (f1)	RdfFriendCount-nt-1 (f1)	RdfFriendCount-nt-2 (f1)	RdfFriendCount-turtle-1 (f1)	RdfFriendCount-turtle-2 (f1)	RdfFriendCount-xml-1 (f1)	RdfFriendCount-xml-2 (f1)	RdfSyntaxFixList-jsonld (max_combined)	RdfSyntaxFixList-nt (max_combined)	RdfSyntaxFixList-turtle (max_combined)	Sparql2AnswerListOrga-jsonld (combinedF1)	Sparql2AnswerListOrga-turtle (combinedF1)	SparqlSyntaxFixingListLcQuad (max_combined)	Text2AnswerListOrga-jsonld (combinedF1)	Text2AnswerListOrga-turtle (combinedF1)	Text2SparqlExecEvalListBeastiary-turtle-schema (max_combined)	Text2SparqlExecEvalListBeastiary-turtle-subgraph (max_combined)	Text2SparqlExecEvalListBeastiary-turtle-subschema (max_combined)	Text2SparqlExecEvalListCoypuMini (max_combined)	Text2SparqlExecEvalListOrgaNumerical (max_combined)	Text2SparqlExecEvalListOrganizational (max_combined)
Claude 3.5 Haiku	1.000	0.903	0.889	0.995	0.000	0.000	1.000	1.000	1.000	1.000	1.000	1.000	0.979	0.997	0.975	1.000	1.000	1.000	0.938	0.850	0.373	0.947	0.307	0.600	1.000	1.000
Claude 3.5 Sonnet	1.000	0.972	1.000	1.000	1.000	1.000	0.100	0.300	0.950	1.000	0.550	1.000	0.989	0.997	0.984	1.000	1.000	1.000	1.000	1.000	0.411	0.947	0.427	0.520	0.960	1.000
Claude Opus 4.6	0.958	1.000	1.000	1.000	0.095	0.095	0.219	0.204	1.000	1.000	0.149	0.103	0.989	0.999	1.000	1.000	1.000	1.000	1.000	1.000	0.787	0.947	0.667	0.600	0.960	0.840
Claude Sonnet 4.6	0.456	0.548	0.456	0.893	0.092	0.092	0.052	0.048	0.091	0.091	0.092	0.092	0.989	0.997	1.000	1.000	1.000	1.000	1.000	1.000	0.709	0.947	0.747	0.600	0.840	0.800
Deepseek-Chat-v3	0.989	0.856	0.896	0.980	0.293	0.396	0.250	0.309	0.544	0.394	0.514	0.534	0.989	0.993	0.991	0.993	0.922	1.000	0.905	0.713	0.456	0.947	0.408	0.520	0.680	0.980
Deepseek-Coder-33B	0.529	0.241	0.438	0.447	0.204	0.334	0.111	0.172	0.068	0.083	0.349	0.382	0.976	0.829	0.841	0.224	0.313	0.968	0.228	0.221	0.337	0.915	0.399	0.575	0.571	0.752
Deepseek-R1	0.960	0.997	1.000	0.980	1.000	1.000	1.000	1.000	0.980	1.000	1.000	1.000	0.989	0.997	0.987	0.993	1.000	1.000	0.970	0.985	0.443	0.952	0.349	0.520	0.872	0.984
GPT3.5 2024/01	0.901	0.801	0.848	0.813	0.200	0.550	0.000	0.000	0.000	0.000	0.550	0.900	0.989	0.998	0.997	0.500	0.696	1.000	0.506	0.850	0.387	0.947	0.261	0.520	0.600	0.760
GPT4o 2024/11	0.850	0.598	0.867	0.845	1.000	1.000	0.000	0.100	1.000	1.000	0.000	1.000	0.989	0.997	0.972	0.950	0.881	1.000	0.950	0.950	0.307	0.947	0.307	0.520	1.000	1.000
GPT4o-mini 2024/07	0.833	0.723	0.828	0.753	0.100	0.000	0.000	0.150	0.200	0.250	1.000	1.000	0.979	0.995	0.977	0.758	0.962	0.920	0.775	0.950	0.328	0.947	0.255	0.520	0.640	1.000
GPT5.2-chat	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.989	0.997	0.999	1.000	1.000	1.000	1.000	1.000	0.587	0.947	0.627	0.560	1.000	1.000
GPT5.4 2026/03	1.000	0.975	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.989	0.997	1.000	1.000	1.000	1.000	1.000	1.000	0.587	0.947	0.507	0.520	0.680	0.600
GPTo1-mini 2024/09	0.983	0.975	1.000	0.983	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.989	0.997	0.990	0.975	1.000	1.000	1.000	0.975	0.387	0.947	0.347	0.520	0.680	0.920
GPTo1-pre 2024/09	0.914	0.827	0.925	0.936	0.502	0.498	0.363	0.488	0.532	0.592	0.612	0.780	0.989	0.995	0.992	1.000	1.000	1.000	1.000	1.000	0.467	0.888	0.347	0.720	0.760	0.880
Gemini 1.5 Flash	1.000	0.976	1.000	1.000	0.000	1.000	0.950	0.850	1.000	1.000	0.050	0.950	0.973	1.000	0.977	1.000	1.000	0.820	1.000	1.000	0.547	0.960	0.234	0.520	0.920	1.000
Gemini 1.5 Pro	1.000	0.964	1.000	1.000	1.000	1.000	0.000	0.000	1.000	1.000	1.000	1.000	0.919	1.000	0.978	1.000	1.000	0.810	1.000	1.000	0.467	0.973	0.305	0.560	1.000	1.000
Gemini 2.0 Flash Exp	1.000	0.642	1.000	0.914	1.000	1.000	0.900	0.850	1.000	1.000	1.000	1.000	0.989	1.000	0.975	1.000	1.000	1.000	1.000	1.000	0.307	0.947	0.280	0.520	0.320	0.840
Gemini 3 Flash Preview	1.000	0.858	1.000	0.875	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.989	0.996	0.998	1.000	1.000	1.000	1.000	1.000	0.467	1.000	0.461	0.680	0.840	0.880
Llama-3.0-70B	0.990	0.770	0.997	0.970	0.500	0.980	0.020	0.000	0.000	0.000	0.060	0.660	0.987	1.000	0.936	1.000	1.000	0.980	1.000	1.000	0.352	0.947	0.233	0.537	0.568	0.872
Llama-3.0-8B	0.353	0.556	0.700	0.585	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.723	0.427	0.744	0.663	0.615	0.076	0.660	0.770	0.217	0.594	0.157	0.511	0.276	0.399
Llama-3.1-70B	0.990	0.834	0.993	0.988	0.060	0.060	0.860	0.780	0.020	0.000	1.000	0.980	0.979	1.000	0.938	0.902	1.000	1.000	1.000	0.960	0.417	0.947	0.261	0.516	0.808	0.744
Llama-3.1-8B	0.756	0.593	0.694	0.617	0.240	0.340	0.640	0.740	0.000	0.000	0.440	0.780	0.979	0.900	0.864	0.638	0.535	0.476	0.464	0.518	0.229	0.561	0.156	0.393	0.243	0.224
Llama-3.2-1B	0.382	0.142	0.371	0.442	0.040	0.040	0.060	0.000	0.054	0.059	0.060	0.020	0.644	0.410	0.180	0.016	0.021	0.032	0.054	0.051	0.004	0.039	0.025	0.016	0.040	0.012
Llama-3.2-3B	0.705	0.762	0.685	0.732	0.020	0.020	0.020	0.040	0.180	0.280	0.280	0.380	0.891	0.885	0.542	0.308	0.308	0.094	0.217	0.205	0.095	0.210	0.069	0.178	0.116	0.346
Llama-3.3-70B	1.000	0.951	1.000	1.000	1.000	1.000	0.000	0.000	0.000	0.000	1.000	1.000	0.987	0.995	0.953	1.000	1.000	1.000	1.000	1.000	0.451	0.947	0.392	0.536	0.520	0.680
Llama-4-Maverick	1.000	0.911	1.000	0.997	1.000	1.000	0.184	0.202	0.141	0.111	0.600	0.600	0.970	0.993	0.960	0.865	0.910	1.000	0.790	0.729	0.401	0.947	0.237	0.696	0.824	0.792
OpenCoder-8B	0.214	0.062	0.177	0.202	0.169	0.321	0.086	0.050	0.193	0.199	0.036	0.082	0.914	0.683	0.852	0.267	0.400	0.668	0.307	0.340	0.313	0.710	0.282	0.221	0.440	0.667
Phi-3.0-medium-128k	0.320	0.652	0.645	0.766	0.000	0.025	0.024	0.047	0.000	0.000	0.000	0.000	0.904	0.843	0.911	0.520	0.625	0.556	0.472	0.581	0.279	0.710	0.204	0.082	0.167	0.579
Phi-3.0-mini-128k	0.635	0.416	0.709	0.665	0.040	0.140	0.000	0.000	0.020	0.000	0.000	0.000	0.899	0.503	0.579	0.505	0.428	0.367	0.642	0.557	0.205	0.365	0.220	0.037	0.188	0.392
Phi-3.0-small-128k	0.837	0.618	0.645	0.731	0.000	0.000	0.000	0.000	0.008	0.000	0.000	0.000	0.353	0.405	0.537	0.487	0.593	0.136	0.626	0.551	0.276	0.530	0.290	0.227	0.152	0.291
Phi-3.5-MoE	0.580	0.529	0.661	0.835	0.903	0.820	0.466	0.312	0.050	0.015	0.020	0.000	0.987	0.622	0.915	0.685	0.688	0.887	0.752	0.702	0.361	0.867	0.313	0.524	0.348	0.855
Phi-3.5-mini	0.027	0.301	0.525	0.768	0.120	0.000	0.020	0.000	0.000	0.000	1.000	1.000	0.935	0.392	0.588	0.456	0.450	0.576	0.405	0.435	0.202	0.609	0.277	0.227	0.244	0.320
Qwen-2.0-0.5B	0.148	0.040	0.079	0.119	0.117	0.146	0.040	0.000	0.102	0.055	0.060	0.160	0.035	0.161	0.031	0.050	0.040	0.000	0.041	0.054	0.016	0.037	0.020	0.000	0.004	0.000
Qwen-2.0-1.5B	0.134	0.248	0.390	0.440	0.453	0.393	0.000	0.000	0.100	0.060	0.000	0.020	0.301	0.039	0.094	0.152	0.154	0.256	0.192	0.159	0.067	0.199	0.121	0.067	0.101	0.092
Qwen-2.0-57B-A14B	0.710	0.017	0.727	0.769	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.987	0.919	0.889	0.395	0.630	0.800	0.830	1.000	0.328	0.698	0.192	0.528	0.360	0.808
Qwen-2.0-72B	0.798	0.862	0.745	0.987	0.000	0.000	0.000	0.000	0.000	0.000	0.340	0.980	0.974	0.990	0.950	1.000	1.000	1.000	1.000	1.000	0.391	0.866	0.428	0.493	0.395	1.000
Qwen-2.0-7B	0.631	0.512	0.564	0.591	0.000	0.020	0.000	0.000	0.000	0.000	0.000	0.000	0.749	0.737	0.730	0.485	0.573	0.928	0.790	0.917	0.324	0.550	0.309	0.320	0.175	0.432
Qwen-2.5-0.5B	0.165	0.109	0.243	0.307	0.004	0.000	0.000	0.000	0.000	0.000	0.100	0.420	0.097	0.060	0.146	0.047	0.071	0.088	0.088	0.053	0.012	0.053	0.079	0.037	0.072	0.096
Qwen-2.5-1.5B	0.592	0.646	0.601	0.565	0.020	0.020	0.100	0.120	0.000	0.000	0.540	0.620	0.784	0.467	0.502	0.153	0.127	0.424	0.140	0.153	0.217	0.272	0.214	0.211	0.185	0.311
Qwen-2.5-14B	0.887	0.662	1.000	0.763	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.962	0.824	0.980	1.000	0.933	0.820	0.960	1.000	0.333	0.866	0.315	0.520	0.584	0.712
Qwen-2.5-32B	1.000	0.326	1.000	0.982	0.000	0.000	1.000	1.000	0.340	0.380	0.000	0.000	0.987	0.993	0.966	1.000	0.800	1.000	1.000	0.960	0.307	0.947	0.307	0.520	0.568	0.568
Qwen-2.5-3B	0.904	0.892	0.889	0.735	0.000	0.000	0.160	0.160	0.000	0.000	0.040	0.820	0.986	0.865	0.721	0.433	0.453	0.691	0.677	0.787	0.327	0.649	0.241	0.494	0.224	0.548
Qwen-2.5-72B	1.000	0.797	1.000	0.966	0.120	0.260	1.000	1.000	0.000	0.000	1.000	1.000	0.987	0.998	0.976	1.000	1.000	1.000	0.800	1.000	0.333	0.947	0.333	0.520	0.840	0.936
Qwen-2.5-7B	0.887	0.719	0.769	0.891	0.020	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.962	1.000	0.957	0.360	0.586	0.952	0.520	0.800	0.288	0.915	0.353	0.520	0.376	0.600
Qwen-2.5-Coder-32B	1.000	0.580	1.000	0.998	0.020	0.020	0.100	0.120	0.420	0.520	0.000	0.000	0.987	0.996	0.990	1.000	1.000	1.000	1.000	1.000	0.333	0.934	0.401	0.531	0.872	0.984
Qwen-3-235B	0.979	0.951	1.000	0.930	1.000	1.000	0.980	1.000	0.980	0.980	1.000	1.000	0.989	0.997	0.994	1.000	1.000	1.000	0.983	1.000	0.383	0.968	0.404	0.536	0.836	0.912
Qwen-3.5-397B	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.989	0.997	0.995	1.000	1.000	1.000	1.000	1.000	0.613	1.000	0.440	0.840	0.920	1.000

Capability Compass Plots

Click on any of the plots below to view a larger version showing the capability compass for each model. The dimensions are explained briefly at the top of the page.