Model R-Syn-1 (mean SD) R-Syn-Max (mean SD) R-Sem (mean SD) S-Syn-1 (mean SD) S-Syn-Max (mean SD) S-Sem-R (mean SD) S-Sem-W-1 (mean SD) S-Sem-W-Max (mean SD) Total (mean SD)
Qwen-3.5-397B 0.941436 0.215419 0.993962 0.014973 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.94 0.210713 0.94 0.210713 0.976925 0.133152
Claude Opus 4.6 0.933781 0.233898 0.996224 0.013346 0.657143 0.41493 1.0 0.0 1.0 0.0 1.0 0.0 0.746667 0.362461 0.836667 0.311252 0.896310 0.269746
Claude Sonnet 4.6 0.933781 0.233898 0.995558 0.013473 0.281957 0.30007 1.0 0.0 1.0 0.0 1.0 0.0 0.746667 0.362461 0.796667 0.337951 0.844329 0.320578
Gemini 3 Flash Preview 0.960841 0.168424 0.994582 0.014076 0.973333 0.061101 1.0 0.0 1.0 0.0 1.0 0.0 0.76 0.366606 0.85 0.31225 0.942345 0.199993
GPT5.2-chat 0.961741 0.168209 0.995292 0.01354 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.806667 0.33193 0.876667 0.276305 0.955046 0.177802
GPT5.4 2026/03 0.933781 0.233898 0.995558 0.013473 0.9975 0.020259 1.0 0.0 1.0 0.0 1.0 0.0 0.646667 0.388215 0.686667 0.381284 0.907522 0.252588
Claude 3.5 Haiku 0.93721 0.203338 0.983543 0.029142 0.778712 0.394738 1.0 0.0 1.0 0.0 1.0 0.0 0.816667 0.325491 0.886667 0.265916 0.925350 0.231767
Claude 3.5 Sonnet 0.950247 0.175128 0.990149 0.022134 0.832222 0.370475 0.98 0.1249 1.0 0.0 1.0 0.0 0.856667 0.294976 0.856667 0.294976 0.933244 0.222090
Deepseek-Coder-33B 0.773413 0.366108 0.881737 0.269353 0.262682 0.297163 0.943093 0.221062 0.984 0.112 0.313351 0.308504 0.688576 0.378916 0.703208 0.373642 0.693757 0.395823
Deepseek-R1 0.954725 0.173756 0.991158 0.020223 0.991667 0.089365 0.935 0.246526 1.0 0.0 1.0 0.0 0.746 0.382471 0.832 0.315873 0.931319 0.225512
Deepseek-Chat-v3 0.843453 0.347062 0.991104 0.01967 0.590542 0.465967 0.9575 0.201727 0.9975 0.049937 0.9225 0.214413 0.701667 0.384126 0.781667 0.347687 0.848242 0.325586
Gemini 1.5 Flash 0.919857 0.242438 0.983318 0.028443 0.877639 0.324752 0.865 0.32369 0.91 0.271846 1.0 0.0 0.85 0.30359 0.85 0.30359 0.906977 0.262638
Gemini 1.5 Pro 0.887076 0.290575 0.965816 0.12682 0.796389 0.399113 0.845 0.33908 0.905 0.286313 1.0 0.0 0.883333 0.275983 0.883333 0.275983 0.895743 0.282353
Gemini 2.0 Flash Exp 0.986293 0.024905 0.988025 0.024031 0.930595 0.19705 0.99375 0.07881 1.0 0.0 1.0 0.0 0.604167 0.394172 0.656667 0.386882 0.894937 0.260464
Llama-3.1-70B 0.908222 0.233948 0.972642 0.034107 0.5586 0.48374 0.9975 0.049937 0.9975 0.049937 1.0 0.0 0.693667 0.38094 0.753667 0.360937 0.860225 0.310749
Llama-3.1-8B 0.779064 0.374704 0.914539 0.228138 0.462011 0.421359 0.400708 0.475122 0.521437 0.476593 0.535333 0.376798 0.272973 0.401317 0.355289 0.425064 0.530169 0.452012
Llama-3.2-1B 0.250009 0.365578 0.411382 0.409318 0.158954 0.25355 0.026 0.143262 0.078704 0.260211 0.020929 0.069591 0.009758 0.050448 0.026842 0.073054 0.122822 0.276334
Llama-3.2-3B 0.401531 0.452038 0.77255 0.33172 0.344343 0.397497 0.196258 0.373874 0.321971 0.443938 0.308333 0.373482 0.119937 0.256252 0.212424 0.307736 0.334668 0.416303
Llama-3.3-70B 0.975313 0.031782 0.977974 0.029019 0.595056 0.487115 0.985 0.121552 1.0 0.0 1.0 0.0 0.616667 0.39826 0.670667 0.384571 0.852585 0.317661
Llama-3.0-70B 0.960903 0.114356 0.974309 0.033036 0.522713 0.480339 0.954592 0.208198 0.99 0.099499 1.0 0.0 0.644857 0.405357 0.73102 0.367419 0.847299 0.324334
Llama-3.0-8B 0.585978 0.425891 0.631637 0.416095 0.219472 0.290386 0.271458 0.444712 0.4255 0.487903 0.615 0.337182 0.280588 0.396538 0.445018 0.416722 0.434332 0.434503
Llama-4-Maverick 0.870151 0.241133 0.974361 0.032998 0.654616 0.464627 0.96 0.195959 1.0 0.0 0.91 0.243721 0.686667 0.381284 0.814667 0.326814 0.858808 0.305092
GPT3.5 2024/01 0.975164 0.126267 0.994782 0.014293 0.411371 0.442473 0.94375 0.230404 1.0 0.0 0.695833 0.373864 0.674167 0.387154 0.706667 0.37618 0.800217 0.355652
GPT4o 2024/11 0.937102 0.211703 0.986071 0.024065 0.725916 0.376724 1.0 0.0 1.0 0.0 0.88125 0.183179 0.816667 0.325491 0.866667 0.285968 0.901709 0.243944
GPT4o-mini 2024/07 0.919209 0.231609 0.983495 0.029735 0.383737 0.415336 0.92125 0.246066 0.96 0.174356 0.9625 0.089268 0.709167 0.38532 0.776667 0.348823 0.827003 0.332965
GPTo1-mini 2024/09 0.835201 0.350556 0.991999 0.018265 0.994167 0.03063 1.0 0.0 1.0 0.0 1.0 0.0 0.696667 0.378873 0.766667 0.35371 0.910587 0.250642
GPTo1-pre 2024/09 0.91083 0.256424 0.99199 0.019549 0.657768 0.372691 1.0 0.0 1.0 0.0 1.0 0.0 0.742 0.364997 0.812 0.329306 0.889323 0.268270
OpenCoder-8B 0.746286 0.405319 0.816587 0.353531 0.167271 0.284796 0.6215 0.481703 0.7365 0.436884 0.400417 0.422322 0.45944 0.422322 0.50944 0.416867 0.557180 0.454361
Phi-3.5-mini 0.608426 0.411663 0.6386 0.390456 0.176046 0.29694 0.636636 0.465758 0.683 0.449568 0.45 0.380789 0.308727 0.367088 0.35 0.370502 0.481429 0.432010
Phi-3.5-MoE 0.830644 0.295706 0.841362 0.287488 0.517066 0.420942 0.80796 0.391414 0.931801 0.238311 0.688333 0.185075 0.636693 0.393948 0.648375 0.388671 0.737779 0.358659
Phi-3.0-medium-128k 0.838169 0.318021 0.886107 0.256586 0.247843 0.364315 0.546613 0.474792 0.602775 0.465873 0.625 0.316228 0.36034 0.41186 0.384596 0.417376 0.561430 0.439319
Phi-3.0-mini-128k 0.582301 0.423917 0.660334 0.388437 0.262528 0.333355 0.48579 0.478537 0.548806 0.479705 0.428333 0.335725 0.230937 0.291482 0.245288 0.291829 0.430540 0.415171
Phi-3.0-small-128k 0.346005 0.394207 0.431548 0.385374 0.283737 0.363601 0.365827 0.42708 0.39448 0.429494 0.593333 0.486667 0.278242 0.352309 0.29994 0.356429 0.374139 0.413292
Qwen-2.0-0.5B 0.067583 0.159254 0.075625 0.171089 0.084666 0.205279 0.005 0.070534 0.0125 0.111102 0.04 0.135647 0.006 0.071861 0.010143 0.080097 0.037690 0.137878
Qwen-2.0-1.5B 0.126225 0.293742 0.145075 0.313676 0.221703 0.348686 0.293344 0.448211 0.351414 0.464674 0.153798 0.195769 0.104679 0.177092 0.114628 0.182112 0.188858 0.332418
Qwen-2.5-0.5B 0.05289 0.152977 0.101304 0.234616 0.082752 0.199207 0.157469 0.359823 0.185122 0.384255 0.070833 0.179167 0.061388 0.128488 0.064449 0.129368 0.097026 0.244234
Qwen-2.5-14B 0.780775 0.393393 0.921856 0.245177 0.33114 0.431594 0.8975 0.303305 0.91 0.286182 0.933333 0.24037 0.657516 0.377599 0.670516 0.374055 0.762830 0.389833
Qwen-2.5-1.5B 0.470365 0.458727 0.584223 0.452018 0.266381 0.338945 0.494005 0.485349 0.526566 0.482228 0.126667 0.267831 0.186241 0.271857 0.244461 0.325867 0.362364 0.427848
Qwen-2.5-32B 0.978802 0.030021 0.981681 0.028171 0.60281 0.471244 0.992 0.079599 1.0 0.0 0.8 0.4 0.602667 0.391001 0.650667 0.387714 0.826078 0.340924
Qwen-2.5-3B 0.717887 0.410167 0.857021 0.292325 0.373996 0.434494 0.732808 0.430639 0.802647 0.383801 0.453333 0.450974 0.406987 0.39465 0.478783 0.393808 0.602933 0.440535
Qwen-2.5-72B 0.870546 0.317392 0.986602 0.024704 0.614331 0.47061 1.0 0.0 1.0 0.0 1.0 0.0 0.730667 0.368531 0.810667 0.329406 0.876602 0.299874
Qwen-2.0-57B-A14B 0.740664 0.399291 0.931611 0.173086 0.22234 0.369943 0.8595 0.345195 0.895 0.306553 0.63 0.437531 0.509576 0.39914 0.598525 0.393253 0.673402 0.423895
Qwen-2.5-7B 0.966325 0.13879 0.973107 0.118553 0.328656 0.410841 0.917 0.257897 0.976 0.13647 0.585833 0.459348 0.564667 0.39704 0.602667 0.391001 0.739282 0.394323
Qwen-2.5-Coder-32B 0.936955 0.21861 0.990675 0.017472 0.477734 0.476175 1.0 0.0 1.0 0.0 1.0 0.0 0.814061 0.325392 0.830061 0.314283 0.881186 0.297244
Qwen-2.0-72B 0.964433 0.040492 0.970993 0.037734 0.339131 0.426009 0.95 0.198746 1.0 0.0 1.0 0.0 0.62978 0.380062 0.688498 0.365379 0.817854 0.337645
Qwen-2.0-7B 0.565874 0.451627 0.738517 0.405185 0.231832 0.304521 0.799 0.396735 0.8365 0.365469 0.573333 0.476282 0.297546 0.359016 0.369273 0.389436 0.551484 0.452392
Qwen-3-235B 0.911814 0.26048 0.993491 0.017426 0.979971 0.138967 0.959796 0.196438 0.9925 0.086277 1.0 0.0 0.704 0.396128 0.813 0.336696 0.919322 0.246014