Quantitative Results:
1) Counting:

| Model | Precision ↑ (Easy) | Precision ↑ (Medium) | Precision ↑ (Hard) | Recall ↑ (Easy) | Recall ↑ (Medium) | Recall ↑ (Hard) | F1 ↑ (Easy) | F1 ↑ (Medium) | F1 ↑ (Hard) |
|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 67.19 | 68.66 | 75.97 | 77.76 | 43.8 | 35.71 | 72.09 | 53.48 | 48.58 |
| SDV2 | 79.79 | 84.91 | 90.81 | 67.41 | 31.58 | 25.97 | 73.07 | 46.04 | 40.39 |
| Glide | 72.52 | 73.05 | 83.87 | 54.1 | 27.32 | 19.11 | 61.97 | 39.77 | 31.13 |
| CogView 2 | 68.32 | 67.03 | 96.47 | 63.32 | 1.22 | 0.92 | 65.73 | 2.39 | 1.82 |
| DALL.E V2 | 81.71 | 83.88 | 98.28 | 82 | 1.52 | 0.85 | 81.85 | 2.99 | 1.7 |
| Paella | 73.93 | 70.21 | 77.66 | 69.12 | 31.27 | 23.16 | 71.44 | 43.27 | 35.68 |
| minDALL-E | 76.89 | 79.71 | 89.05 | 48.33 | 20.98 | 14.05 | 59.35 | 33.21 | 24.27 |
| DALL-E_Mini | 76.98 | 86.75 | 96.66 | 78.32 | 1.22 | 0.84 | 77.63 | 2.41 | 1.67 |

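
As a sanity check on the Counting results above, the reported F1 values are consistent with the usual harmonic mean of precision and recall. The sketch below is only an illustration under that assumption: `f1_score` and `rows` are hypothetical names introduced here, not part of the benchmark code, and the numbers are copied from the Counting results.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Counting results.
rows = {
    "SDV1 / Easy": (67.19, 77.76, 72.09),
    "SDV1 / Medium": (68.66, 43.8, 53.48),
    "DALL.E V2 / Easy": (81.71, 82, 81.85),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.2f}, reported {reported}")
```

The check covers only a few Counting rows; it makes no claim about how the scores in the other tables were aggregated.
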
2) Counting Ablation:

| Model | Precision ↑ (Vanilla) | Precision ↑ (Meta) | Precision ↑ (Detailed) | Recall ↑ (Vanilla) | Recall ↑ (Meta) | Recall ↑ (Detailed) | F1 ↑ (Vanilla) | F1 ↑ (Meta) | F1 ↑ (Detailed) |
|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 75.81 | 69.14 | 70.61 | 46.48 | 46.7 | 52.42 | 55.21 | 52.79 | 58.05 |
| SDV2 | 78.27 | 75.78 | 85.17 | 36.02 | 39.25 | 41.65 | 46.04 | 46.77 | 53.16 |
| Glide | 87.91 | 80.79 | 76.48 | 30.19 | 28.27 | 33.51 | 41.96 | 38.72 | 44.29 |
| CogView 2 | 85.22 | 85.89 | 79.11 | 19.23 | 19.8 | 21.9 | 22.3 | 22.13 | 23.03 |
| DALL.E V2 | 92.53 | 90.29 | 87.96 | 28.7 | 27.93 | 28.12 | 29.99 | 29.06 | 28.85 |
| Paella | 80.52 | 72.53 | 73.93 | 38.26 | 39.41 | 41.19 | 49.82 | 47.98 | 50.13 |
| minDALL-E | 86.09 | 87.7 | 81.88 | 19.51 | 16.86 | 27.78 | 29.22 | 26.59 | 38.94 |
| DALL-E_Mini | 89.21 | 87.19 | 86.8 | 24.32 | 23.93 | 26.8 | 26.71 | 25.94 | 27.24 |

3) Visual-Text:

| Model | NED ↓ | CER ↓ |
|---|---|---|
| SDV1 | 84.98 | 92.27 |
| SDV2 | 83.16 | 94.52 |
| Glide | 89.92 | 95.25 |
| CogView 2 | 89.55 | 96.87 |
| DALL.E V2 | 74.89 | 87.46 |
| Paella | 89.83 | 97.37 |
| minDALL-E | 90.85 | 96.44 |
| DALL-E_Mini | 94.06 | 99.42 |

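
For readers unfamiliar with the two Visual-Text metrics, the sketch below shows one conventional way to compute them. It assumes CER is the Levenshtein distance divided by the reference length and NED is the Levenshtein distance divided by the longer of the two strings, both expressed as percentages. These definitions, and the helper names `levenshtein`, `cer` and `ned`, are assumptions made for illustration; the benchmark's own implementation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (assumed: distance / reference length)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


def ned(reference: str, hypothesis: str) -> float:
    """Normalized edit distance in percent (assumed: distance / longer length)."""
    longest = max(len(reference), len(hypothesis))
    return 0.0 if longest == 0 else 100.0 * levenshtein(reference, hypothesis) / longest


print(cer("hello", "hallo"), ned("abc", "abcd"))
```

Under these definitions lower is better for both metrics, matching the ↓ markers in the table. CER can exceed 100 when the hypothesis is much longer than the reference, whereas NED stays within 0 to 100.
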
4) Emotions:

| Model | ClipScore (K=5) | CIDEr (K=5) | BLEU-1 (K=5) | BLEU-4 (K=5) | ClipScore (K=10) | CIDEr (K=10) | BLEU-1 (K=10) | BLEU-4 (K=10) | CLS 8 classes | CLS 2 classes |
|---|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 0.33964 | 0.80675 | 0.24417 | 0.09761 | 0.34099 | 0.91594 | 0.26456 | 0.10731 | 0.1493 | 0.5402 |
| SDV2 | 0.32798 | 0.77978 | 0.23735 | 0.09461 | 0.32947 | 0.88801 | 0.25737 | 0.10395 | 0.1563 | 0.5306 |
| Glide | 0.30435 | 0.73388 | 0.2248 | 0.08747 | 0.30685 | 0.8283 | 0.24354 | 0.09623 | 0.1414 | 0.5258 |
| CogView 2 | 0.30817 | 0.7153 | 0.2231 | 0.08438 | 0.31068 | 0.81084 | 0.24127 | 0.09245 | 0.1631 | 0.5338 |
| DALL.E V2 | 0.35513 | 0.88045 | 0.26353 | 0.10843 | 0.35723 | 1.00133 | 0.28493 | 0.11965 | 0.137 | 0.5083 |
| Paella | 0.3273 | 0.73241 | 0.22556 | 0.08737 | 0.32911 | 0.8296 | 0.24444 | 0.09574 | 0.1429 | 0.5272 |
| minDALL-E | 0.28673 | 0.65943 | 0.21161 | 0.07592 | 0.28895 | 0.7581 | 0.22994 | 0.08364 | 0.1507 | 0.5274 |
| DALL-E_Mini | 0.33983 | 0.7399 | 0.23779 | 0.09063 | 0.34185 | 0.85163 | 0.25824 | 0.10128 | 0.1671 | 0.5584 |

5) Consistency:

| Model | Easy | Medium | Hard |
|---|---|---|---|
| SD1 | 0.799 | 0.79 | 0.78 |
| SD2 | 0.81 | 0.807 | 0.801 |
| Glide | 0.788 | 0.781 | 0.773 |
| CogView | 0.727 | 0.719 | 0.713 |
| Dalle 2 | 0.825 | 0.816 | 0.807 |
| Paella | 0.825 | 0.817 | 0.813 |
| MiniDalle | 0.728 | 0.723 | 0.713 |
| Dalle-Mini | 0.827 | 0.816 | 0.809 |

6) Typos:

| Model | Easy | Medium | Hard |
|---|---|---|---|
| SD1 | 0.785 | 0.765 | 0.734 |
| SD2 | 0.801 | 0.7788 | 0.739 |
| Glide | 0.777 | 0.764 | 0.743 |
| CogView | 0.718 | 0.7 | 0.68 |
| Dalle 2 | 0.817 | 0.8 | 0.78 |
| Paella | 0.813 | 0.798 | 0.77 |
| MiniDalle | 0.725 | 0.708 | 0.696 |
| Dalle-Mini | 0.806 | 0.7798 | 0.748 |

7) Spatial-Size-Colors Compositions:

| Model | Spatial ↑ (Easy) | Spatial ↑ (Medium) | Spatial ↑ (Hard) | Size ↑ (Easy) | Size ↑ (Medium) | Size ↑ (Hard) | Colors ↑ (Easy) | Colors ↑ (Medium) | Colors ↑ (Hard) |
|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 21.75 | 0 | 0 | 27.34 | 0 | 0 | 30 | 0 | 0 |
| SDV2 | 1.19 | 0 | 0 | 0.19 | 0.19 | 0 | 20 | 0 | 0 |
| Glide | 2.49 | 0 | 0 | 6.78 | 0 | 0 | 15 | 0 | 0 |
| CogView 2 | 8.88 | 0 | 0 | 11.97 | 0 | 0 | 15 | 0 | 0 |
| DALL.E V2 | 28.34 | 0 | 0 | 29.94 | 0 | 0 | 38 | 0 | 0 |
| Paella | 8.78 | 0 | 0 | 7.38 | 0 | 0 | 3 | 0 | 0 |
| minDALL-E | 4.29 | 0 | 0 | 2.19 | 0 | 0 | 2 | 0 | 0 |
| DALL-E_Mini | 15.17 | 0 | 0 | 19.16 | 0 | 0 | 35 | 0 | 0 |
| Struct-Diff | 24 | 0 | 0 | 31.13 | 0 | 0 | 33 | 0 | 0 |

8) Actions Compositions:

| Model | Easy BLEU1 | Easy BLEU2 | Easy BLEU3 | Easy BLEU4 | Easy CIDEr | Medium BLEU1 | Medium BLEU2 | Medium BLEU3 | Medium BLEU4 | Medium CIDEr | Hard BLEU1 | Hard BLEU2 | Hard BLEU3 | Hard BLEU4 | Hard CIDEr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 0.5724 | 0.4765 | 0.3737 | 0.2921 | 2.4007 | 0.3538 | 0.257 | 0.1888 | 0.1452 | 1.1458 | 0.3617 | 0.2706 | 0.1997 | 0.1526 | 0.6455 |
| SDV2 | 0.5739 | 0.4774 | 0.3761 | 0.2936 | 2.3213 | 0.3691 | 0.2696 | 0.1963 | 0.1499 | 1.1447 | 0.3726 | 0.2774 | 0.2029 | 0.155 | 0.6909 |
| Glide | 0.4616 | 0.3401 | 0.2493 | 0.1905 | 1.6979 | 0.295 | 0.1875 | 0.1303 | 0.0983 | 0.8887 | 0.2887 | 0.1992 | 0.141 | 0.1069 | 0.5155 |
| CogView 2 | 0.5361 | 0.4322 | 0.3317 | 0.2569 | 2.1038 | 0.3367 | 0.234 | 0.1679 | 0.1276 | 1.0004 | 0.3353 | 0.2394 | 0.1726 | 0.132 | 0.6352 |
| DALL.E V2 | 0.6349 | 0.5389 | 0.4295 | 0.3387 | 2.4626 | 0.3367 | 0.234 | 0.1679 | 0.1276 | 1.1688 | 0.3996 | 0.2955 | 0.2164 | 0.1654 | 0.73 |
| Paella | 0.5188 | 0.4115 | 0.3112 | 0.2392 | 1.9356 | 0.3376 | 0.235 | 0.168 | 0.128 | 1.0338 | 0.3202 | 0.2241 | 0.1593 | 0.1207 | 0.5607 |
| minDALL-E | 0.4975 | 0.3824 | 0.2839 | 0.2164 | 1.8236 | 0.3171 | 0.211 | 0.1506 | 0.1145 | 0.9033 | 0.3108 | 0.2148 | 0.1517 | 0.1147 | 0.5708 |
| DALL-E_Mini | 0.5818 | 0.4818 | 0.3779 | 0.2956 | 2.3254 | 0.3571 | 0.2586 | 0.1897 | 0.1455 | 1.1249 | 0.3473 | 0.2524 | 0.1829 | 0.1396 | 0.6289 |

9) Creativity:

| Model | Easy deviation | Easy BLEU1 | Easy BLEU2 | Easy BLEU3 | Easy BLEU4 | Easy CIDEr | Medium deviation | Medium BLEU1 | Medium BLEU2 | Medium BLEU3 | Medium BLEU4 | Medium CIDEr | Hard deviation | Hard BLEU1 | Hard BLEU2 | Hard BLEU3 | Hard BLEU4 | Hard CIDEr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDV1 | 0.3368 | 0.4175 | 0.3003 | 0.2165 | 0.1656 | 0.639 | 0.32 | 0.402 | 0.2913 | 0.207 | 0.1573 | 0.6544 | 0.3412 | 0.3159 | 0.2118 | 0.1463 | 0.1102 | 0.3521 |
| SDV2 | 0.3437 | 0.4264 | 0.3124 | 0.2276 | 0.1747 | 0.6648 | 0.3325 | 0.4176 | 0.3058 | 0.2195 | 0.1672 | 0.6613 | 0.3537 | 0.3297 | 0.2268 | 0.1592 | 0.1198 | 0.3642 |
| Glide | 0.2956 | 0.3876 | 0.2677 | 0.1892 | 0.1438 | 0.5685 | 0.2881 | 0.3696 | 0.2509 | 0.1746 | 0.1318 | 0.5654 | 0.2912 | 0.2861 | 0.185 | 0.1275 | 0.0958 | 0.2988 |
| CogView 2 | 0.3343 | 0.3878 | 0.2663 | 0.1888 | 0.1434 | 0.5605 | 0.3037 | 0.3804 | 0.2626 | 0.1846 | 0.1398 | 0.5674 | 0.2825 | 0.2844 | 0.1823 | 0.1257 | 0.0945 | 0.2695 |
| DALL.E V2 | 0.2956 | 0.4341 | 0.3198 | 0.2304 | 0.1763 | 0.7083 | 0.3056 | 0.4431 | 0.3274 | 0.2397 | 0.184 | 0.6848 | 0.2862 | 0.3356 | 0.2299 | 0.1592 | 0.1199 | 0.3717 |
| Paella | 0.2968 | 0.404 | 0.2833 | 0.2011 | 0.1532 | 0.5936 | 0.2793 | 0.4014 | 0.2829 | 0.2004 | 0.1524 | 0.6064 | 0.2893 | 0.3114 | 0.2038 | 0.1398 | 0.105 | 0.3114 |
| minDALL-E | 0.3368 | 0.3724 | 0.2481 | 0.1734 | 0.1312 | 0.5208 | 0.3268 | 0.3546 | 0.2346 | 0.1633 | 0.1236 | 0.5092 | 0.3281 | 0.2572 | 0.1613 | 0.1104 | 0.0828 | 0.2411 |
| DALL-E_Mini | 0.315 | 0.421 | 0.3019 | 0.217 | 0.1659 | 0.6292 | 0.295 | 0.4149 | 0.3006 | 0.215 | 0.1639 | 0.6475 | 0.2906 | 0.3251 | 0.2186 | 0.1508 | 0.1133 | 0.3613 |

10) Gender Bias:

| Model | MAD % |
|---|---|
| SDV1 | 7.94 |
| SDV2 | 18.51 |
| CogView 2 | 17.83 |
| DALL.E V2 | 18.05 |
| minDALL-E | 23.07 |

11) Fairness:

| Model | Gender Fairness Score | Styles Fairness Score |
|---|---|---|
| SDV1 | 1.41 | 0.1047 |
| SDV2 | 0.63 | 0.1146 |
| Glide | 0.36 | 0.06246 |
| CogView 2 | 3.42 | 0.0622 |
| DALL.E V2 | 1.71 | 0.1117 |
| Paella | 1.90 | 0.0947 |
| minDALL-E | 0.50 | 0.1188 |
| DALL-E_Mini | 1.67 | 0.1147 |