Our evaluation currently focuses on zero-shot transfer models, as they are designed to generalize to unseen datasets. Links to the models are provided at the end of the page.
The results include mIoU values averaged over the dataset domains as well as for each dataset separately.
The models are grouped by their size and sorted by release date. The best-performing models within their group are highlighted in bold, the second-best are underlined.
We provide random and best supervised results as lower and upper bounds.
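Concretely, each domain score is the unweighted mean of the mIoU values of that domain's datasets, and the overall mean is the unweighted mean over all 22 datasets rather than over the five domain scores. The sketch below reproduces this aggregation, using the Random row of the tables below as input:

```python
from statistics import mean

# Per-dataset mIoU of the random baseline (values from the Random row below)
miou = {
    "BDD100K": 1.48, "Dark Zurich": 1.31, "MHP v1": 1.27, "FoodSeg103": 0.23,
    "ATLANTIS": 0.56, "DRAM": 2.16, "iSAID": 0.56, "ISPRS Potsdam": 8.02,
    "WorldFloods": 18.43, "FloodNet": 3.39, "UAVid": 5.18,
    "Kvasir-Instrument": 27.99, "CHASE DB1": 27.25, "CryoNuSeg": 31.25,
    "PAXRay-4": 31.53, "Corrosion CS": 9.3, "DeepCrack": 26.52, "PST900": 4.52,
    "ZeroWaste-f": 6.49, "SUIM": 5.3, "CUB-200": 0.06, "CWFID": 13.08,
}

# Dataset-to-domain mapping as used in the tables below
domains = {
    "General": ["BDD100K", "Dark Zurich", "MHP v1", "FoodSeg103", "ATLANTIS", "DRAM"],
    "Earth Monitoring": ["iSAID", "ISPRS Potsdam", "WorldFloods", "FloodNet", "UAVid"],
    "Medical Sciences": ["Kvasir-Instrument", "CHASE DB1", "CryoNuSeg", "PAXRay-4"],
    "Engineering": ["Corrosion CS", "DeepCrack", "PST900", "ZeroWaste-f"],
    "Agriculture and Biology": ["SUIM", "CUB-200", "CWFID"],
}

# Domain score: unweighted mean over the datasets of that domain
for domain, names in domains.items():
    print(domain, round(mean(miou[d] for d in names), 2))

# Overall mean: unweighted mean over all 22 datasets (not over the domain scores)
print("Mean", round(mean(miou.values()), 2))
```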
## Zero-shot semantic segmentation
### Domain results

| Model | General | Earth Monitoring | Medical Sciences | Engineering | Agriculture and Biology | Mean |
|---|---|---|---|---|---|---|
| Random¹ | 1.17 | 7.11 | 29.51 | 11.71 | 6.14 | 10.27 |
| Best supervised² | 49.15 | 79.12 | 89.49 | 67.66 | 81.94 | 71.13 |
| | | | | | | |
| ZSSeg-B | 19.98 | 17.98 | 41.82 | 14.0 | 22.32 | 22.73 |
| ZegFormer-B | 13.57 | 17.25 | 17.47 | 17.92 | 25.78 | 17.57 |
| X-Decoder-T | 22.01 | 18.92 | 23.28 | 15.31 | 18.17 | 19.8 |
| SAN-B | 29.35 | 30.64 | 29.85 | 23.58 | 15.07 | 26.74 |
| OpenSeeD-T | 22.49 | 25.11 | 44.44 | 16.5 | 10.35 | 24.33 |
| CAT-Seg-B | 34.96 | 34.57 | 41.65 | 26.26 | 29.32 | 33.74 |
| Grounded-SAM-B | 29.51 | 25.97 | 37.38 | 29.51 | 17.66 | 28.52 |
| | | | | | | |
| OVSeg-L | 29.54 | 29.04 | 31.9 | 14.16 | 28.64 | 26.94 |
| SAN-L | 36.18 | 38.83 | 30.27 | 16.95 | 20.41 | 30.06 |
| CAT-Seg-L | 39.93 | 39.85 | 48.49 | 26.04 | 34.06 | 38.14 |
| Grounded-SAM-L | 30.32 | 26.44 | 38.69 | 29.25 | 17.73 | 29.05 |
| CAT-Seg-H | 37.98 | 37.74 | 34.65 | 29.04 | 37.76 | 35.66 |
| Grounded-SAM-H | 30.27 | 26.44 | 38.45 | 28.16 | 17.67 | 28.78 |
### Dataset results

| Model | BDD100K | Dark Zurich | MHP v1 | FoodSeg103 | ATLANTIS | DRAM | iSAID | ISPRS Potsdam | WorldFloods | FloodNet | UAVid | Kvasir-Instrument | CHASE DB1 | CryoNuSeg | PAXRay-4 | Corrosion CS | DeepCrack | PST900 | ZeroWaste-f | SUIM | CUB-200 | CWFID | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random¹ | 1.48 | 1.31 | 1.27 | 0.23 | 0.56 | 2.16 | 0.56 | 8.02 | 18.43 | 3.39 | 5.18 | 27.99 | 27.25 | 31.25 | 31.53 | 9.3 | 26.52 | 4.52 | 6.49 | 5.3 | 0.06 | 13.08 | 10.27 |
| Best supervised² | 44.8 | 63.9 | 50.0 | 45.1 | 42.22 | 45.71 | 65.3 | 87.56 | 92.71 | 82.22 | 67.8 | 93.7 | 97.05 | 73.45 | 93.77 | 49.92 | 85.9 | 82.3 | 52.5 | 74.0 | 84.6 | 87.23 | 70.99 |
| | | | | | | | | | | | | | | | | | | | | | | | |
| ZSSeg-B | 32.36 | 16.86 | 7.08 | 8.17 | 22.19 | 33.19 | 3.8 | 11.57 | 23.25 | 20.98 | 30.27 | 46.93 | 37.0 | 38.7 | 44.66 | 3.06 | 25.39 | 18.76 | 8.78 | 30.16 | 4.35 | 32.46 | 22.73 |
| ZegFormer-B | 14.14 | 4.52 | 4.33 | 10.01 | 18.98 | 29.45 | 2.68 | 14.04 | 25.93 | 22.74 | 20.84 | 27.39 | 12.47 | 11.94 | 18.09 | 4.78 | 29.77 | 19.63 | 17.52 | 28.28 | 16.8 | 32.26 | 17.57 |
| X-Decoder-T | 47.29 | 24.16 | 3.54 | 2.61 | 27.51 | 26.95 | 2.43 | 31.47 | 26.23 | 8.83 | 25.65 | 55.77 | 10.16 | 11.94 | 15.23 | 1.72 | 24.65 | 19.44 | 15.44 | 24.75 | 0.51 | 29.25 | 19.8 |
| SAN-B | 37.4 | 24.35 | 8.87 | 19.27 | 36.51 | 49.68 | 4.77 | 37.56 | 31.75 | 37.44 | 41.65 | 69.88 | 17.85 | 11.95 | 19.73 | 3.13 | 50.27 | 19.67 | 21.27 | 22.64 | 16.91 | 5.67 | 26.74 |
| OpenSeeD-T | 47.95 | 28.13 | 2.06 | 9.0 | 18.55 | 29.23 | 1.45 | 31.07 | 30.11 | 23.14 | 39.78 | 59.69 | 46.68 | 33.76 | 37.64 | 13.38 | 47.84 | 2.5 | 2.28 | 19.45 | 0.13 | 11.47 | 24.33 |
| CAT-Seg-B | 44.58 | 27.36 | 20.79 | 21.54 | 33.08 | 62.42 | 15.75 | 41.89 | 39.47 | 35.12 | 40.62 | 70.68 | 25.38 | 25.63 | 44.94 | 13.76 | 49.14 | 21.32 | 20.83 | 39.1 | 3.4 | 45.47 | 33.74 |
| Grounded-SAM-B | 41.58 | 20.91 | 29.38 | 10.48 | 17.33 | 57.38 | 12.22 | 26.68 | 33.41 | 19.19 | 38.34 | 46.82 | 23.56 | 38.06 | 41.07 | 20.88 | 59.02 | 21.39 | 16.74 | 14.13 | 0.43 | 38.41 | 28.52 |
| | | | | | | | | | | | | | | | | | | | | | | | |
| OVSeg-L | 45.28 | 22.53 | 6.24 | 16.43 | 33.44 | 53.33 | 8.28 | 31.03 | 31.48 | 35.59 | 38.8 | 71.13 | 20.95 | 13.45 | 22.06 | 6.82 | 16.22 | 21.89 | 11.71 | 38.17 | 14.0 | 33.76 | 26.94 |
| SAN-L | 43.81 | 30.39 | 9.34 | 24.46 | 40.66 | 68.44 | 11.77 | 51.45 | 48.24 | 39.26 | 43.41 | 72.18 | 7.64 | 11.94 | 29.33 | 6.83 | 23.65 | 19.01 | 18.32 | 40.01 | 19.3 | 1.91 | 30.06 |
| CAT-Seg-L | 45.83 | 33.1 | 30.03 | 30.47 | 33.6 | 66.54 | 16.09 | 51.42 | 49.86 | 39.84 | 42.02 | 79.4 | 24.99 | 35.06 | 54.5 | 16.87 | 31.42 | 25.26 | 30.62 | 53.94 | 9.24 | 39.0 | 38.14 |
| Grounded-SAM-L | 42.69 | 21.92 | 28.11 | 10.76 | 17.63 | 60.8 | 12.38 | 27.76 | 33.4 | 19.28 | 39.37 | 47.32 | 25.16 | 38.06 | 44.22 | 20.88 | 58.21 | 21.23 | 16.67 | 14.3 | 0.43 | 38.47 | 29.05 |
| CAT-Seg-H | 48.34 | 29.72 | 23.53 | 29.06 | 40.43 | 56.78 | 9.04 | 49.37 | 47.92 | 40.98 | 41.36 | 70.7 | 13.37 | 12.82 | 41.72 | 12.17 | 57.69 | 19.61 | 26.71 | 47.8 | 19.49 | 45.99 | 35.66 |
| Grounded-SAM-H | 42.95 | 22.09 | 28.05 | 9.97 | 17.68 | 60.86 | 12.44 | 27.79 | 33.23 | 19.31 | 39.41 | 46.97 | 25.13 | 38.06 | 43.64 | 20.88 | 53.74 | 21.34 | 16.68 | 14.3 | 0.43 | 38.29 | 28.78 |
## Visual oracle prompts
In addition to language-guided models, we evaluated SAM using visual oracle prompts in a point-to-mask or box-to-mask setting.
An oracle point or box is provided for each connected segment. See our paper for details.
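As an illustration, the sketch below derives one oracle point and one oracle box per connected segment of a binary ground-truth mask and feeds them to the official `segment-anything` predictor. The concrete prompt choices (center of mass as the point, the tight bounding box) and the checkpoint path are assumptions of this example; see the paper for the exact oracle construction used in the benchmark.

```python
import numpy as np
from scipy import ndimage
from segment_anything import SamPredictor, sam_model_registry

def oracle_prompts(gt_mask):
    """Yield one (point, box) oracle prompt per connected segment of a binary mask."""
    labeled, num_segments = ndimage.label(gt_mask)
    for i in range(1, num_segments + 1):
        segment = labeled == i
        # Assumed oracle point: center of mass (may fall outside concave segments)
        cy, cx = ndimage.center_of_mass(segment)
        ys, xs = np.nonzero(segment)
        box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # tight XYXY box
        yield np.array([[cx, cy]]), box

# Toy inputs; in the benchmark, gt_mask comes from the dataset annotations
image = np.zeros((256, 256, 3), dtype=np.uint8)
gt_mask = np.zeros((256, 256), dtype=bool)
gt_mask[40:90, 60:120] = True

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
predictor = SamPredictor(sam)
predictor.set_image(image)  # RGB uint8 array, H x W x 3

for point, box in oracle_prompts(gt_mask):
    # Point-to-mask: one foreground point per segment
    masks, _, _ = predictor.predict(point_coords=point, point_labels=np.array([1]),
                                    multimask_output=False)
    # Box-to-mask: the segment's bounding box
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
```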
### Domain results

| Model | General | Earth Monitoring | Medical Sciences | Engineering | Agriculture and Biology | Mean |
|---|---|---|---|---|---|---|
| Random¹ | 1.17 | 7.11 | 29.51 | 11.71 | 6.14 | 10.27 |
| Best supervised² | 49.15 | 79.12 | 89.49 | 67.66 | 81.94 | 71.13 |
| | | | | | | |
| SAM-B with oracle points | 50.41 | 38.72 | 43.7 | 45.16 | 57.84 | 46.59 |
| SAM-L with oracle points | 45.99 | 44.03 | 55.74 | 50.0 | 58.23 | 49.99 |
| SAM-H with oracle points | 36.05 | 34.82 | 59.58 | 47.35 | 39.91 | 43.0 |
| | | | | | | |
| SAM-B with oracle boxes | 78.5 | 73.56 | 68.14 | 73.29 | 86.0 | 75.67 |
| SAM-L with oracle boxes | 78.0 | 73.27 | 64.98 | 73.09 | 86.99 | 74.97 |
| SAM-H with oracle boxes | 65.23 | 59.61 | 66.58 | 66.4 | 78.63 | 66.55 |
### Dataset results

| Model | BDD100K | Dark Zurich | MHP v1 | FoodSeg103 | ATLANTIS | DRAM | iSAID | ISPRS Potsdam | WorldFloods | FloodNet | UAVid | Kvasir-Instrument | CHASE DB1 | CryoNuSeg | PAXRay-4 | Corrosion CS | DeepCrack | PST900 | ZeroWaste-f | SUIM | CUB-200 | CWFID | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random¹ | 1.48 | 1.31 | 1.27 | 0.23 | 0.56 | 2.16 | 0.56 | 8.02 | 18.43 | 3.39 | 5.18 | 27.99 | 27.25 | 31.25 | 31.53 | 9.3 | 26.52 | 4.52 | 6.49 | 5.3 | 0.06 | 13.08 | 10.27 |
| Best supervised² | 44.8 | 63.9 | 50.0 | 45.1 | 42.22 | 45.71 | 65.3 | 87.56 | 92.71 | 82.22 | 67.8 | 93.7 | 97.05 | 73.45 | 93.77 | 49.92 | 85.9 | 82.3 | 52.5 | 74.0 | 84.6 | 87.23 | 70.99 |
| | | | | | | | | | | | | | | | | | | | | | | | |
| SAM-B with oracle points | 50.47 | 35.19 | 44.55 | 58.48 | 61.5 | 52.29 | 21.54 | 38.93 | 32.36 | 58.83 | 41.93 | 67.25 | 37.1 | 23.95 | 46.51 | 35.99 | 47.16 | 36.39 | 61.09 | 64.71 | 64.35 | 44.47 | 46.59 |
| SAM-L with oracle points | 38.09 | 38.46 | 49.51 | 46.85 | 53.68 | 49.34 | 45.03 | 41.58 | nan | 53.87 | 35.65 | 85.35 | 30.68 | 51.63 | 55.29 | 42.75 | 48.82 | 46.56 | 61.86 | 55.68 | 75.1 | 43.91 | 49.99 |
| SAM-H with oracle points | 27.64 | 37.11 | 53.76 | 31.02 | 35.69 | 31.1 | 52.65 | 29.34 | nan | 33.78 | 23.5 | 84.22 | 33.35 | 64.11 | 56.62 | 34.54 | 55.08 | 55.76 | 43.99 | 28.54 | 48.91 | 42.3 | 43.0 |
| | | | | | | | | | | | | | | | | | | | | | | | |
| SAM-B with oracle boxes | 72.66 | 68.67 | 82.47 | 86.37 | 81.64 | 79.2 | 75.53 | 68.65 | nan | 76.49 | 73.59 | 92.58 | 22.59 | 85.23 | 72.17 | 67.01 | 66.49 | 75.46 | 84.22 | 86.42 | 86.88 | 84.7 | 75.67 |
| SAM-L with oracle boxes | 70.58 | 67.1 | 81.94 | 85.44 | 81.36 | 81.6 | 75.0 | 68.5 | nan | 76.78 | 72.81 | 93.5 | 22.82 | 76.1 | 67.49 | 64.15 | 69.35 | 73.97 | 84.9 | 87.43 | 89.48 | 84.05 | 74.97 |
| SAM-H with oracle boxes | 57.93 | 59.96 | 76.22 | 60.2 | 68.82 | 68.26 | 73.72 | 52.04 | nan | 59.4 | 53.29 | 91.03 | 33.41 | 75.16 | 66.73 | 56.75 | 66.55 | 67.78 | 74.52 | 67.85 | 84.71 | 83.32 | 66.55 |
## Model implementations
Links to the official implementations and the code adaptations for the MESS benchmark:
Feel free to add your results by contacting us via email.
¹ Random is a lower bound. The values represent the expected mIoU of predictions with a uniform class distribution.
² Best supervised is an upper bound given by recent supervised models for each dataset individually. We refer to our paper for details.
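The expected mIoU of the random baseline can also be approximated empirically by scoring uniformly sampled class predictions against the ground truth. A minimal Monte-Carlo sketch (illustrative only; the reported numbers follow the computation described in the paper):

```python
import numpy as np

def random_miou(gt, num_classes, trials=100, seed=0):
    """Monte-Carlo estimate of the expected mIoU of uniform random predictions."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(trials):
        pred = rng.integers(0, num_classes, size=gt.shape)
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, gt == c).sum()
            union = np.logical_or(pred == c, gt == c).sum()
            if union > 0:                      # skip classes absent from both masks
                ious.append(inter / union)
        scores.append(np.mean(ious))
    return float(np.mean(scores)) * 100        # percent, as in the tables above

# Toy ground truth: two classes, heavily imbalanced
gt = np.zeros((128, 128), dtype=int)
gt[:16, :16] = 1
print(random_miou(gt, num_classes=2))
```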