Unsupervised Joint Object Discovery and Segmentation in Internet Images

Supplementary Material


Comparison with previous co-segmentation methods [7, 8, 11] on MSRC and iCoseg

These results supplement Figures 4 and 5 in the paper.

The following are visual comparisons between our results and those of state-of-the-art co-segmentation methods [7, 8, 11] on the standard co-segmentation datasets, MSRC [21] and iCoseg [2].

Side-by-side comparison with [7, 8, 11] on MSRC - uniform sample of 50% of the images per class

Side-by-side comparison with [7, 8, 11] on iCoseg - uniform sample of 50% of the images per class


In addition, the following are quantitative comparisons using Jaccard similarity, supplementing Figure 4 in the paper, where only precision is shown; both metrics are sketched in code after the plot links below.

Jaccard similarity per class on MSRC

Jaccard similarity per class on iCoseg
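
For completeness, the sketch below shows how these two metrics can be computed from binary segmentation masks, where precision counts the fraction of pixels, foreground and background alike, labeled consistently with the ground truth, and Jaccard similarity is the intersection-over-union of the foreground masks. This is a minimal illustrative NumPy sketch; the function names and the per-class aggregation helper are hypothetical, not part of our implementation.

import numpy as np

def precision(pred, gt):
    # Fraction of pixels (foreground and background alike) whose
    # predicted label agrees with the ground-truth label.
    pred, gt = pred.astype(bool), gt.astype(bool)
    return float((pred == gt).mean())

def jaccard(pred, gt):
    # Intersection-over-union of the predicted and ground-truth
    # foreground masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum()) / union

def per_class_scores(pairs_by_class):
    # pairs_by_class: {class name: [(predicted mask, ground-truth mask), ...]}
    # Returns the per-class averages of both metrics, as shown in the plots.
    return {cls: (np.mean([precision(p, g) for p, g in pairs]),
                  np.mean([jaccard(p, g) for p, g in pairs]))
            for cls, pairs in pairs_by_class.items()}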



Comparison with Object Co-segmentation [23]

These results supplement Table 1 in the paper.

Note: In this subsection, when we refer to MSRC and iCoseg, we mean the subsets of these datasets used by [23] for the evaluation in their paper (see Section 4.1, line 521).


Side-by-side comparison with [23] on MSRC - all classes and images are shown

Side-by-side comparison with [23] on iCoseg - all classes and images are shown


In the paper we reported the overall precision and Jaccard similarity on these two datasets (Table 1). Here we additionally break down these performance metrics per class; the numbers in Table 1 correspond to the leftmost column in each plot, labeled "Average".

Precision per class on MSRC

Jaccard similarity per class on MSRC

Precision per class on iCoseg

Jaccard similarity per class on iCoseg



Comparison with previous co-segmentation methods on Internet datasets

These results supplement Table 3 and Figure 7 in the paper.

Note: For this comparison, we selected from each dataset 100 images with available ground-truth labelings, since the competing methods do not scale to large datasets (see Section 4.2, line 618).

We show all the results for each dataset (all 100 images). In the rightmost column we show the human labels we collected (black pixels represent background, white pixels represent foreground), to demonstrate that the ground-truth labels used for the quantitative evaluation on these datasets are of high quality.
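
As a concrete illustration of this label format, such a black/white labeling can be loaded into a boolean foreground mask as follows (a minimal PIL/NumPy sketch; the function name and the mid-gray threshold are illustrative assumptions, not details of our pipeline):

import numpy as np
from PIL import Image

def load_gt_mask(path):
    # Human labels are stored as black/white images: white pixels mark
    # the foreground object, black pixels the background.
    gray = np.asarray(Image.open(path).convert("L"))
    return gray > 127  # binarize; labels are (near-)pure black or white

A mask loaded this way can be compared directly against a binarized segmentation result using the metric sketch given earlier.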


Side-by-side comparison with [7, 8, 11] on Car - all images are shown

Side-by-side comparison with [7, 8, 11] on Horse - all images are shown

Side-by-side comparison with [7, 8, 11] on Airplane - all images are shown



Results on Internet datasets

Here we show more qualitative results of our algorithm on the full Internet datasets. In Figure 6 in the paper, we manually chose specific results that illustrate the strengths and weaknesses of our algorithm; here, on the other hand, we show a uniform random sample of the results on each dataset, so that the reader can get an unbiased impression of the performance (a sketch of such a sampling procedure is given at the end of this subsection, after the links below).

For each image, we show the source image on the left, and our segmentation on the right, similar to Figure 6 in the paper.

Our results on Car - 500 images randomly selected out of 4,347 images

Our results on Horse - 500 images randomly selected out of 6,381 images

Our results on Airplane - 500 images randomly selected out of 4,542 images


More qualitative results on additional datasets not shown in the paper:

Our results on Dolphin - 300 images randomly selected out of 677 images

Our results on Space Shuttle - 300 images randomly selected out of 394 images

Our results on Piano - 300 images randomly selected out of 1,791 images
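
As referenced above, here is a minimal sketch of how a uniform random sample like the ones linked above can be drawn; the directory layout, file extension, and fixed seed are illustrative assumptions, not details of our pipeline:

import random
from pathlib import Path

def uniform_sample(image_dir, k, seed=0):
    # Draw k image paths uniformly at random; sorting first and fixing
    # the seed makes the displayed sample reproducible rather than
    # cherry-picked.
    paths = sorted(Path(image_dir).glob("*.jpg"))
    return random.Random(seed).sample(paths, k)

# e.g., uniform_sample("Car", 500) for the 4,347-image Car dataset.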



References

[2] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen. iCoseg: Interactive co-segmentation with intelligent scribble guidance. In CVPR, 2010.

[7] A. Joulin, F. Bach, and J. Ponce. Discriminative clustering for image co-segmentation. In CVPR, 2010.

[8] A. Joulin, F. Bach, and J. Ponce. Multi-class cosegmentation. In CVPR, 2012.

[11] G. Kim, E. Xing, L. Fei-Fei, and T. Kanade. Distributed cosegmentation via submodular optimization on anisotropic diffusion. In ICCV, 2011.

[21] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV, pages 1–15, 2006.

[23] S. Vicente, C. Rother, and V. Kolmogorov. Object cosegmentation. In CVPR, pages 2217–2224, 2011.