Train in Spain and test in the rest of the world

Try to recognize and segment as many object categories as you can. The training images are outdoor pictures taken in different cities in Spain.

Characteristics of the dataset:

              Images  Objects  Cars  Person  Building  Road  Sidewalk   Sky  Tree
Training set    2920    32164  4441    2524      3004  1321      1272  1009  2652
Test set        1133    32853  2265    2119      2117   739      1107   823  1652

  • Training set: contains more than 1000 fully annotated images and around 2000 partially annotated images. Including partially annotated images lets algorithms show whether they can benefit from additional partially labeled data. As datasets grow, it will be common for many images to be only partially annotated. Algorithms and training strategies that cope with this will make it possible to use large datasets without the labor-intensive effort of careful image annotation.
  • Test set: contains only fully labeled images. The test images were taken in the rest of the world, which ensures that training and test images are quite different.


  • Many object classes have very few training samples. The distribution of counts is heavy-tailed: about a dozen object classes have thousands of training samples, while hundreds of object classes have just a handful.
  • Dealing with partially labeled training images.
  • Annotation quality varies widely. Every polygon yields a very good bounding box, and for many objects the polygon also gives a fairly accurate segmentation.
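
The bounding box mentioned in the last point can be read directly off the polygon vertices. The snippet below is a minimal sketch, assuming a polygon is available as a list of (x, y) pixel coordinates; the function name `polygon_to_bbox` and the sample outline are hypothetical and not part of the dataset or the toolbox.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all polygon vertices."""
    if not polygon:
        raise ValueError("polygon must have at least one vertex")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical outline for illustration only, not taken from the dataset.
car_outline = [(12, 40), (85, 38), (90, 70), (10, 72)]
bbox = polygon_to_bbox(car_outline)
print(bbox)
```

The box is only as tight as the polygon, so a coarse outline still gives a usable box even when it is too rough to serve as a segmentation mask.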

    Release October 22, 2008:

    training.tar.gz (5.8 Gbytes) | thumbnails | list of training categories
    test.tar.gz (1.8 Gbytes) | thumbnails | list of test categories

  • Use the LabelMe toolbox to read the annotations and to extract segmentation masks.
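
For readers who want to inspect annotations outside the toolbox, the following is a rough Python sketch using only the standard library. It assumes each annotation file is XML with `object`, `name`, `deleted`, `polygon`, `pt`, `x` and `y` elements; that layout, the sample text, and the helper name `read_objects` are assumptions made for illustration and should be checked against the downloaded files.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation text; the element layout is an assumption.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>car</name>
    <deleted>0</deleted>
    <polygon>
      <pt><x>12</x><y>40</y></pt>
      <pt><x>85</x><y>38</y></pt>
      <pt><x>90</x><y>70</y></pt>
    </polygon>
  </object>
  <object>
    <name>tree</name>
    <deleted>1</deleted>
    <polygon>
      <pt><x>1</x><y>2</y></pt>
    </polygon>
  </object>
</annotation>"""


def read_objects(xml_text):
    """Return a list of (name, [(x, y), ...]) for objects not marked deleted."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        # Skip objects flagged as deleted (assumed convention: "1" means deleted).
        if (obj.findtext("deleted") or "0").strip() == "1":
            continue
        name = (obj.findtext("name") or "").strip().lower()
        points = [
            (int(float(pt.findtext("x"))), int(float(pt.findtext("y"))))
            for pt in obj.iter("pt")
        ]
        objects.append((name, points))
    return objects


objects = read_objects(SAMPLE_XML)
print(objects)
```

Lower-casing and stripping the object name is a design choice in this sketch, meant to reduce spurious category splits caused by free-form labels.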

    Send us your comments.

    Citation: LabelMe: a database and web-based tool for image annotation. B. Russell, A. Torralba, K. Murphy, W. T. Freeman. International Journal of Computer Vision, 2007.


    8 scene categories and 29,000 annotated objects

    Try to recognize and segment as many object categories as you can. Use 100 images from each scene category for training (800 training images in total) and the rest for testing. Report performance for each object category separately. Not all objects have the same amount of training data available, which reflects the fact that data is easier to gather for some objects than for others.
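
The split described above can be sketched as follows. This is only an illustration: the text does not say how the 100 training images per category should be chosen, so the seeded random selection, the function name `split_by_scene`, and the placeholder image and scene identifiers are all assumptions.

```python
import random
from collections import defaultdict


def split_by_scene(images, n_train=100, seed=0):
    """Split (image_id, scene) pairs into train/test lists of image ids.

    n_train images per scene go to training; the remainder go to testing.
    Random selection with a fixed seed is an assumption of this sketch.
    """
    by_scene = defaultdict(list)
    for image_id, scene in images:
        by_scene[scene].append(image_id)

    rng = random.Random(seed)
    train, test = [], []
    for scene in sorted(by_scene):
        ids = sorted(by_scene[scene])  # sort first so the split is reproducible
        rng.shuffle(ids)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Placeholder data: 8 hypothetical scene categories with 120 images each.
images = [(f"scene{k}_img{i}", f"scene_{k}") for k in range(8) for i in range(120)]
train_ids, test_ids = split_by_scene(images)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, which matters when per-object results from different methods are compared.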

    Download datasets, code and paper

    Citation: Modeling the shape of the scene: a holistic representation of the spatial envelope. A. Oliva, A. Torralba. International Journal of Computer Vision, Vol. 42(3): 145-175, 2001.

    (c) MIT, Computer Science and Artificial Intelligence Laboratory.