6.870 Object Recognition and Scene Understanding, Fall 2008



This class will review and discuss current approaches to object recognition and scene understanding in computer vision. The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, grammars for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories. We will read a mixture of computer vision papers and influential works from cognitive psychology on object and scene recognition.

Mondays/Wednesdays 2:30-4pm

Room 3-442

Instructor: Antonio Torralba

Email: torralba csail mit edu

T.A.: Ce Liu


Advice for presenters: For each student presentation, I will select two or three papers that need to be covered. While preparing your presentation, try to integrate the papers as much as possible into a coherent whole. Instead of dividing your presentation into several disconnected parts, one for each paper, try to find the commonalities between the papers.

Date Topic Presenter


Week 1 Introduction      
Sept. 3
Class goals Antonio



Week 2 Single class object detection,
Objects without scenes
Sept. 8
Overview on object recognition and
one practical example
Antonio lecture2.ppt

Biederman. Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987.

Fischler and Elschlager. The representation and matching of pictorial images. IEEE Transactions on Computers, Volume 22, 1973.

Code: A simple object detector with boosting, from the Short course on recognizing and learning object categories, by Fei-Fei, Fergus, and Torralba. 2005.

Sept. 10
Template matching and
gradient histograms

Nicolas Pinto

Jenny Yuen

Nicolas presentation.pdf

Connecting labelme and the DT detector by Jenny

Lowe. Object recognition from local scale-invariant features, ICCV 1999. (code)

Dalal and Triggs. Histograms of Oriented Gradients for Human Detection, CVPR 2005. (code)

Felzenszwalb, McAllester and Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. (code)

Week 3 Thousands of categories      
Sept. 15
Levels of categorization
and multiclass object recognition
Antonio lecture3.ppt

E. Rosch. Principles of Categorization. 1978.

S. E. Palmer. "Vision Science", chapter 9. (In fact, just read the entire book).

B. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV 2008. (website)

Sept. 17

Sharing parts for intraclass transfer learning

Sharat Chikkerur

Hueihan Jhuang

Sharat's presentation.pdf

Shared parts for actions by Hueihan

Fergus, Perona, and Zisserman. Object class recognition by unsupervised scale invariant learning. CVPR 2003. (code)

Fei-Fei, Fergus and Perona. One-Shot learning of object categories. PAMI, 2006.

Torralba, Murphy and Freeman. Sharing visual features for multiclass and multiview object detection. PAMI 2007.

Sept. 22

Student Holiday - No class
Week 4 3D object models      
Sept. 24

Explicit and implicit 3D object models

Antonio lecture4.ppt
Class drawings will be posted soon.

S. E. Palmer. "Vision Science", chapter 9.

Joseph L. Mundy. Object Recognition in the Geometric Era: a Retrospective. 2006.

Sept. 29
Recognition of 3D objects Presenter:
Alec Rivers
Alec's presentation.ppt

J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR 2006.

D. Hoiem, C. Rother, and J. Winn. 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation. CVPR 2007.

S. Savarese and L. Fei-Fei. 3D generic object categorization, localization and pose estimation. ICCV 2007.

Week 5 Scenes without objects      
Oct. 1
Global scene representations

Tilke Judd

Nicolas Pinto

Tilke's presentation.pdf

Nicolas' gist implementation

A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 2001. (gist code)

L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR. 2005.

S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.

Oct. 6
Scene recognition Aude Oliva lecture5.pdf

A. Oliva. Gist of the scene. Chapter in Neurobiology of Attention.

Week 6 Objects in context      
Oct. 8
Scenes and objects Antonio lecture6.ppt

I. Biederman, R.J. Mezzanotte, and J.C. Rabinowitz. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 1982.

A. Oliva, A. Torralba. The role of context in object recognition. Trends in Cognitive Sciences, 2007.

Oct. 13
Student Holiday - No class
Oct. 15
ECCV - No class



Week 7 Internet vision and the power of lots of data      
Oct. 20
Powers of 10 Antonio lecture7.ppt

Oct. 22

Vladimir Bychkovsky

Krista Ehinger

Scene completion demo by Krista

N. Snavely, S. M. Seitz, R. Szeliski. Photo tourism: Exploring photo collections in 3D. SIGGRAPH 2006. (website) (code)

J. Hays, A. A. Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007. (website and code)

A. Torralba, R. Fergus, W. T. Freeman, 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI 2008. (website)

Week 8 Low and Mid-level vision      
Oct. 27
Low-level vision:
shading, reflectance, and texture.
Bill Freeman lecture8.ppt

M. Tappen, W. Freeman and E. Adelson. Recovering intrinsic images from a single image. PAMI 2005.

A. Efros and W. Freeman. Image Quilting for Texture Synthesis and transfer. SIGGRAPH 2001

Oct. 29
Edges, regions and textures

Tom Ouyang

Gokberk Cinbis

Tom's presentation.ppt

Gokberk Cinbis

H.G. Barrow, J.M. Tenenbaum. Recovering Intrinsic Scene Characteristics from Images. Artificial Intelligence 1977

J. Shi and J. Malik. Normalized Cuts and Image Segmentation.

J. Portilla and E. Simoncelli. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. IJCV 2000. (code)

Week 9 Grammars for
objects and scenes
Nov. 3
Grammars and topic models Antonio
& Meg Aycinena
S.C. Zhu and D. Mumford. A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2006.
Nov. 5
Grammars for low, mid and high level vision

Tom Kollar &
Hueihan Jhuang

Kollar's presentation.ppt

Hueihan's presentation.ppt

Zhuowen Tu; Song-Chun Zhu. Image segmentation by data-driven Markov chain Monte Carlo

Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, Song-Chun Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition

Z.J. Xu, H. Chen, S.C. Zhu, and J. Luo. A Composite Template for Human Face Modeling and Sketch

F. Han and S.C. Zhu. Bottom-up/Top-down Image Parsing with Attribute Graph Grammar

Week 10 3D scene models      
Nov. 10
Student Holiday - No class
Nov. 12
3D scenes


A. Criminisi, I. Reid, and A. Zisserman. "Single View Metrology". Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, 1999. (website)

LabelMe 3D (website)
Nov. 17

Krista Ehinger

Tom Kollar

Krista's presentation.ppt

A comparison of 5 techniques by Tom

Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997 (website)

D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005. (website)

A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007 (website)

Week 6 Objects in context, part 2      
Nov. 19

Gokberk Cinbis

Sharat Chikkerur

Gokberk's presentation.ppt

Recipes for computing 'gist' features by Sharat

A. Torralba. Contextual priming for object detection. IJCV 2003.

D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. ICCV 2005.

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007

Week 11 Hierarchies      
Nov. 24
Biologically inspired computer vision Antonio

Nat Twarog


Twarog's presentation.ppt

T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005

B. Epshtein and S. Ullman. Feature Hierarchies for Object Classification. ICCV 2005.

D. Geman. Coarse-to-Fine Classification and Scene Labeling.

  What happens if we solve object recognition?      
Nov. 26


Jenny Yuen

lecture12.ppt and a few notes on how to prepare a talk.

P. Cavanagh, Vision is getting easier every day, Perception 1996

Where are the flying cars?

Y. Jing, S. Baluja, H. Rowley. Canonical Image Selection from the Web. CIVR 2007.

Week 12 Class project presentations      
Dec. 1
20 minutes for each presentation

2:30 pm - 4:30pm

  2:35 Jenny Yuen
3:00 Gokberk Cinbis
3:25 Alec Rivers
3:50 Tom Ouyang
Dec. 3
  1:30pm - 4:00pm

1:35 Hueihan Jhuang and Sharat Chikkerur
2:00 Nathaniel R Twarog
2:25 Tom Kollar
2:50 Tilke Judd and Vladimir Bychkovsky
3:15 Nicolas Pinto

3:40 Vote count!


Related courses:



Other resources:


Internet vision

The web is a source of many interesting new vision problems for which lots of data is available. Here are some links to emerging collections of images and videos, intended as a starting point for brainstorming about related vision problems. These datasets are less structured than the traditional benchmarks used in computer vision, which makes evaluation harder, but they can open up original research problems.


Here are links to useful code for low-level and mid-level vision tasks:

Other useful code: