6.870 Object Recognition and Scene Understanding, Fall 2008

Overview

This class will review and discuss current approaches to object recognition and scene understanding in computer vision. The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, grammars for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories. We will read a mixture of computer vision papers and influential works from cognitive psychology on object and scene recognition.

Mondays/Wednesdays 2:30-4pm

Room 3-442

Instructor: Antonio Torralba

Email: torralba csail mit edu

T.A.: Ce Liu

Schedule

Advice for presenters: For each student presentation, I will select two or three papers that need to be covered. While preparing, try to integrate the papers into a single coherent presentation. Instead of dividing your presentation into several disconnected parts, one for each paper, try to find the commonalities between the papers.

Date Topic Presenter Slides/videos Papers/code
Week 1 Introduction      
W
Sept. 3
Class goals Antonio

lecture1.ppt
blur.avi

highres.avi

 
Week 2 Single class object detection,
Objects without scenes
     
M
Sept. 8
Overview on object recognition and
one practical example
Antonio lecture2.ppt


Biederman. Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987.

Fischler and Elschlager. The representation and matching of pictorial images. IEEE Transactions on Computers, Volume 22, 1973.

Code: A simple object detector with boosting, from the Short course on recognizing and learning object categories, by Fei-Fei, Fergus, and Torralba. 2005.

W
Sept. 10
Template matching and
gradient histograms

Presenter:
Nicolas Pinto

Evaluator:
Jenny Yuen

Nicolas presentation.pdf

Evaluation:
Connecting LabelMe and the DT detector by Jenny


Lowe. Object recognition from local scale-invariant features, ICCV 1999. (code)

Dalal and Triggs. Histograms of Oriented Gradients for Human Detection, CVPR 2005. (code)

Felzenszwalb, McAllester and Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. (code)

Week 3 Thousands of categories      
M
Sept. 15
Levels of categorization
and multiclass object recognition
Antonio lecture3.ppt


E. Rosch. Principles of Categorization. 1978.

S. E. Palmer. "Vision Science", chapter 9. (In fact, just read the entire book).

B. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV 2008. (website)

W
Sept. 17

Sharing parts for intraclass transfer learning

Presenter:
Sharat Chikkerur

Evaluator:
Hueihan Jhuang

Sharat's presentation.pdf

Evaluation:
Shared parts for actions by Hueihan


Fergus, Perona, and Zisserman. Object class recognition by unsupervised scale invariant learning. CVPR 2003. (code)

Fei-Fei, Fergus and Perona. One-Shot learning of object categories. PAMI, 2006.

Torralba, Murphy and Freeman. Sharing visual features for multiclass and multiview object detection. PAMI 2007.

M
Sept. 22

Student Holiday - No class
     
Week 4 3D object models      
W
Sept. 24

Explicit and implicit 3D object models

Antonio lecture4.ppt
Class drawings will be posted soon.


S. E. Palmer. "Vision Science", chapter 9.

Joseph L. Mundy. Object Recognition in the Geometric Era: a Retrospective. 2006.

M
Sept. 29
Recognition of 3D objects Presenter:
Alec Rivers
Alec's presentation.ppt


J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR 2006.

D. Hoiem, C. Rother, and J. Winn. 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation. CVPR 2007.

S. Savarese and L. Fei-Fei. 3D generic object categorization, localization and pose estimation. ICCV 2007.

Week 5 Scenes without objects      
W
Oct. 1
Global scene representations

Presenter:
Tilke Judd

Evaluator:
Nicolas Pinto

Tilke's presentation.pdf

Evaluation:
Nicolas' gist implementation


A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 2001. (gist code)

L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR. 2005.

S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.

M
Oct. 6
Scene recognition Aude Oliva lecture5.pdf

A. Oliva. Gist of the scene. Chapter, Neurobiology of attention

Week 6 Objects in context      
W
Oct. 8
Scenes and objects Antonio lecture6.ppt


I. Biederman, R.J. Mezzanotte, and J.C. Rabinowitz. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 1982.

A. Oliva, A. Torralba. The role of context in object recognition. Trends in Cognitive Sciences, 2007.

M.
Oct. 13
Student Holiday - No class
     
W.
Oct. 15
ECCV - No class

Week 7 Internet vision and the power of lots of data      
M.
Oct. 20
Powers of 10 Antonio lecture7.ppt


W.
Oct. 22
 

Presenter:
Vladimir Bychkovsky

Evaluator:
Krista Ehinger

Evaluation:
Scene completion demo by Krista


N. Snavely, S. M. Seitz, R. Szeliski. Photo tourism: Exploring photo collections in 3D. SIGGRAPH 2006. (website) (code)

J. Hays, A. A. Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007. (website and code)

A. Torralba, R. Fergus, W. T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI 2008. (website)

Week 8 Low and Mid-level vision      
M.
Oct. 27
Low-level vision:
shading, reflectance, and texture.
Bill Freeman lecture8.ppt
shadingReflSurvey.pdf


M. Tappen, W. Freeman and E. Adelson. Recovering intrinsic images from a single image. PAMI 2005.

A. Efros and W. Freeman. Image Quilting for Texture Synthesis and transfer. SIGGRAPH 2001

W.
Oct. 29
Edges, regions and textures


Presenter:
Tom Ouyang

Evaluator:
Gokberk Cinbis

Tom's presentation.ppt

Evaluation:
Gokberk Cinbis


H.G. Barrow, J.M. Tenenbaum. Recovering Intrinsic Scene Characteristics from Images. Artificial Intelligence 1977

J. Shi and J. Malik. Normalized Cuts and Image Segmentation.

J. Portilla and E. Simoncelli. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. IJCV 2000. (code)

Week 9 Grammars for
objects and scenes
     
M.
Nov. 3
Grammars and topic models Antonio & Meg Aycinena lecture9.ppt

S.C. Zhu and D. Mumford. A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2006.

W.
Nov. 5
Grammars for low, mid and high level vision

Presenters:
Tom Kollar &
Hueihan Jhuang

Kollar's presentation.ppt

Hueihan's presentation.ppt


Zhuowen Tu; Song-Chun Zhu. Image segmentation by data-driven Markov chain Monte Carlo

Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, Song-Chun Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition

Z.J. Xu, H. Chen, S.C. Zhu, and J. Luo. A Composite Template for Human Face Modeling and Sketch

F. Han and S.C. Zhu. Bottom-up/Top-down Image Parsing with Attribute Graph Grammar

Week 10 3D scene models      
M.
Nov. 10
Student Holiday - No class
     
W.
Nov. 12
3D scenes

Antonio

lecture10.ppt

A. Criminisi, I. Reid, and A. Zisserman. "Single View Metrology". Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, 1999. (website)

LabelMe 3D (website)

M.
Nov. 17
 

Presenter:
Krista Ehinger

Evaluator:
Tom Kollar

Krista's presentation.ppt

Evaluation:
A comparison of 5 techniques by Tom

Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997 (website)

D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005. (website)

A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007 (website)

Week 6 Objects in context, part 2      
W.
Nov. 19
 

Presenter:
Gokberk Cinbis

Evaluator:
Sharat Chikkerur

Gokberk's presentation.ppt

Evaluation:
Recipes for computing 'gist' features by Sharat


A. Torralba. Contextual priming for object detection. IJCV 2003.

D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. ICCV 2005.

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007

Week 11 Hierarchies      
M.
Nov. 24
Biologically inspired computer vision Antonio

Presenter:
Nat Twarog

lecture11.ppt

Twarog's presentation.ppt


T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005

B. Epshtein and S. Ullman. Feature Hierarchies for Object Classification. ICCV 2005

D. Geman. Coarse-to-Fine Classification and Scene Labeling.

  What happens if we solve object recognition?      
W.
Nov. 26
 

Antonio

Presenter:
Jenny Yuen

lecture12.ppt and a few notes on how to prepare a talk.

P. Cavanagh, Vision is getting easier every day, Perception 1996

Where are the flying cars?

Y. Jin, S. Baluja, H. Rowley. Canonical Image Selection from the Web, CIVR 2007

Week 12 Class project presentations      
M.
Dec. 1
20 minutes for each presentation

2:30pm - 4:30pm


  2:35 Jenny Yuen
3:00 Gokberk Cinbis
3:25 Alec Rivers
3:50 Tom Ouyang
W.
Dec. 3
  1:30pm - 4:00pm

1:35 Hueihan Jhuang and Sharat Chikkerur
2:00 Nathaniel R Twarog
2:25 Tom Kollar
2:50 Tilke Judd and Vladimir Bychkovsky
3:15 Nicolas Pinto

3:40 Vote count!

Resources

Related courses:

Workshops:

Tutorials:

Other resources:

Datasets

Internet vision

The web is a source of many interesting new vision problems for which lots of data is available. Here are some links to emerging sources of images and videos, intended to prompt brainstorming about related vision problems. These datasets are less structured than traditional computer vision benchmarks, which makes evaluation harder, but they can open up original research problems.

Code

Here are links to useful code for low-level and mid-level vision tasks:

Other useful code: