6.870 Object Recognition and Scene Understanding, Fall 2008

Overview

This class will review and discuss current approaches to object recognition and scene understanding in computer vision. The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, grammars for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories. We will read a mixture of papers from computer vision and influential works from cognitive psychology on object and scene recognition.

Mondays/Wednesdays 2:30-4pm

Room 3-442

Instructor: Antonio Torralba

Email: torralba csail mit edu

T.A.: Ce Liu

Schedule

Advice for presenters: For each student presentation, I will select two or three papers that need to be covered. While preparing your presentation, try to integrate the papers into a single coherent presentation. Instead of dividing it into several disconnected parts, one for each paper, try to find the commonalities between the papers.

Date	Topic	Presenter	Slides/videos	Papers/code
Week 1	Introduction
W Sept. 3	Class goals	Antonio	lecture1.ppt blur.avi highres.avi
Week 2	Single class object detection, Objects without scenes
M Sept. 8	Overview on object recognition and one practical example	Antonio	lecture2.ppt	Biederman. Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987. Fischler and Elschlager. The representation and matching of pictorial images. IEEE Transactions on Computers, Volume 22, 1973. Code: A simple object detector with boosting, from the Short course on recognizing and learning object categories, by Fei-Fei, Fergus, and Torralba. 2005.
W Sept. 10	Template matching and gradient histograms	Presenter: Nicolas Pinto Evaluator: Jenny Yuen	Nicolas' presentation.pdf Evaluation: Connecting LabelMe and the DT detector by Jenny	Lowe. Object recognition from local scale-invariant features. ICCV 1999. (code) Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. CVPR 2005. (code) Felzenszwalb, McAllester and Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. (code)
Week 3	Thousands of categories
M Sept. 15	Levels of categorization and multiclass object recognition	Antonio	lecture3.ppt	E. Rosch. Principles of Categorization. 1978. S. E. Palmer. "Vision Science", chapter 9. (In fact, just read the entire book). B. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV 2008. (website)
W Sept. 17	Sharing parts for intraclass transfer learning	Presenter: Sharat Chikkerur Evaluator: Hueihan Jhuang	Sharat's presentation.pdf Evaluation: Shared parts for actions by Hueihan	Fergus, Perona, and Zisserman. Object class recognition by unsupervised scale invariant learning. CVPR 2003. (code) Fei-Fei, Fergus and Perona. One-Shot learning of object categories. PAMI, 2006. Torralba, Murphy and Freeman. Sharing visual features for multiclass and multiview object detection. PAMI 2007.
M Sept. 22	Student Holiday - No class
Week 4	3D object models
W Sept. 24	Explicit and implicit 3D object models	Antonio	lecture4.ppt Class drawings will be posted soon.	S. E. Palmer. "Vision Science", chapter 9. Joseph L. Mundy. Object Recognition in the Geometric Era: a Retrospective. 2006.
M Sept. 29	Recognition of 3D objects	Presenter: Alec Rivers	Alec's presentation.ppt	J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR 2006. D. Hoiem, C. Rother, and J. Winn. 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation. CVPR 2007. S. Savarese and L. Fei-Fei. 3D generic object categorization, localization and pose estimation. ICCV 2007.
Week 5	Scenes without objects
W Oct. 1	Global scene representations	Presenter: Tilke Judd Evaluator: Nicolas Pinto	Tilke's presentation.pdf Evaluation: Nicolas' gist implementation	A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 2001. (gist code) L. Fei-Fei and P. Perona. A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR 2005. S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.
M Oct. 6	Scene recognition	Aude Oliva	lecture5.pdf	A. Oliva. Gist of the scene. Chapter in Neurobiology of Attention.
Week 6	Objects in context
W Oct. 8	Scenes and objects	Antonio	lecture6.ppt	I. Biederman, R.J. Mezzanotte, and J.C. Rabinowitz. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 1982. A. Oliva, A. Torralba. The role of context in object recognition. Trends in Cognitive Sciences, 2007.
M. Oct. 13	Student Holiday - No class
W. Oct. 15	ECCV - No class
Week 7	Internet vision and the power of lots of data
M. Oct. 20	Powers of 10	Antonio	lecture7.ppt
W. Oct. 22		Presenter: Vladimir Bychkovsky Evaluator: Krista Ehinger	Evaluation: Scene completion demo by Krista	N. Snavely, S. M. Seitz, R. Szeliski. Photo tourism: Exploring photo collections in 3D. SIGGRAPH 2006. (website) (code) J. Hays, A. A. Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007. (website and code) A. Torralba, R. Fergus, W. T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI 2008. (website)
Week 8	Low and Mid-level vision
M. Oct. 27	Low-level vision: shading, reflectance, and texture	Bill Freeman	lecture8.ppt shadingReflSurvey.pdf	M. Tappen, W. Freeman and E. Adelson. Recovering intrinsic images from a single image. PAMI 2005. A. Efros and W. Freeman. Image Quilting for Texture Synthesis and Transfer. SIGGRAPH 2001.
W. Oct. 29	Edges, regions and textures	Presenter: Tom Ouyang Evaluator: Gokberk Cinbis	Tom's presentation.ppt Evaluation: Gokberk Cinbis	H.G. Barrow, J.M. Tenenbaum. Recovering Intrinsic Scene Characteristics from Images. Artificial Intelligence 1977. J. Shi and J. Malik. Normalized Cuts and Image Segmentation. J. Portilla and E. Simoncelli. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. IJCV 2004. (code)
Week 9	Grammars for objects and scenes
M. Nov. 3	Grammars and topic models	Antonio & Meg Aycinena	lecture9.ppt	S.C. Zhu and D. Mumford. A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2006.
W. Nov. 5	Grammars for low-, mid-, and high-level vision	Presenters: Tom Kollar & Hueihan Jhuang	Kollar's presentation.ppt Hueihan's presentation.ppt	Zhuowen Tu and Song-Chun Zhu. Image segmentation by data-driven Markov chain Monte Carlo. Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, Song-Chun Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. Z.J. Xu, H. Chen, S.C. Zhu, and J. Luo. A Composite Template for Human Face Modeling and Sketch. F. Han and S.C. Zhu. Bottom-up/Top-down Image Parsing with Attribute Graph Grammar.
Week 10	3D scene models
M. Nov. 10	Student Holiday - No class
W. Nov. 12	3D scenes	Antonio	lecture10.ppt	A. Criminisi, I. Reid, and A. Zisserman. "Single View Metrology". Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, 1999. (website) LabelMe 3D (website)
M. Nov. 17		Presenter: Krista Ehinger Evaluator: Tom Kollar	Krista's presentation.ppt Evaluation: A comparison of 5 techniques by Tom	Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997 (website) D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005. (website) A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007 (website)
Week 6	Objects in context, part 2
W. Nov. 19		Presenter: Gokberk Cinbis Evaluator: Sharat Chikkerur	Gokberk's presentation.ppt Evaluation: Recipes for computing 'gist' features by Sharat	A. Torralba. Contextual priming for object detection. IJCV 2003. D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. ICCV 2005. A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007.
Week 11	Hierarchies
M. Nov. 24	Biologically inspired computer vision	Antonio Presenter: Nat Twarog	lecture11.ppt Twarog's presentation.ppt	T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005. B. Epshtein and S. Ullman. Feature Hierarchies for Object Classification. ICCV 2005. D. Geman. Coarse-to-Fine Classification and Scene Labeling.
	What happens if we solve object recognition?
W. Nov. 26		Antonio Presenter: Jenny Yuen	lecture12.ppt and a few notes on how to prepare a talk.	P. Cavanagh. Vision is getting easier every day. Perception, 1996. Where are the flying cars? Y. Jin, S. Baluja, H. Rowley. Canonical Image Selection from the Web. CIVR 2007.
Week 12	Class project presentations
M. Dec. 1	20 minutes for each presentation	2:30pm - 4:30pm		2:35 Jenny Yuen 3:00 Gokberk Cinbis 3:25 Alec Rivers 3:50 Tom Ouyang
W. Dec. 3		1:30pm - 4:00pm		1:35 Hueihan Jhuang and Sharat Chikkerur 2:00 Nathaniel R Twarog 2:25 Tom Kollar 2:50 Tilke Judd and Vladimir Bychkovsky 3:15 Nicolas Pinto 3:40 Vote count!

Resources

Related courses:

Scene Understanding Seminar, by Aude Oliva
Learning-Based Methods in Vision, by Alyosha Efros
Object Recognition, by Kristen Grauman
Recognition Problems in Computer Vision, by Greg Mori
Computer Vision, by Rob Fergus
Internet Vision, by Tamara Berg
Computer Vision and the Web, by Svetlana Lazebnik
High-Level Recognition in Computer Vision, by Fei-Fei Li

Workshops:

Tutorials:

Other resources:

The Computer Vision Industry

Datasets

Internet vision

The web is a source of many interesting new vision problems for which lots of data is available. Here are some links to emerging sources of images and videos for brainstorming related vision problems. These datasets are less structured than the traditional benchmarks used in computer vision, which makes evaluation harder, but they can open up original research problems.

Current TV
Flickr and Flickrbits

Code

Here are links to useful code for low-level and mid-level vision tasks:

Other useful code:

Code for downloading Flickr images