6.870 Grounding Object Recognition and Scene Understanding, Fall 2011

Overview

This class covers current approaches to object recognition and scene understanding in computer vision and their relation to other disciplines. The goal is to provide an in-depth presentation of computer vision techniques for recognizing objects, scenes, materials, actions, and so on, framed in terms of concrete tasks.

The class is addressed to students from any discipline, not just vision, who are interested in learning computer vision techniques that can be applied to their research. We will cover state-of-the-art object recognition and scene understanding techniques and how they relate to robotics, language, computer graphics, crowdsourcing, human-computer interaction, etc. For students in computer vision, this class offers a chance to explore new tasks and scene representations, beyond labeling objects in images for its own sake.

The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, representations for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories, among other topics. We will read a mixture of computer vision papers and influential works from cognitive psychology and other disciplines.

Course

Wednesdays 1pm-4pm

Room 13-1143

Instructor: Antonio Torralba, Email: torralba csail mit edu

No prerequisites

Student presentations

Each registered student will give a class presentation that complements one of the lectures. The presentation can cover two papers related to that week's lecture topic, or a paper from your own research if you feel it can be connected to the class material. Email me with suggestions for topics you would like to present and I will assign you to one of the lectures.

Class projects

Projects can be done individually or in groups. The ideal group consists of 2 or 3 students from different areas. Important dates:
- Project presentations: December 7th. Each group will have 30 minutes to present their work. All members of the team should participate in the presentation.
- Papers: due on December 14th. Papers should be 4-6 pages long. Each member of the group should submit a separate copy by email (PDF format). Papers from members of the same group can be nearly identical, but each member should state which part of the project she or he focused on and expand on it. Each paper should include the names of all the team members.

Schedule

Date	Topic	Lecture	Invited speaker	Slides/videos	Links to Papers/code
	Introduction
Sept. 7	Class goals and a short introduction	Antonio		Lecture1 (ppt)	-P. Cavanagh, Vision is getting easier every day, Perception 1996
Sept. 14	Edges, textures, ...	Antonio		Lecture2 (ppt)
Sept. 21	The importance of data	Antonio	Boris Katz Student: Carl Vondrick	Lecture3 (ppt) Boris (ppt) Carl (ppt)	-LabelMe (website, paper.pdf) -Watson (paper.pdf) -START (system website, paper.pdf) -Video annotation -Visipedia
Sept. 28	Object recognition	Antonio	Seth Teller Student: David Hayden	Lecture4 (ppt)	-Felzenszwalb, McAllester and Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. (code) - Manipulation (paper.pdf) - Natural language commands (paper.pdf)
Oct. 5	Object recognition in context	Antonio	Nicholas Roy Student: Ryan Schoen	Lecture5 (ppt)	- tellex11.pdf - hri10-tk.pdf - icra09-tk.pdf
Oct. 12	Human vision	Antonio	Aude Oliva Student: Deborah Hanus	Lecture6 (ppt)	- A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 2001. (gist code) - S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006. (code)
Oct. 19	Words and pictures	Antonio	Regina Barzilay Student: Yevgeni Berzak	Lecture7 (ppt)	- Jacob Eisenstein, Regina Barzilay, Randall Davis. Gestural Cohesion for Discourse Segmentation. Proceedings of ACL, 2008. - Jacob Eisenstein, Regina Barzilay, Randall Davis. Modeling Gesture Salience as a Hidden Variable for Coreference Resolution and Keyframe Extraction. Journal of Artificial Intelligence Research, 2008. - Jacob Eisenstein, Regina Barzilay, Randall Davis. Turning Lectures into Comic Books with Linguistically Salient Gestures. Proceedings of AAAI, 2007.
Oct. 26	Multiclass models and transfer learning	Antonio	Daniela Rus Student: Sudeep Pillai	Lecture8 (ppt)
Nov. 2	No class
Nov. 9	No class	ICCV
Nov. 16	Vision and the brain	Antonio	Jim DiCarlo Student: Ha Hong	Lecture9 (ppt)	Jim DiCarlo's papers
Nov. 23	HCI	Antonio	Students: Mike Fleder Jeremy Scott Yafim Landa	Lecture10 (ppt)
Nov. 30	3D scenes	Antonio	Students: Emily Zhao Xiaodan Jia	Lecture11 (ppt)
	Projects
Dec. 7	Project presentations	Antonio
Dec. 14	Projects	Antonio	Last day of classes