Grounding Object Recognition and Scene Understanding
This class will cover current approaches to object recognition and scene understanding in computer vision and their relation to other disciplines. The goal of this class is to provide an in-depth presentation of computer vision techniques for recognizing objects, scenes, materials, actions, and so on, while placing them in the framework of concrete tasks.
The class is addressed to students from any discipline, not just vision, who are interested in learning about computer vision techniques that can be applied to their research. We will cover state-of-the-art object recognition and scene understanding techniques and how they relate to robotics, language, computer graphics, crowdsourcing, human-computer interaction, etc. For students in computer vision, this class will allow exploring new tasks and scene representations, beyond labeling objects in images for its own sake.
The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, representations for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories. We will read a mixture of computer vision papers and influential works from cognitive psychology and other disciplines.
Room 13-1143
Instructor: Antonio Torralba, Email: torralba csail mit edu
Each registered student will give a class presentation that complements one of the lectures. The presentation can cover two papers related to that week's lecture topic, or a paper from your own research if you feel it can be connected to the class material. Email me with suggestions for topics you would like to present, and I will assign you to one of the lectures.
Projects can be done individually or in groups. The ideal group consists of 2 or 3 students from different areas. Important dates:
Here are links to useful code for low-level and mid-level vision tasks:
Other useful code: