Trevor Darrell

UC Berkeley EECS / ICSI / MIT CSAIL


Prof. Darrell is moving from MIT EECS and CSAIL to UC Berkeley EECS and ICSI. He joined the faculty of UCB EECS and started a new computer vision group at ICSI in January 2008. He will remain on the faculty at MIT EECS through Spring 2008 and will physically move from Cambridge to Berkeley in Summer 2008, but will continue to advise students at MIT through 2009. Accepted UC Berkeley graduate students interested in working with Prof. Darrell at Berkeley should contact him directly. The previously announced postdoctoral positions for 2008-2009 have been filled.


Recent activity:

 

Link to MIT CSAIL Vision Interface Group Web Site

 


 

About my research:

 

My group develops algorithms and systems for perceptive interfaces, which enable users to interact with machines using natural expression and gesture and also allow machines to understand a user's physical environment. We develop computer vision algorithms to support two forms of interaction: first, enabling machines to interact with people through multimodal conversation, and second, allowing devices to recognize objects of interest to a user and provide situated search for information about those objects. Enabling machines to understand multimodal communication and reference is valuable in many application areas.

 

Our projects can be clustered roughly into three technical topics:

 

  • multimodal stream processing,
  • estimation of human body pose and recognition of body gesture, and
  • indexing and recognition of scenes, objects, and object categories.

We have made recent progress on new models for audiovisual speech detection and recognition.  Our ICCV2005  paper describes a new model for visual speech recognition, based on a Dynamic Bayesian Network model of loosely coupled streams of articulatory features.  In contrast to conventional phoneme/viseme models, our approach captures the asynchrony present in multimodal signals.  Visual speech analysis is useful for determining whether the user is actually speaking (i.e., audiovisual endpointing) as well as improving recognition of acoustically ambiguous or noisy signals.

We have developed radical new methods for tracking human body pose and recognizing gesture. In contrast to most previous estimation approaches, which were based on a tracking paradigm, our ICCV2003 paper pioneered techniques for single-frame pose estimation using a novel approximate nearest neighbor algorithm. Our technique learned how to find approximate nearest neighbors in a parameter space of interest and index very large datasets in sublinear time. These topics and other related methods are covered in a recent book from the MIT Press. In ICCV2005 and CVPR2006 papers, we have expanded our single-frame technique into an optimal temporal tracking formalism, which offers the robustness of broadly searching the parameter space and the accuracy of integration over time with an appropriate dynamics model. In a second CVPR2006 paper, we have shown how gesture recognition can be performed using Hidden-state Conditional Random Fields, a new model that can learn and re-use discriminative sub-gesture structure. We have also shown in an award-winning ICMI2005 paper that dialog context is crucial for effective gesture recognition; an expanded version of this work was presented at the AAAI2006 Nectar track.
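As a rough illustration of example-based estimation with approximate nearest neighbors, the sketch below hashes stored feature vectors into buckets and, at query time, compares the query only against examples that share a bucket. It is not the algorithm from the ICCV2003 paper: the hash functions here are random hyperplanes rather than learned ones, and the class name HyperplaneLSH, its parameters, and the idea of attaching a label (for example a pose) to each stored vector are assumptions made for this example.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HyperplaneLSH:
    """Toy locality-sensitive hash index using random hyperplanes.

    Hypothetical illustration only: the hash functions are random,
    not learned from data.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # stored (vector, label) pairs

    @staticmethod
    def _key(planes, vector):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in planes
        )

    def add(self, vector, label):
        """Store a labeled example in every hash table."""
        index = len(self.items)
        self.items.append((vector, label))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vector)].append(index)

    def query(self, vector):
        """Return the label of the closest example among bucket collisions,
        or None if no stored example shares a bucket with the query."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vector), ()))
        if not candidates:
            return None
        best = min(
            candidates,
            key=lambda i: squared_distance(self.items[i][0], vector),
        )
        return self.items[best][1]


index = HyperplaneLSH(dim=3)
index.add([1.0, 0.0, 0.0], "pose A")
index.add([0.0, 1.0, 0.0], "pose B")
print(index.query([1.0, 0.0, 0.0]))
```

Because only colliding examples are compared exactly, the work per query depends on bucket sizes rather than on the size of the whole dataset; the price is that the true nearest neighbor can occasionally be missed.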

We believe mobile devices should be able to quickly recognize their environment and common object categories, search the web with visual cues to find information about newly encountered objects, and use image content to assist a user in providing metadata labels for their captured media. Our Pyramid Match Kernel (ICCV2005, CVPR2006, NIPS2006) provides an efficient means to help solve these problems; in recent object category recognition benchmarks it was faster than competing methods by an order of magnitude or more. Our prototype systems for mobile image search are described in CHI2005 and CVPR2004.
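The core idea of a pyramid match can be sketched in a few lines: two sets of feature points are binned into histograms at increasingly coarse resolutions, and matches first found in finer bins count for more. The version below is a simplified sketch, assuming non-negative coordinates bounded by a known diameter, bin sides that double at each level, and weights of 1/2^i; it omits the random grid shifts and normalization used in practice, and the function names are chosen for this example.

```python
from collections import Counter
from math import ceil, log2


def histogram(points, side):
    """Count points per grid cell, using cells with the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of per-bin minimum counts."""
    return sum(min(n, h2[b]) for b, n in h1.items())


def pyramid_match(xs, ys, diameter):
    """Unnormalized pyramid match score between two point sets.

    Coordinates are assumed to lie in [0, diameter).
    """
    levels = ceil(log2(diameter))
    score, previous = 0.0, 0
    for i in range(levels + 1):
        side = 2 ** i
        current = intersection(histogram(xs, side), histogram(ys, side))
        # Matches that first appear at level i are weighted by 1 / 2^i.
        score += (current - previous) / side
        previous = current
    return score


xs = [(0, 0), (5, 5)]
ys = [(0, 0), (6, 6)]
print(pyramid_match(xs, ys, diameter=8))  # 1.25
```

In this toy case one pair matches at the finest level (weight 1) and the other only once the bins have side 4 (weight 1/4), giving 1.25. The cost grows linearly with the number of points per level, which is what makes this kind of kernel attractive compared with computing an explicit optimal matching.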

More information on these projects, and on projects not mentioned above, is available on my group's projects, demos, and publications pages and in the slides linked below.

Vision Interface Group Projects  

Vision Interface Group Publications

Slides from Recent Colloquia Presentation

Many of our techniques have been integrated into a deployable multimodal interface component for SRI's CALO system, part of the DARPA PAL program. Our perceptive laptop interface was demonstrated at MLMI 2006; see the Demo Handout for an illustration.


Graduated Ph.D. Students:

Louis-Philippe Morency, Dialogue Context and Visual Gesture Recognition, Oct. 2006 [Research Scientist, USC]

Kristen Grauman, Matching sets of features for efficient retrieval and recognition, Aug. 2006 [Assistant Professor, CS, University of Texas, Austin]

Leonid Taycher, Statistical methods for dynamic visual processing, Aug. 2006 [Google Boston]

Kevin Wilson, Learning Uncertainty Models for Audiovisual Speech Source Localization in Real-World Environments, Aug. 2006 [Research Scientist, Mitsubishi Electric Research Labs, Cambridge (MERL)]

Gregory Shakhnarovich, Learning Features for Visual Classification, Oct. 2005 [Assistant Professor, TTI-C]

Ali Rahimi, Learning to Transform Time Series with a Few Examples, Oct. 2005 [Intel Research Laboratories, Seattle]

Current Students:

Tom Yeh

Mario Christoudias

Kate Saenko

Ariadna Quattoni

Sy Bor Wang

John Lee


About me:

Recent Curriculum Vitae  

Previous Publications (1987-2001)  

Previous Projects (1987-2001)


Classes:

Spring 2007: 6.870 Intelligent Multimodal Interfaces

Spring 2006: 6.001 Lectures

Fall 2005: 6.897 Object Recognition Seminar

Spring 2005: 6.001 Lectures

Fall 2004: 6.001 Recitation

Spring 2004: 6.891 Computer Vision and Applications

Fall 2002: 6.801/6.866 Computer Vision

Spring 2002: 6.001 Recitation

Fall 2001: (6.892) Computer Vision for Interface and Surveillance: Algorithms and Implications

Spring 2001: 6.001 Recitation

Fall 2000: (6.892) Computer Vision for Interface and Surveillance: Algorithms and Implications

Spring 2000: Vision Interface Reading Group

Spring 2000: 6.001 Recitation

(Stanford) Spring 1997: CS377B: Machine Perception for Human-Computer Interface, under Terry Winograd's PCD program in the Department of Computer Science

 


Meetings Organized:

Spring 2007 - ISAT study on Exploitation of Persistent Operational Surveillance (Chair)

Spring 2006 - ISAT quick reaction study on Adaptive and Interactive Representations (Co-Chair)

Fall 2004 - ICMI 2004 (General Chair)

Fall 2003 - NIPS Workshop on Approximate Nearest Neighbors Methods for Learning and Vision

Fall 2003 - ICMI 2003 (Program Chair)

Fall 2001 - NIPS Workshop on Multi-Sensory Perceptive Systems

Fall 2001 - PUI 2001 (Program Chair)

Fall 1998 / Spring 1999 - Interval Research Signal Computation Seminar

May 1998 - Third Bay Area Vision Meeting (BAVM), on the topic "Visual Analysis of People"


MIT Contact Info:

Trevor Darrell
MIT CSAIL
Associate Professor, EECS Dept.
32 Vassar Street Rm. 32-D512
(617) 253-8966 (office)
trevor@csail.mit.edu