Trevor Darrell
UC Berkeley EECS / ICSI / MIT CSAIL

Prof. Darrell is moving from MIT EECS and CSAIL to UC Berkeley EECS and ICSI.
He joined the faculty of UCB EECS and started a new computer vision group at
ICSI in January 2008. He will remain on the faculty at MIT EECS through Spring
2008 and will physically move from Cambridge to Berkeley in Summer 2008, but
will continue to advise students at MIT through 2009. Accepted UC Berkeley
graduate students interested in working with Prof. Darrell at Berkeley should
contact him directly. The previously announced postdoctoral positions for
2008-2009 have been filled.
Recent activity:
Link to MIT CSAIL Vision Interface Group Web Site
About my research:
My group develops algorithms and systems for perceptive interfaces, which
enable users to interact with machines using natural expression and gesture,
and which allow machines to understand a user's physical environment. We
develop computer vision algorithms to support two forms of interaction: first,
enabling machines to interact with people through multimodal conversation, and
second, allowing devices to recognize objects of interest to a user and provide
situated search for information about those objects. Enabling machines to
understand multimodal communication and reference is valuable in many
application areas.
Our projects can be clustered roughly into three technical topics:
- multimodal stream processing
- estimation of human body pose and recognition of body gesture, and
- indexing and recognition of scenes, objects, and object categories.
We have made recent progress on new models
for audiovisual speech detection and recognition. Our ICCV2005 paper describes a new model for visual speech recognition,
based on a Dynamic Bayesian Network model of loosely coupled streams of
articulatory features. In contrast to conventional phoneme/viseme models,
our approach captures the asynchrony present in multimodal signals.
Visual speech analysis is useful for determining whether the user is actually
speaking (i.e., audiovisual endpointing) as well as improving recognition of
acoustically ambiguous or noisy signals.
We have developed new methods for tracking human body pose and recognizing
gesture. In contrast to most previous estimation approaches, which were based
on a tracking paradigm, our ICCV2003 paper pioneered techniques for
single-frame pose estimation using a novel approximate nearest neighbor
algorithm. Our technique learned how to find approximate nearest neighbors in a
parameter space of interest, and to index very large datasets in sublinear
time. These topics and other related methods are covered in a recent book from
the MIT Press. In ICCV2005 and CVPR2006 papers, we have expanded our
single-frame technique into an optimal temporal tracking formalism, which
offers the robustness of broadly searching the parameter space and the accuracy
of integration over time with an appropriate dynamics model. In a second
CVPR2006 paper, we have shown how gesture recognition can be performed using
Hidden-state Conditional Random Fields, a new model that can learn and re-use
discriminative sub-gesture structure. We have also shown in an award-winning
ICMI2005 paper that dialog context is crucial for effective gesture
recognition; an expanded version of this work was presented at the AAAI2006
Nectar track.
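To give a rough feel for hashing-based approximate nearest neighbor lookup, here is a minimal sketch using random-hyperplane hashing. It is not the learned hashing scheme from the ICCV2003 paper: the hash functions here are random rather than learned, and all function names and parameters (`make_hyperplanes`, `n_bits`, and so on) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (illustrative; not learned from data)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(planes, x):
    """One bit per hyperplane: which side of the plane x falls on."""
    return tuple(sum(p_j * x_j for p_j, x_j in zip(p, x)) > 0 for p in planes)


def build_index(planes, points):
    """Group point indices into buckets by their hash key."""
    buckets = defaultdict(list)
    for i, x in enumerate(points):
        buckets[hash_key(planes, x)].append(i)
    return buckets


def query(planes, buckets, points, q):
    """Return the index of the closest point within q's bucket, or None.

    Only one bucket is scanned, so the answer is approximate: the true
    nearest neighbor may sit in a different bucket.
    """
    candidates = buckets.get(hash_key(planes, q), [])
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q)),
    )
```

A query compares against only the points that share its bucket rather than the whole dataset, which is where the speedup over a linear scan comes from; using several hash tables is a common way to reduce missed neighbors.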
We believe mobile devices should be able
to quickly recognize their environment and common object categories, be able to
search the web with visual cues to find information about newly-encountered
objects, and use image content to assist a user in providing metadata labels
for their captured media. Our Pyramid Match Kernel (ICCV2005, CVPR2006, NIPS2006) provides an efficient means to help
solve these problems; in recent object category recognition benchmarks it was
faster than competing methods by an order of magnitude or more. Our
prototype systems for mobile image search are described in CHI2005 and CVPR2004.
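The idea behind pyramid matching can be sketched in a few lines: two sets of feature vectors are binned into histograms at increasingly coarse resolutions, and matches first found at a finer level count for more than matches found only at a coarser one. The version below is a simplified illustration, not the implementation from the papers cited above; the bin side lengths (powers of two), the weights (one over the side length), and the function names are assumptions chosen for this sketch.

```python
from collections import Counter


def histogram(points, side):
    """Count points per grid cell, where each cell has the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of points that can be paired per cell."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def pyramid_match(xs, ys, levels):
    """Weighted count of new matches found at each pyramid level.

    Level i uses cells of side 2**i (assumed), and matches that first
    appear at level i are weighted by 1 / 2**i (assumed).
    """
    score = 0.0
    prev = 0
    for i in range(levels):
        side = 2 ** i
        matches = intersection(histogram(xs, side), histogram(ys, side))
        score += (matches - prev) / side  # only newly formed matches count
        prev = matches
    return score


# Two small 1-D feature sets: one exact match, one match found at side 4.
xs = [(0,), (5,)]
ys = [(0,), (6,)]
similarity = pyramid_match(xs, ys, levels=4)
```

Because each level needs only one pass over the histogram bins, the cost grows linearly with the number of features, rather than requiring an explicit search over pairings between the two sets.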
Additional information on these projects, and on additional projects not
mentioned above, is available on my group's projects, demos, and publications
pages, and in the slides linked below.
Many of our techniques have been
integrated into a deployable multimodal interface component for SRI's CALO system, part of the DARPA PAL
program; our perceptive laptop interface was demoed at MLMI 2006; see the Demo Handout for an illustration.
Graduated Ph.D. Students:
Louis-Philippe Morency, Dialogue Context and Visual Gesture Recognition, Oct. 2006 [Research Scientist, USC]
Kristen Grauman, Matching sets of features for efficient retrieval and recognition, Aug. 2006 [Assistant Professor, CS, University of Texas, Austin]
Leonid Taycher, Statistical methods for dynamic visual processing, Aug. 2006 [Google Boston]
Kevin Wilson, Learning Uncertainty Models for Audiovisual Speech Source Localization in Real-World Environments, Aug. 2006 [Research Scientist, Mitsubishi Electric Research Labs, Cambridge (MERL)]
Gregory Shakhnarovich, Learning Features for Visual Classification, Oct. 2005 [Assistant Professor, TTI-C]
Ali Rahimi, Learning to Transform Time Series with a Few Examples, Oct. 2005 [Intel Research Laboratories, Seattle]
Current Students:
Tom Yeh
Mario Christoudias
Kate Saenko
Ariadna Quattoni
Sy Bor Wang
John Lee
About me:
Classes:
Spring 2007: 6.870: Intelligent Multimodal Interfaces
Spring 2006: 6.001 Lectures
Fall 2005: 6.897 Object Recognition Seminar
Spring 2005: 6.001 Lectures
Fall 2004: 6.001 Recitation
Spring 2004: 6.891 Computer Vision and Applications
Fall 2002: 6.801/6.866 Computer Vision
Spring 2002: 6.001 Recitation
Fall 2001: (6.892) Computer Vision for Interface and Surveillance: Algorithms and Implications
Spring 2001: 6.001 Recitation
Fall 2000: (6.892) Computer Vision for Interface and Surveillance: Algorithms and Implications
Spring 2000: Vision Interface Reading Group
Spring 2000: 6.001 Recitation
(Stanford) Spring 1997: CS377B, Machine Perception for Human-Computer Interface, under Terry Winograd's PCD program in the Department of Computer Science
Meetings Organized:
Spring 2007 - ISAT study on Exploitation of Persistent Operational Surveillance (Chair)
Spring 2006 - ISAT quick reaction study on Adaptive and Interactive Representations (Co-Chair)
Fall 2004 - ICMI 2004 (General Chair)
Fall 2003 - NIPS Workshop on Approximate Nearest Neighbors Methods for Learning and Vision
Fall 2003 - ICMI 2003 (Program Chair)
Fall 2001 - NIPS Workshop on Multi-Sensory Perceptive Systems
Fall 2001 - PUI 2001 (Program Chair)
Fall 98 / Spring 99 - Interval Research Signal Computation Seminar
May 98 - Third Bay Area Vision Meeting (BAVM), on the topic Visual Analysis of People
MIT Contact Info:
Trevor Darrell
MIT CSAIL
Associate Professor, EECS Dept.
32 Vassar Street Rm. 32-D512
(617) 253-8966 (office)
trevor@csail.mit.edu