Projects
I was a Ph.D. student in the
Activity Perception Project group at the
Computer Science and Artificial Intelligence Laboratory (CSAIL) of the
Massachusetts Institute of Technology (MIT).
I previously conducted my B.S. and M.S. research in the
Signal Analysis and Machine
Perception Laboratory (SAMPL) at The Ohio State
University (OSU).
I am interested in the intersection of visual tracking, object recognition, and
machine learning. My research projects have ranged from improvements in
low-level vision to higher-level tasks such as detection, recognition, and query
systems.
A concise list of publications and recent formal presentations may be found on
my homepage. This page gives a brief description
of various projects I have worked on.
Higher-Level Vision
Many computer vision systems perform high-level tasks, such as recognizing
individual people in an image or video clip, or understanding what a group of
people, animals, or vehicles is doing by observing their movements.
Event Detection
In the last decade, significant progress has been made in developing
automatic far-field tracking systems. The PETS 2007 dataset provides some
interesting event detection challenges. We have developed a method of
bootstrapping a background model in the presence of significant lighting
changes. We then use a blob tracker as an attention mechanism for finding
tracks of interest, which we temporally extend using the mean-shift
algorithm. Using only weak human and luggage models (based purely on size),
our system performs well at detecting loitering events, and several events
involving interactions between actors and their luggage.
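To make the track-extension step concrete, here is a minimal sketch of
mean-shift mode seeking in Python, assuming a per-pixel target-likelihood
map (e.g. from histogram back-projection); the window parametrization and
parameter values are illustrative, not the actual PETS system code:

import numpy as np

def mean_shift_window(weights, center, size, max_iter=20, eps=0.5):
    # weights: 2-D map of per-pixel target likelihoods
    # center:  (row, col) initial window center
    # size:    (height, width) of the tracking window
    cy, cx = float(center[0]), float(center[1])
    h, w = size
    for _ in range(max_iter):
        r0 = max(int(cy - h / 2), 0)
        r1 = min(int(cy + h / 2), weights.shape[0])
        c0 = max(int(cx - w / 2), 0)
        c1 = min(int(cx + w / 2), weights.shape[1])
        patch = weights[r0:r1, c0:c1]
        total = patch.sum()
        if total == 0:
            break  # no target evidence under the window
        rows, cols = np.indices(patch.shape)
        ny = r0 + (rows * patch).sum() / total  # weighted centroid
        nx = c0 + (cols * patch).sum() / total
        converged = abs(ny - cy) < eps and abs(nx - cx) < eps
        cy, cx = ny, nx
        if converged:
            break  # settled on a mode of the likelihood map
    return cy, cx

Each iteration moves the window to the likelihood-weighted centroid of the
pixels it covers, which is the hill-climbing behavior that lets a track be
extended from frame to frame.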
Joint work with Xiaogang Wang and W.E.L. Grimson.
Published and
presented at the
Performance Evaluation of
Tracking Systems (PETS) 2007 Workshop at the International Conference on
Computer Vision (ICCV) and at the
Federal
University of Paraná in Brazil.
Recognizing People by Gait
We developed a model-based method for accurate extraction of pedestrian
silhouettes from video sequences. Our approach is based on two assumptions:
(1) there is a common appearance shared by all pedestrians, and (2) each
individual looks like him- or herself over a short span of time. These
assumptions allow
us to learn pedestrian models that encompass both a pedestrian population
appearance and the individual appearance variations. Using our models, we
are able to produce pedestrian silhouettes that have fewer noise pixels and
missing parts. We apply our silhouette extraction approach to the NIST gait
data set and show that, on the gait recognition task, our model-based
silhouettes yield much higher recognition rates than silhouettes extracted
directly from background subtraction or produced by non-model-based
smoothing schemes.
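One way to picture the role of the learned models is as a per-pixel fusion
of the noisy observation with a pedestrian-shape prior. The following is a
deliberately simplified, illustrative sketch; the combination rule and the
parameters are assumptions, not the paper's actual formulation:

import numpy as np

def clean_silhouette(raw_mask, prior, alpha=0.5, threshold=0.5):
    # raw_mask: noisy binary foreground mask from background subtraction
    # prior:    per-pixel probability that the pixel lies on the
    #           pedestrian, learned from aligned training silhouettes
    belief = alpha * raw_mask.astype(float) + (1.0 - alpha) * prior
    return belief > threshold  # cleaned binary silhouette

Pixels weakly supported by background subtraction survive if the model
expects a pedestrian there, while isolated noise pixels are suppressed.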
Joint work with Lily Lee and Kinh Tieu.
Published at the 2003
International Conference on Computer Vision.
Event Query Systems
In dealing with long-term tracking databases with wide-area coverage, an
important problem is formulating an intuitive and fast query system for
analysis. In such a query system, a user who is not a computer vision
researcher should be able to readily specify a novel query to the system and
obtain the desired results. Furthermore, these queries should be able to not
only search out individual actors (e.g. "find all white cars") but also find
interactions amongst multiple actors (e.g. "find all drag racing activities
in the city"). Informally, we have found that people often use sketches when
describing activities and interactions. We have demonstrated a
preliminary system that automatically interprets schematic drawings of
activities. The system transforms the schematics into executable code that
searches a tracking database. Thanks to our query optimization, these
queries execute orders of magnitude faster than equivalent queries running
on a partially optimized SQL database.
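As a flavor of what the generated search code might look like, here is a
hand-written stand-in for a compiled "drag racing" query; the track schema
(lists of timestamped ground-plane samples) and all thresholds are
hypothetical:

def find_fast_pairs(tracks, in_region, speed_min=20.0, gap_max=5.0):
    # tracks: list of tracks; each track is a list of (t, x, y) samples
    # in_region(x, y): predicate selecting the area of interest
    def max_speed(track):
        best = 0.0
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
            if in_region(x0, y0) and t1 > t0:
                v = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / (t1 - t0)
                best = max(best, v)
        return best

    fast = [tr for tr in tracks if max_speed(tr) > speed_min]
    pairs = []
    for i, a in enumerate(fast):
        for b in fast[i + 1:]:
            if abs(a[0][0] - b[0][0]) < gap_max:  # roughly simultaneous
                pairs.append((a, b))
    return pairs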
Joint work with Tomáš Ižo. Technical report:
MIT-CSAIL-TR-2006-043.
Vehicle Recognition from 3D Point Clouds
We developed a method for automatic target recognition using an airborne
laser range finder for targets hiding under tree canopies. Our system
attempts to infer local, low-order, relatively flat surfaces and then
reconstruct a mesh model of the object to be identified. We then segment
this mesh using a spectral clustering algorithm to find larger shapes that
make up the object, and we recognize the target using a partial graph
matcher. Given the noise and clutter levels, we demonstrate promising
results, with two of our objects achieving 100% recognition rates.
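The spectral clustering step can be sketched as follows; the particular
affinity (spatial proximity combined with normal agreement) and the
bare-bones k-means are illustrative choices, not the exact ones from the
thesis:

import numpy as np

def spectral_segment(points, normals, k, sigma_d=1.0, sigma_n=0.5):
    # points:  (n, 3) mesh vertex positions; normals: (n, 3) unit normals
    # Affinity is high for nearby vertices with similar orientation.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n_dis = 1.0 - normals @ normals.T        # 0 for parallel normals
    W = np.exp(-d2 / sigma_d**2) * np.exp(-n_dis / sigma_n)
    L = np.diag(W.sum(1)) - W                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]                  # k smallest eigenvectors
    # Bare-bones k-means on the spectral embedding.
    rng = np.random.default_rng(0)
    centers = embedding[rng.choice(len(points), k, replace=False)]
    for _ in range(50):
        labels = ((embedding[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = embedding[labels == j].mean(0)
    return labels  # per-vertex segment assignments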
Master's thesis work at OSU.
Low-Level Vision
In order to perform high-level vision tasks, one must start with the pixels.
Low-level vision techniques focus on extracting useful information from the raw
data.
Background Subtraction
In the traditional mixture of Gaussians (MoG) background model, the
generating process of each pixel is modeled as a mixture of Gaussians over
color. Unfortunately, this model performs poorly when the background
consists of dynamic textures such as trees waving in the wind and rippling
water. To address this deficiency, researchers have recently looked to more
complex and/or less compact representations of the background process. We
propose a generalization of the MoG model that handles dynamic textures. In
the context of background modeling, we achieve more accurate segmentations
than competing methods, using a model whose complexity
grows with the underlying complexity of the scene (as any good model
should), rather than the amount of time required to observe all aspects of
the texture.
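For reference, the classical per-pixel MoG model that we generalize looks
roughly like the following grayscale, Stauffer-Grimson-style sketch; the
learning rate and thresholds are illustrative:

import numpy as np

class PixelMoG:
    # Classical per-pixel mixture of Gaussians (the baseline model, not
    # our dynamic-texture generalization), grayscale for brevity.
    def __init__(self, k=3, lr=0.01, var0=225.0, t_bg=0.7):
        self.w = np.ones(k) / k       # mixture weights
        self.mu = np.zeros(k)         # component means
        self.var = np.full(k, var0)   # component variances
        self.lr, self.t_bg = lr, t_bg

    def update(self, x):
        # Returns True if intensity x is explained by the background.
        d = np.abs(x - self.mu)
        match = d < 2.5 * np.sqrt(self.var)
        if match.any():
            i = np.where(match)[0][np.argmin(d[match])]
            self.mu[i] += self.lr * (x - self.mu[i])
            self.var[i] += self.lr * ((x - self.mu[i]) ** 2 - self.var[i])
            self.w = (1.0 - self.lr) * self.w
            self.w[i] += self.lr
        else:
            i = np.argmin(self.w)     # replace the weakest component
            self.mu[i], self.var[i], self.w[i] = x, 225.0, self.lr
        self.w /= self.w.sum()
        # High-weight, low-variance components form the background,
        # up to a total weight of t_bg.
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = np.searchsorted(np.cumsum(self.w[order]), self.t_bg) + 1
        return bool(match.any()) and i in order[:n_bg]

A dynamic texture such as rippling water violates this model's assumption
that each pixel's background colors cluster into a few tight Gaussians,
which is the deficiency our generalization addresses.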
Joint work with Joshua Migdal and W. Eric L. Grimson.
Published and
presented at the 2008
Workshop on Applications of Computer Vision (WACV).
Calibration
Organizations responsible for securing large and critical physical assets
are deploying large camera networks for monitoring. In many deployment
scenarios, it is infeasible to perform careful per-camera calibration.
Installation personnel may lack the technical expertise or training to
perform the calibration adequately; the installations may be in hazardous
areas where it is advantageous to minimize installation time; and, in
real-world scenarios, cameras tend to be bumped and moved over time,
requiring periodic recalibration. For very large
networks, it becomes prohibitively expensive to manually calibrate all
cameras. We have developed a technique for automatically learning the
positions of cameras by merging data from Global Positioning System (GPS)
receivers with camera detections. Our system does not require explicit
manual correspondence of the GPS data with the detection results. Our
system is also able to learn the topological structure of the camera
network.
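As a toy illustration of the flavor of inference involved (and only that;
the published system requires no explicit time alignment and estimates far
more than a point location), one could guess a camera's ground-plane
position by averaging the GPS fixes recorded near its detection times:

import numpy as np

def estimate_camera_location(gps_track, detection_times, window=1.0):
    # gps_track: list of (t, x, y) GPS fixes from a roving receiver
    # detection_times: timestamps of detections from one camera
    times = np.array([t for t, _, _ in gps_track])
    xy = np.array([(x, y) for _, x, y in gps_track])
    hits = []
    for td in detection_times:
        near = np.abs(times - td) < window  # fixes within `window` seconds
        if near.any():
            hits.append(xy[near].mean(axis=0))
    return np.mean(hits, axis=0) if hits else None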
Joint work with Kinh Tieu and W.E.L. Grimson.
Published and
presented at the 2005
International Conference on Computer Vision.
Super Resolution
We address the problem of text super-resolution: given a single image of
text scanned in at low resolution from a piece of paper, return the image
that is most likely to be generated from a noiseless high-resolution scan
of the same piece of paper. In doing so, we wish to: (1) avoid introducing
artifacts in the high-resolution image such as blurry edges and rounded
corners, (2) recover from quantization noise and grid-alignment effects that
introduce errors in the low-resolution image, and (3) handle documents with
very large glyph sets, such as Japanese Kanji. Applications for this
technology include improving the display of: fax documents, low-resolution
scans of archival documents, and low-resolution bitmapped fonts on
high-resolution output devices.
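Phrased probabilistically (my reading of the problem statement, not
notation taken from the paper), the goal is the MAP estimate

    H* = arg max_H  p(L | H) p(H),

where L is the observed low-resolution scan, H is a candidate
high-resolution image, p(L | H) models the scanner's blur, quantization,
and grid alignment, and the prior p(H) favors clean, plausible text.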
Joint work with Bill Freeman and Joe Marks.
Published at the 2004 International
Conference on Image Processing.
Range Image Registration
We performed a robustness study on several popular techniques for
performing fine registration of partially overlapping 2.5D range image
pairs, with a focus on model building. In our first set of tests, we
qualitatively evaluated the output of several iterative closest point (ICP)
variants on real-world data. Our second set of tests expanded to include
additional ICP variants and an implementation of Chen and Medioni's
point-to-plane minimization algorithm. These tests quantitatively evaluated
how well the algorithm variants correct simulated initial rigid rotation and
translation errors. In both sets of tests, the variants aimed to classify as
outliers those point pairs containing vertices outside the region of overlap
between the two range images. In
addition to testing these variants with different parameter settings, we
also studied how performing topologically uniform subsampling of the meshes
affects the registration quality.
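For concreteness, one point-to-point ICP iteration with the outlier test
looks like the following; the brute-force matcher and distance threshold
are illustrative, and the point-to-plane variant differs in its error
metric:

import numpy as np

def icp_step(src, dst, overlap_dist=2.0):
    # src, dst: (n, 3) and (m, 3) vertex arrays from two range images.
    # Pair each source vertex with its nearest destination vertex.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # Outlier test: drop pairs that are too far apart, which tend to
    # involve vertices outside the region of overlap.
    keep = d2[np.arange(len(src)), nn] < overlap_dist ** 2
    p, q = src[keep], dst[nn[keep]]
    # Kabsch/SVD solution for the best rigid transform p -> q.
    pc, qc = p - p.mean(0), q - q.mean(0)
    U, _, Vt = np.linalg.svd(pc.T @ qc)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = q.mean(0) - R @ p.mean(0)
    return R, t  # apply as src @ R.T + t, then iterate to convergence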
Joint work with Patrick Flynn while at OSU. Published in the
Computer Vision and Image Understanding
journal and at the 2001
International Conference on 3D Imaging and Modeling.