Projects
I was a Ph.D. student in the
Activity Perception Project group at the
Computer Science and Artificial Intelligence Laboratory (CSAIL) of the
Massachusetts Institute of Technology (MIT).
I previously conducted my B.S. and M.S. research in the
Signal Analysis and Machine
Perception Laboratory (SAMPL) at The Ohio State
University (OSU).
I am interested in the intersection of visual tracking, object recognition, and
machine learning. My research projects have ranged from improvements in
low-level vision to higher-level tasks such as detection, recognition, and query
systems.
A concise list of publications and recent formal presentations may be found on
my homepage. This page gives a brief description
of various projects I have worked on.
Higher-Level Vision
Many computer vision systems perform high-level tasks, such as recognizing
individual people in an image or video clip, or understanding what a group of
people, animals, or vehicles is doing by observing their movements.
Event Detection
In the last decade, significant progress has been made in developing
automatic far-field tracking systems. The PETS 2007 dataset provides some
interesting event detection challenges. We have developed a method of
bootstrapping a background model in the presence of significant lighting
changes. We then use a blob tracker as an attention mechanism for finding
tracks of interest, which we temporally extend using the mean-shift
algorithm. Using only weak human and luggage models (based purely on size),
our system performs well at detecting loitering events, and several events
involving interactions between actors and their luggage.
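To make the track-extension step concrete, here is a minimal sketch of
mean-shift mode seeking in Python, assuming a per-pixel target-likelihood
map (e.g. from histogram back-projection); the window parametrization and
parameter values are illustrative, not the actual PETS system code:

import numpy as np

def mean_shift_window(weights, center, size, max_iter=20, eps=0.5):
    # weights: 2-D map of per-pixel target likelihoods
    # center:  (row, col) initial window center
    # size:    (height, width) of the tracking window
    cy, cx = float(center[0]), float(center[1])
    h, w = size
    for _ in range(max_iter):
        r0 = max(int(cy - h / 2), 0)
        r1 = min(int(cy + h / 2), weights.shape[0])
        c0 = max(int(cx - w / 2), 0)
        c1 = min(int(cx + w / 2), weights.shape[1])
        patch = weights[r0:r1, c0:c1]
        total = patch.sum()
        if total == 0:
            break  # no target evidence under the window
        rows, cols = np.indices(patch.shape)
        ny = r0 + (rows * patch).sum() / total  # weighted centroid
        nx = c0 + (cols * patch).sum() / total
        converged = abs(ny - cy) < eps and abs(nx - cx) < eps
        cy, cx = ny, nx
        if converged:
            break  # settled on a mode of the likelihood map
    return cy, cx

Each iteration moves the window to the likelihood-weighted centroid of the
pixels it covers, which is the hill-climbing behavior that lets a track be
extended from frame to frame.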
Joint work with Xiaogang Wang and W.E.L. Grimson.
Published and
presented at the
Performance Evaluation of
Tracking Systems (PETS) 2007 Workshop at the International Conference on
Computer Vision (ICCV) and at the
Federal
University of Paraná in Brazil.
Recognizing People by Gait
We developed a model-based method for accurate extraction of pedestrian
silhouettes from video sequences. Our approach is based on two assumptions:
(1) there is a common appearance shared by all pedestrians, and (2) each
individual looks like him- or herself over a short span of time. These
assumptions allow
us to learn pedestrian models that encompass both a pedestrian population
appearance and the individual appearance variations. Using our models, we
are able to produce pedestrian silhouettes that have fewer noise pixels and
missing parts. We apply our silhouette extraction approach to the NIST gait
data set and show that, on the gait recognition task, our model-based
silhouettes yield much higher recognition rates than silhouettes extracted
directly from background subtraction or produced by non-model-based
smoothing schemes.
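One way to picture the role of the learned models is as a per-pixel fusion
of the noisy observation with a pedestrian-shape prior. The following is a
deliberately simplified, illustrative sketch; the combination rule and the
parameters are assumptions, not the paper's actual formulation:

import numpy as np

def clean_silhouette(raw_mask, prior, alpha=0.5, threshold=0.5):
    # raw_mask: noisy binary foreground mask from background subtraction
    # prior:    per-pixel probability that the pixel lies on the
    #           pedestrian, learned from aligned training silhouettes
    belief = alpha * raw_mask.astype(float) + (1.0 - alpha) * prior
    return belief > threshold  # cleaned binary silhouette

Pixels weakly supported by background subtraction survive if the model
expects a pedestrian there, while isolated noise pixels are suppressed.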
Joint work with Lily Lee and Kinh Tieu.
Published at the 2003
International Conference on Computer Vision.
Event Query Systems
In dealing with long-term tracking databases with wide-area coverage, an
important problem is formulating an intuitive and fast query system for
analysis. In such a query system, a user who is not a computer vision
researcher should be able to readily specify a novel query to the system and
obtain the desired results. Furthermore, these queries should be able to not
only search out individual actors (e.g. "find all white cars") but also find
interactions amongst multiple actors (e.g. "find all drag racing activities
in the city"). Informally, we have found that people often use sketches when
describing activities and interactions. We have demonstrated a
preliminary system that automatically interprets schematic drawings of
activities. The system transforms the schematics into executable code that
searches a tracking database. Thanks to our query optimization, these
queries execute orders of magnitude faster than equivalent queries running
on a partially optimized SQL database.
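As a flavor of what the generated search code might look like, here is a
hand-written stand-in for a compiled "drag racing" query; the track schema
(lists of timestamped ground-plane samples) and all thresholds are
hypothetical:

def find_fast_pairs(tracks, in_region, speed_min=20.0, gap_max=5.0):
    # tracks: list of tracks; each track is a list of (t, x, y) samples
    # in_region(x, y): predicate selecting the area of interest
    def max_speed(track):
        best = 0.0
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
            if in_region(x0, y0) and t1 > t0:
                v = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / (t1 - t0)
                best = max(best, v)
        return best

    fast = [tr for tr in tracks if max_speed(tr) > speed_min]
    pairs = []
    for i, a in enumerate(fast):
        for b in fast[i + 1:]:
            if abs(a[0][0] - b[0][0]) < gap_max:  # roughly simultaneous
                pairs.append((a, b))
    return pairs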
Joint work with Tomáš Ižo. Technical report:
MIT-CSAIL-TR-2006-043.
Vehicle Recognition from 3D Point Clouds
We developed a method for automatic target recognition using an airborne
laser range finder for targets hiding under tree canopies. Our system
attempts to infer local, low-order, relatively flat surfaces and then
reconstruct a mesh model of the object to be identified. We then segment
this mesh using a spectral clustering algorithm to find larger shapes that
make up the object, and we recognize the target using a partial graph
matcher. Given the noise and clutter levels, we demonstrate promising
results, with two of our objects achieving 100% recognition rates.
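The spectral clustering step can be sketched as follows; the particular
affinity (spatial proximity combined with normal agreement) and the
bare-bones k-means are illustrative choices, not the exact ones from the
thesis:

import numpy as np

def spectral_segment(points, normals, k, sigma_d=1.0, sigma_n=0.5):
    # points:  (n, 3) mesh vertex positions; normals: (n, 3) unit normals
    # Affinity is high for nearby vertices with similar orientation.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n_dis = 1.0 - normals @ normals.T        # 0 for parallel normals
    W = np.exp(-d2 / sigma_d**2) * np.exp(-n_dis / sigma_n)
    L = np.diag(W.sum(1)) - W                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]                  # k smallest eigenvectors
    # Bare-bones k-means on the spectral embedding.
    rng = np.random.default_rng(0)
    centers = embedding[rng.choice(len(points), k, replace=False)]
    for _ in range(50):
        labels = ((embedding[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = embedding[labels == j].mean(0)
    return labels  # per-vertex segment assignments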
Master's thesis work at OSU.
Low-Level Vision
In order to perform high-level vision tasks, one must start with the pixels.
Low-level vision techniques focus on extracting useful information from the raw
data.
Background Subtraction
In the traditional mixture of Gaussians (MoG) background model, the
generating process of each pixel is modeled as a mixture of Gaussians over
color. Unfortunately, this model performs poorly when the background
consists of dynamic textures such as trees waving in the wind and rippling
water. To address this deficiency, researchers have recently looked to more
complex and/or less compact representations of the background process. We
propose a generalization of the MoG model that handles dynamic textures. In
the context of background modeling, we achieve more accurate segmentations
than competing methods, using a model whose complexity
grows with the underlying complexity of the scene (as any good model
should), rather than the amount of time required to observe all aspects of
the texture.
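For reference, the classical per-pixel MoG model that we generalize looks
roughly like the following grayscale, Stauffer-Grimson-style sketch; the
learning rate and thresholds are illustrative:

import numpy as np

class PixelMoG:
    # Classical per-pixel mixture of Gaussians (the baseline model, not
    # our dynamic-texture generalization), grayscale for brevity.
    def __init__(self, k=3, lr=0.01, var0=225.0, t_bg=0.7):
        self.w = np.ones(k) / k       # mixture weights
        self.mu = np.zeros(k)         # component means
        self.var = np.full(k, var0)   # component variances
        self.lr, self.t_bg = lr, t_bg

    def update(self, x):
        # Returns True if intensity x is explained by the background.
        d = np.abs(x - self.mu)
        match = d < 2.5 * np.sqrt(self.var)
        if match.any():
            i = np.where(match)[0][np.argmin(d[match])]
            self.mu[i] += self.lr * (x - self.mu[i])
            self.var[i] += self.lr * ((x - self.mu[i]) ** 2 - self.var[i])
            self.w = (1.0 - self.lr) * self.w
            self.w[i] += self.lr
        else:
            i = np.argmin(self.w)     # replace the weakest component
            self.mu[i], self.var[i], self.w[i] = x, 225.0, self.lr
        self.w /= self.w.sum()
        # High-weight, low-variance components form the background,
        # up to a total weight of t_bg.
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = np.searchsorted(np.cumsum(self.w[order]), self.t_bg) + 1
        return bool(match.any()) and i in order[:n_bg]

A dynamic texture such as rippling water violates this model's assumption
that each pixel's background colors cluster into a few tight Gaussians,
which is the deficiency our generalization addresses.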
Joint work with Joshua Migdal and W. Eric L. Grimson.
Published and
presented at the 2008
Workshop on Applications of Computer Vision (WACV).
Calibration
Organizations responsible for securing large and critical physical assets
are deploying large camera networks for monitoring. In many deployment
scenarios, it is infeasible to perform careful per-camera calibration.
Installation personnel may lack the technical expertise or training to
perform the calibration adequately; the installations may be in hazardous
areas where it is advantageous to minimize installation time; and, in
real-world scenarios, cameras tend to be bumped and moved over time,
requiring periodic recalibration. For very large
networks, it becomes prohibitively expensive to manually calibrate all
cameras. We have developed a technique for automatically learning the
positions of cameras by merging data from Global Positioning System (GPS)
receivers with camera detections. Our system does not require explicit
manual correspondence of the GPS data with the detection results. Our
system is also able to learn the topological structure of the camera
network.
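As a toy illustration of the flavor of inference involved (and only that;
the published system requires no explicit time alignment and estimates far
more than a point location), one could guess a camera's ground-plane
position by averaging the GPS fixes recorded near its detection times:

import numpy as np

def estimate_camera_location(gps_track, detection_times, window=1.0):
    # gps_track: list of (t, x, y) GPS fixes from a roving receiver
    # detection_times: timestamps of detections from one camera
    times = np.array([t for t, _, _ in gps_track])
    xy = np.array([(x, y) for _, x, y in gps_track])
    hits = []
    for td in detection_times:
        near = np.abs(times - td) < window  # fixes within `window` seconds
        if near.any():
            hits.append(xy[near].mean(axis=0))
    return np.mean(hits, axis=0) if hits else None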
Joint work with Kinh Tieu and W.E.L. Grimson.
Published and
presented at the 2005
International Conference on Computer Vision.
Super Resolution
We address the problem of text super-resolution: given a single image of
text scanned in at low resolution from a piece of paper, return the image
that is most likely to be generated from a noiseless high-resolution scan
of the same piece of paper. In doing so, we wish to: (1) avoid introducing
artifacts in the high-resolution image such as blurry edges and rounded
corners, (2) recover from quantization noise and grid-alignment effects that
introduce errors in the low-resolution image, and (3) handle documents with
very large glyph sets, such as Japanese Kanji. Applications for this
technology include improving the display of: fax documents, low-resolution
scans of archival documents, and low-resolution bitmapped fonts on
high-resolution output devices.
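Phrased probabilistically (my reading of the problem statement, not
notation taken from the paper), the goal is the MAP estimate

    H* = arg max_H  p(L | H) p(H),

where L is the observed low-resolution scan, H is a candidate
high-resolution image, p(L | H) models the scanner's blur, quantization,
and grid alignment, and the prior p(H) favors clean, plausible text.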
Joint work with Bill Freeman and Joe Marks.
Published at the 2004 International
Conference on Image Processing.
Range Image Registration
We performed a robustness study on several popular techniques for
performing fine registration of partially overlapping 2.5D range image
pairs, with a focus on model building. In our first set of tests, we
qualitatively evaluated the output of several iterative closest point (ICP)
variants on real-world data. Our second set of tests expanded to include
additional ICP variants and an implementation of Chen and Medioni's
point-to-plane minimization algorithm. These tests quantitatively evaluated
how well the algorithm variants correct simulated initial rigid rotation and
translation errors. In both sets of tests, the variants aimed to classify as
outliers those point pairs containing vertices outside the region of overlap
between the two range images. In
addition to testing these variants with different parameter settings, we
also studied how performing topologically uniform subsampling of the meshes
affects the registration quality.
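For concreteness, one point-to-point ICP iteration with the outlier test
looks like the following; the brute-force matcher and distance threshold
are illustrative, and the point-to-plane variant differs in its error
metric:

import numpy as np

def icp_step(src, dst, overlap_dist=2.0):
    # src, dst: (n, 3) and (m, 3) vertex arrays from two range images.
    # Pair each source vertex with its nearest destination vertex.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # Outlier test: drop pairs that are too far apart, which tend to
    # involve vertices outside the region of overlap.
    keep = d2[np.arange(len(src)), nn] < overlap_dist ** 2
    p, q = src[keep], dst[nn[keep]]
    # Kabsch/SVD solution for the best rigid transform p -> q.
    pc, qc = p - p.mean(0), q - q.mean(0)
    U, _, Vt = np.linalg.svd(pc.T @ qc)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = q.mean(0) - R @ p.mean(0)
    return R, t  # apply as src @ R.T + t, then iterate to convergence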
Joint work with Patrick Flynn while at OSU. Published in the
Computer Vision and Image Understanding
journal and at the 2001
International Conference on 3D Imaging and Modeling.