When it comes to building image recognition systems, classifiers,
trackers, spam filters, or anything else intelligent, we can choose
from two extremes:
If the cost of spending one week designing a better model and inference
technique is the same as the cost of spending one hour collecting data,
then hybrids between these two techniques don't save us any additional
effort.
Unsupervised and semisupervised techniques appear to totally break this trend because they consume training data that is typically very easy to get.
Our approach to building semisupervised algorithms is general: we regularize the output of standard regression techniques with a small amount of domain knowledge. This automatically allows the training algorithm to take advantage of unlabeled data. Using this methodology, we built an algorithm that learns to transform time series from a few examples.
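To make the idea of regularizing a regressor's output concrete, here is a minimal sketch. It is a simplified stand-in, not the algorithm from our papers: it assumes a linear regressor and assumes the domain knowledge is that the output changes little between consecutive frames. The function name `fit_smooth_regression`, the two toy features, and the weights are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_smooth_regression(X, labeled_idx, y_labeled, smooth_weight, ridge=1e-6):
    """Fit weights w for the linear model f(x) = x @ w by minimizing
    sum over labeled frames of (x_i @ w - y_i)^2
    + smooth_weight * sum over all consecutive frames of (x_{t+1} @ w - x_t @ w)^2
    + ridge * ||w||^2.
    The second term needs no labels, which is how unlabeled frames help."""
    X = np.asarray(X, dtype=float)
    X_lab = X[labeled_idx]
    y = np.asarray(y_labeled, dtype=float)
    D = X[1:] - X[:-1]  # feature differences between consecutive frames
    A = X_lab.T @ X_lab + smooth_weight * (D.T @ D) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X_lab.T @ y)

# Toy sequence (assumed data): the target drifts smoothly from 0 to 1.
T = 100
position = np.linspace(0.0, 1.0, T)
# One feature tracks the target; the other flickers between +1 and -1.
flicker = np.where(np.arange(T) % 2 == 0, 1.0, -1.0)
X = np.column_stack([position, flicker])

# Only a single frame is labeled.
labeled_idx = [50]
y_labeled = position[labeled_idx]

w_supervised = fit_smooth_regression(X, labeled_idx, y_labeled, smooth_weight=0.0)
w_semi = fit_smooth_regression(X, labeled_idx, y_labeled, smooth_weight=1.0)

mse_supervised = float(np.mean((X @ w_supervised - position) ** 2))
mse_semi = float(np.mean((X @ w_semi - position) ** 2))
print("supervised MSE:", mse_supervised)
print("semisupervised MSE:", mse_semi)
```

With one label and two features, plain regression cannot tell the useful feature from the flickering one. The smoothness penalty, computed from unlabeled frames only, drives the weight on the flickering feature toward zero.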
Semisupervised learning lets us solve some very difficult tracking
problems. We can learn visual trackers using lots of unlabeled images
that have
Here is an example input video of Ben flailing his hands. For a few frames, we specified the position of his joints. From these examples and 2 minutes of unlabeled frames, the algorithm learned a function that takes an image as input and returns the position of Ben's shoulders, elbows, and hands. This video compares simple regression (black markers) with semisupervised regression (white markers).
[Figure: Specified examples]
We have also applied the algorithm to the problem of tracking lips. The algorithm learns a mapping from images of my face to a spline contour. See the video (1.1 MB) for an example. To train the system, we used the 7 keyframes shown above.
The technique is general. Below, the system learns how to track RFID tags from signal strengths harvested from antennas. Four examples provide the mapping from signal strength to position, and 5 minutes of unlabeled signal strength data are also provided.
The following publications describe this work in more detail.
Unsupervised Learning and Nonlinear System Identification
Sometimes, we don't even need labeled examples! In many settings, to learn to track, it suffices to know that the target moves smoothly over time, and that the sensors report a smooth function of the target's state. Unlabeled data provides almost all the rest of the information (we can learn to recover the pose of the target up to a coordinate transform). The Sensetable problem above, for example, can be solved without any labeled examples whatsoever.
This phenomenon is well known in a simpler form in the control theory literature, where linear system identification has been used for decades to learn the parameters of linear dynamical systems. In tracking, however, the relationship between pose and measurements is nonlinear. So we have been tackling the problem of nonlinear system identification, a notoriously difficult unsupervised learning problem because of local minima. In this setting, the problem is more closely related to manifold learning. We have devised algorithms that can provably avoid these local minima in certain situations (the observation function must be observable and smooth, and the dynamics must be known a priori). This work is described here:
Interpretations of Normalized Cuts
In these papers, we provide a geometric interpretation for the popular Normalized Cuts clustering algorithm. We show that Normalized Cuts lifts the data to an infinite-dimensional feature space, and splits it into two clusters using a hyperplane. We also show that this procedure is a relaxation of the Transductive SVM problem, which is NP-hard.
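For readers unfamiliar with Normalized Cuts, the sketch below shows the standard two-way spectral relaxation: build an affinity matrix, take the eigenvector with the second-smallest eigenvalue of the normalized graph Laplacian, and threshold it. This illustrates the algorithm being interpreted, not the feature-space analysis in the papers. The function name `normalized_cut_split`, the Gaussian affinity with unit bandwidth, the threshold at zero, and the toy points are assumptions made for illustration.

```python
import numpy as np

def normalized_cut_split(W):
    """Two-way Normalized Cuts via the usual spectral relaxation.
    W is a symmetric affinity matrix with positive row sums.
    Returns a boolean array assigning each point to one of two clusters."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is second-smallest.
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map back to the generalized eigenvector of (D - W) v = lambda * D v.
    v = d_inv_sqrt * eigvecs[:, 1]
    # Threshold at zero; which side is True is arbitrary (eigenvector sign).
    return v > 0

# Toy data (assumed): two groups of points on a line.
points = np.array([0.0, 0.1, 0.2, 0.3, 3.0, 3.1, 3.2, 3.3])
sq_dists = (points[:, None] - points[None, :]) ** 2
W = np.exp(-sq_dists / 2.0)  # Gaussian affinity, bandwidth 1

labels = normalized_cut_split(W)
print(labels)
```

The eigenvector's sign is arbitrary, so only the grouping is meaningful, not which cluster is marked `True`.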