I work worked (more recent work) in the Humanoid Robotics Group at MIT CSAIL (merger of MIT AI Lab and MIT LCS) with Prof. Rodney Brooks along with many other great people. This page summarizes a few of my research interests.

my crazy advisor

Humanoid Robots
I'm interested in building adaptive, general-purpose robots, to try and understand what might underlie our own adaptivity. I work on Cog, Kismet (now retired and moved to the MIT museum), and Cardea. Kismet is an expressive anthropomorphic head useful for human interaction work (see videos); Cog is an upper torso humanoid more adept at object interaction. (See videos, my thesis). Cardea is a more recent robot with many crazy features (see some pictures).

Kismet and Cog

Active Segmentation
One of the things Cog can do is learn about objects by poking them and seeing what happens. If can identify the boundaries of objects by pushing them a little and see what parts move together (the object) and what stays still (the background). This is called active segmentation. Once Cog can reliably segment objects, then it can learn about their appearance and how they move. (A paper on this topic, also see thesis chapter 3).

Active segmentation in action. Cog resolves visual ambiguity by poking around and seeing what happens.

Video:  Active segmentation
Cog taps a toy with his flipper and uses the motion to segment the object from the background.
   Quicktime -- (2.0 MB)
   Length: approximately 20 seconds

Open Object Recognition
Cog can learn to recognize new objects by interacting with them. Open Object Recognition is the ability to recognize a flexible set of objects, where new objects can be introduced at any time. Conventional object recognition systems do not need to be open - for example, the set of objects an industrial robot needs to interact with is likely to be fixed. But a humanoid robot in an unconstrained environment could be presented with just about anything, so it needs to have an open object recognition system. The alternative would be to enumerate and train for all the possible objects the robot might encounter, which is not practical. (A paper on this topic, also see thesis chapter 5).

Cog pokes objects (A), learns their appearance (B), and then recognizes them if it sees them again (C).

Video:  Open object recognition
In this video, Cog is confronted with a new object (a red ball). Since it has limited experience at this point, it confuses the ball with another object (a cube) which has similar color. Once Cog pokes the object, after about 5 seconds processing it can correctly distinguish between these objects based on shape.
   Quicktime -- (6.7 MB)
   Length: approximately 1 minute

Role Transfer
Role transfer is a mechanism for incremently passing control for more and more of a task over to a robot. The instructor demonstrates the task while providing verbal annotation. The vocal stream is used to construct a model of the task. Generic machine learning methods are then used to ground this model in the robot's perceptual network, guided by feature selection input from the human. The idea is to avoid ever presenting the robot with a hard learning problem; the learning algorithms are intended to be decoders allowing the human to communicate changes in representation, rather than to learn in the conventional sense. (A paper on this topic, also see thesis chapter 10).

Video:  Role transfer
Role transfer. Kismet watches human sorting green objects to one side and yellow objects to other. When prompted, robot says the direction it expects a new object to go.
   Quicktime -- (1.5 MB)
   Length: approximately 1.5 minutes

Video:  Learning through activity
In this video, the operator tells Cog they are going to "find" a "toma". Then they look at several objects in turn, saying "no" to each. Finally the operator shows Cog a bottle and says "yes". Cog then associates the name "toma" with the bottle, by analogy with previous search episodes.
   Quicktime -- (23.5 MB)
   Length: 1.5 minutes

Visual Learning
Many classical vision problems can be addressed in an empirical manner on a robot -- instead of building complete models, it suffices to build partial models, and behaviors that can collect the training data necessary to complete those models. I have used this approach for orientation detection, affordance characterization (poster, paper), and object recognition. (See thesis chapters 4, 7, and 5 respectively).

I collaborate with Charlie Kemp on Platform Shoe, a project to find innovative camera placements for wearable computing. Shoes are a good spot since, during walking, they are pressed quite firmly against the ground, giving a relatively stable view of the environment.

"Platform Shoe" - (more information)

Multimodal Recognition
I collaborate with Artur Arsenio on multimodal recognition, an effort to recognize periodically moving objects by the relationship between their visual trajectories and the sound they generate.

More information
See my publications or presentations for details of all these topics. There are also assorted videos available.

my thesis, she is done!

homepage  -  of  -  Paul  -  Fitzpatrick