Machine Understanding of Narrated Guided Tours

Albert Huang <albert at csail dot mit dot edu>
Seth Teller <teller at csail dot mit dot edu>

Motivation

Suppose Alice is a new student or employee. A typical way to introduce Alice to her new environment is for someone to walk around with her, describing the environment with cues such as "Here's your desk, this is the bathroom, this is the kitchen," and so on. Being able to introduce a robot or computational device to a physical environment in this same manner would provide significant benefits. In addition to lowering the initial cost and effort of integrating such a device into the environment, it would let people interact with the device intuitively (e.g., "Bring this to Bob's desk" instead of "Bring this to [32.533, 19.89, 43.278]"). Updating the device's internal representation of the space could then be as simple as walking around with it and giving it verbal commands, rather than uploading new firmware or specially formatted maps.

Approach

To explore this possibility, we have constructed a sensory platform in the form of a wearable backpack carrying a number of different sensors.

The device is called the Ladypack, named after its most prominent sensor and its overall form factor. Data is collected by walking around while wearing the Ladypack, speaking into its microphone, and indicating items or areas of interest with its laser range finder.
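To make the idea concrete, the sketch below shows one plausible way to bind a tour's spoken labels to the locations indicated with the laser range finder. This is a hypothetical illustration, not the Ladypack's actual software: the data types and the nearest-in-time pairing heuristic are our own assumptions about how time-stamped speech and laser streams might be associated.

# A hypothetical sketch (not the Ladypack's actual software) of pairing
# spoken labels from a narrated tour with laser-indicated locations.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Utterance:
    t: float         # timestamp, seconds
    text: str        # transcribed speech, e.g. "this is the kitchen"

@dataclass
class LaserFix:
    t: float                         # timestamp, seconds
    xyz: Tuple[float, float, float]  # indicated point in the map frame

@dataclass
class PlaceLabel:
    name: str
    xyz: Tuple[float, float, float]

def bind_labels(utterances: List[Utterance],
                fixes: List[LaserFix],
                max_skew: float = 2.0) -> List[PlaceLabel]:
    """Pair each utterance with the laser fix nearest in time,
    discarding pairs separated by more than max_skew seconds."""
    labels: List[PlaceLabel] = []
    for u in utterances:
        nearest: Optional[LaserFix] = min(
            fixes, key=lambda f: abs(f.t - u.t), default=None)
        if nearest is not None and abs(nearest.t - u.t) <= max_skew:
            labels.append(PlaceLabel(u.text, nearest.xyz))
    return labels

if __name__ == "__main__":
    utts = [Utterance(10.2, "here's your desk"),
            Utterance(31.7, "this is the kitchen")]
    fixes = [LaserFix(10.5, (3.1, 0.4, 0.9)),
             LaserFix(31.1, (12.8, 5.2, 1.0))]
    for label in bind_labels(utts, fixes):
        print(f"{label.name!r} -> {label.xyz}")

Pairing by time proximity is a natural first cut, since the wearer speaks while pointing; a complete system would also need speech recognition and a pose estimate to place the laser returns in a common map frame.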

For a more detailed look at the Ladypack, take a couple of minutes to watch the first video linked below.

Videos

Note: these videos are encoded with either XviD or MS MPEG4v2, and both versions are linked after each video title. The MS MPEG4v2 videos should play without issue on a default Windows 2000/XP installation (but not on other operating systems); the XviD videos will play on any machine with the XviD codec installed. (Win32 XviD codecs are available here.)

Links