This work is a collaboration
between Artur Arsenio and Paul Fitzpatrick.
Tools are often used in a manner that involves repeated motion -- consider hammers, saws,
brushes, files, and so on. This repetition can help
a robot perceive such objects robustly. Our
approach is for the robot to detect simple repeated events
at frequencies relevant for human interaction, using
both visual and acoustic perception. The advantage
of combining rhythmic information across these two
modalities is that they have complementary properties.
Since sound waves disperse more readily than
light, vision retains more spatial structure -- but for
the same reason it is sensitive to occlusion and the
relative angle of the robot's sensors, while auditory
perception is quite robust to these factors.
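As a concrete illustration, the sketch below estimates the dominant repetition rate independently from a tracked visual coordinate and from an audio energy envelope, and checks that the two agree (allowing a harmonic relation, since an object may sound once or twice per visual cycle). The function names, signal representations, and harmonic heuristic are assumptions made for illustration, not the implementation used in this work.

```python
# Illustrative sketch of cross-modal periodicity detection, assuming the robot
# already provides a tracked object coordinate per video frame and a
# synchronized short-time audio energy envelope. Names are hypothetical.
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the strongest non-DC frequency (Hz) in a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()          # remove DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[0] = 0.0                        # ignore the DC bin
    return freqs[np.argmax(spectrum)]

def rhythms_agree(visual_pos, visual_rate, audio_envelope, audio_rate,
                  tolerance_hz=0.2):
    """Check whether vision and audition report the same repetition rate.

    visual_pos     : object coordinate per video frame (e.g. x of a hammer head)
    audio_envelope : short-time audio energy, one value per analysis window
    """
    f_visual = dominant_frequency(visual_pos, visual_rate)
    f_audio = dominant_frequency(audio_envelope, audio_rate)
    # Sound may be produced once or twice per visual cycle, so also accept
    # a harmonic relationship (an illustrative heuristic, not the paper's method).
    candidates = [f_visual, 2.0 * f_visual, 0.5 * f_visual]
    match = any(abs(f_audio - f) < tolerance_hz for f in candidates)
    return match, f_visual, f_audio
```

In practice the search could also be restricted to the low-frequency band relevant for human interaction, so that incidental high-frequency motion or noise does not dominate the estimate.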
The relationship between an object's motion and
the sound it generates is object-specific.
A hammer produces sound when it changes direction
after striking a surface. A bell typically produces sound
at either extreme of its motion. A toy truck produces sound
while moving rapidly with its wheels spinning, and is quiet
when changing direction.
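One way such bindings might be characterized is sketched below: each detected sound onset is labeled according to whether it falls near a direction reversal of the tracked trajectory (hammer- or bell-like) or during rapid motion (truck-like). The thresholds, window sizes, and names are hypothetical assumptions for illustration.

```python
# Illustrative sketch of relating sound onsets to phases of the visual
# trajectory, assuming a 1-D tracked position sampled at known frame times
# and a list of sound onset times in the same clock.
import numpy as np

def classify_sound_binding(positions, frame_times, onset_times,
                           speed_quantile=0.75):
    """Label each sound onset as near a direction reversal or during fast motion."""
    positions = np.asarray(positions, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    velocity = np.gradient(positions, frame_times)
    speed = np.abs(velocity)
    fast = np.quantile(speed, speed_quantile)   # threshold for "moving rapidly"

    labels = []
    for t in onset_times:
        i = int(np.argmin(np.abs(frame_times - t)))   # nearest video frame
        # A reversal shows up as a sign change in velocity around frame i.
        lo, hi = max(i - 2, 0), min(i + 2, len(velocity) - 1)
        reversal = np.sign(velocity[lo]) != np.sign(velocity[hi])
        if reversal:
            labels.append("direction change")   # hammer- or bell-like
        elif speed[i] >= fast:
            labels.append("rapid motion")       # truck-like
        else:
            labels.append("unclear")
    return labels
```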
The spatial trajectory of a moving object can be recovered
quite straightforwardly from visual analysis, but not
from sound. However, the trajectory by itself reveals
little about the nature of the object. We use
the trajectory to extract visual and acoustic features
-- patches of pixels, and sound frequency bands -- that
are likely to be associated with the object. Both can
be used for recognition. Sound features are easier to
use since they are relatively insensitive to spatial
parameters such as the relative position and pose of the
object and the robot.
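A minimal sketch of this feature-extraction step is given below, assuming a tracked image position per frame and a set of time intervals containing sound bursts: it crops a pixel patch around the object and averages the sound spectrum over the bursts to obtain the frequency bands associated with the object. The array shapes, names, and parameters are illustrative assumptions.

```python
# Sketch of using the recovered trajectory to gather object features:
# a pixel patch cropped around the tracked position, and the average sound
# spectrum during acoustic bursts. Shapes and names are assumed for illustration.
import numpy as np

def crop_patch(frame, center, half_size=24):
    """Crop a square patch of pixels around the tracked object position."""
    r, c = int(center[0]), int(center[1])
    r0, r1 = max(r - half_size, 0), min(r + half_size, frame.shape[0])
    c0, c1 = max(c - half_size, 0), min(c + half_size, frame.shape[1])
    return frame[r0:r1, c0:c1]

def average_burst_spectrum(audio, sample_rate, burst_intervals, n_fft=1024):
    """Average the magnitude spectrum over intervals containing sound bursts."""
    spectra = []
    for t0, t1 in burst_intervals:
        segment = audio[int(t0 * sample_rate):int(t1 * sample_rate)]
        if len(segment) < n_fft:                       # pad short bursts
            segment = np.pad(segment, (0, n_fft - len(segment)))
        windowed = segment[:n_fft] * np.hanning(n_fft)
        spectra.append(np.abs(np.fft.rfft(windowed)))
    return np.mean(spectra, axis=0) if spectra else None
```

A collection of such patches and spectra, gathered over many repetitions, could then feed a simple template- or nearest-neighbour-style recognizer; this usage note is a suggestion, not a description of the system reported here.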