Towards Manipulation-Driven Vision
Paul M. Fitzpatrick and Giorgio Metta
MIT AI Lab, Massachusetts Institute of Technology, USA
Lira Lab, DIST, University of Genova, Italy
Abstract
For the purposes of manipulation, we would like to
know what parts of the environment are physically
coherent ensembles, that is, which parts will move
together, and which are more or less independent. It
takes a great deal of experience before this judgement
can be made from purely visual information. This
paper develops active strategies for acquiring that
experience through experimental manipulation, using
tight correlations between arm motion and optic flow
to detect both the arm itself and the boundaries of
objects with which it comes into contact.
1 The elusive object
Sensory information is intrinsically ambiguous, and
very distant from the world of well-defined objects
in which humans believe they live. What criterion
should be applied to distinguish one object from
another? How can perception support such a
phenomenon as figure-ground segmentation? Consider
the example in Figure 1. It is immediately clear that
the drawing on the left is a cross, perhaps because
we already have a criterion that allows segmentation
on the basis of intensity difference. It is slightly
less clear that the zeros and ones in the middle panel
still form a cross. What can we say about the array
on the right? If we are not told, and we do not have
the criterion to perform the figure-ground segmentation,
we might think this is just a random collection
of numbers. But if we are told that the criterion is
"prime numbers vs. non-prime" then a cross can still
be identified.
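
To make the role of the criterion concrete, the following is a minimal sketch that applies a "prime vs. non-prime" test to the array from the right panel of Figure 1 and prints the resulting figure-ground mask. The values are those of the figure, read row by row (the row ordering is an assumption about the layout), and the helper names `is_prime`, `segment`, and `render` are illustrative only; they do not come from the paper.

```python
from typing import Callable, List

Grid = List[List[int]]

# Right-hand array of Figure 1, read row by row (ordering is an assumption).
GRID: Grid = [
    [4, 27, 8, 5, 9],
    [12, 46, 18, 23, 21],
    [17, 31, 7, 11, 3],
    [4, 32, 42, 37, 10],
    [15, 50, 6, 13, 25],
]


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def segment(grid: Grid, criterion: Callable[[int], bool]) -> Grid:
    """Label each cell 1 (figure) if it meets the criterion, else 0 (ground)."""
    return [[1 if criterion(v) else 0 for v in row] for row in grid]


def render(mask: Grid) -> str:
    """Format a mask as rows of space-separated digits."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


if __name__ == "__main__":
    print(render(segment(GRID, is_prime)))
```

With `is_prime` as the criterion, the mask is a cross of ones on a background of zeros. Swapping in a different criterion, for example `lambda v: v % 2 == 0`, gives a mask that is not the cross, which is the sense in which the segmentation depends on knowing the right criterion.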
[Left panel: drawing of a cross, labeled "a cross"]

Middle panel, labeled "a binary cross":

    0 0 0 1 0
    0 0 0 1 0
    1 1 1 1 1
    0 0 0 1 0
    0 0 0 1 0

Right panel, labeled "?":

     4 27  8  5  9
    12 46 18 23 21
    17 31  7 11  3
     4 32 42 37 10
    15 50  6 13 25

Figure 1: Three examples of crosses, following [12].
The human ability to segment objects is not
general-purpose, and improves with experience.

Figure 2: A cube on a table. The edges of the table
and cube happen to be aligned (dashed line), the
colors of the cube and table are not well separated, and
the cube has a potentially confusing surface pattern.

While we have to be inventive to come up with a
segmentation problem that tests a human, we don't
have to go far at all to find something that baffles our
robots. Figure 2 shows a robot's-eye view of a cube
sitting on a table. Simple enough, but many rules
of thumb used in segmentation fail in this particular
case. And even an experienced human observer,
diagnosing the cube as a separate object based on its
shadow and subtle differences in the surface texture
of the cube and table, could in fact be mistaken;
perhaps some malicious researcher is up to mischief.
The only way to find out for sure is to take action,
and start poking and prodding. As early as 1734,
Berkeley observed that:
...objects can only be known by touch.
Vision is subject to illusions, which arise from
the distance-size problem... [2]
In this paper, we provide support for a more nuanced
proposition: that in the presence of touch, vision
becomes more powerful, and many of its illusions fade
away.
Objects and actions. The example of the cross
composed of prime numbers is a novel (albeit un-