Interval Research Signal Computation Talk Series, Fall 1998




All talks are at 11am at Interval Research in the C104/"PageMill" conference room unless otherwise noted. Driving directions to Interval are available at http://www.interval.com/frameset.cgi?about/come/index.html. The talk series website is http://www.interval.com/~trevor/sigcomp-fall98.html.

Please email trevor@interval.com if you plan to attend so we can have a visitors badge prepared for you when you arrive.

To receive Sigcomp talk announcements via email, you can subscribe to our mailing list by sending an email message to majordomo@interval.com with "subscribe signals-talk" in the body of the message.




Click here for the Spring 1999 seminar page.





Abstracts:

Wednesday October 7th
Bradley Horowitz
Virage, Inc.

The Virage Video Search Engine

Bradley Horowitz, CTO of Virage, Inc., will discuss and demonstrate the Virage Video Cataloger 2.0.  Emerging streaming video systems (RealVideo, NetShow, etc.) provide the plumbing to move video around the net, but the end-user experience provided by these systems offers few advantages over traditional broadcast delivery.  The Virage Video Cataloger "watches", "reads", and "listens" to a video stream, providing a rich set of metadata handles into this otherwise opaque data source. The result is that video becomes a navigable, browsable, and searchable data type.  The product has been designed using an extensible plugin architecture, which allows for the integration of novel analysis modules, encoding devices, and export options.  The technology was recently used to rake muck by providing a searchable version of the Clinton Testimony video at http://video.altavista.com.
 



Wednesday October 14th
David Heeger
Psychology Dept.
Stanford University
 
Linking Human Brain Activity with Perceptual Performance using fMRI

The main focus of the research in my lab is to use fMRI to quantitatively investigate the relationship between brain and behavior.  I will present results of two studies.

The first study was designed to test the controversial hypothesis that dyslexia involves a deficit in a specific visual pathway, known as the magnocellular (M) pathway, from the eye to the brain.  We found a strong three-way correlation between individual differences in: (1) M pathway brain activity in visual cortex, (2) reading performance, and (3) performance in a motion discrimination task that depends on M pathway integrity.  Subjects with greater fMRI activity were faster readers and better performers in the motion discrimination task.

In the second study, we tested whether brain activity in primary visual cortex (V1) might be modulated by attention.  In the experiments, subjects fixated the center of a display while performing a visual discrimination task on either the right or the left (without moving their eyes).  Stimuli on the right are processed by neurons in the left hemisphere and vice versa.  The results very clearly demonstrate that V1 neural activity can be modulated by attention; activity modulated out of phase in the two hemispheres as attention shifted back and forth.



Thursday October 29th
Arman Maghbouleh
Stanford University
Automatic Labeling of Emphasis in Speech

Have you ever wondered how it is that the sentence "X is Y" can be spoken to express, alternatively:
 - belief that X is Y,
 - disbelief that X is Y,
 - a question about whether X is Y, or
 - a correction that, contrary to the prior belief that Z is Y, X is Y?

Have you ever wondered if some portions of a person's utterance are more dispensable than others?

Have you ever wondered whether English has any counterparts to tones in Chinese?

There is a body of linguistic theory that relates the above questions to the ways English speakers emphasize syllables in speech. In this talk I will briefly explain that theory; describe ToBI, a related standard for labeling intonation; and present the statistical models I have developed for automatically recognizing ToBI labels. The models use the durations of vowels to detect emphasized syllables. They then use parameters extracted from schematized fundamental frequency contours to recognize particular kinds of emphasis.
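
For readers unfamiliar with this style of model, here is a minimal illustrative sketch of the duration-based detection step only. It is not the speaker's implementation; the per-phone reference statistics and the threshold below are assumed values chosen purely for illustration.

    # Illustrative sketch (assumed statistics and threshold): flag syllables
    # whose vowel is unusually long relative to reference durations for that
    # phone, as a crude proxy for emphasis.

    # Assumed per-phone duration statistics (mean, std) in seconds.
    REF_STATS = {"AA": (0.090, 0.030), "IY": (0.080, 0.025), "EH": (0.070, 0.022)}

    def emphasized_syllables(syllables, z_threshold=1.5):
        """syllables: list of (syllable_id, vowel_phone, vowel_duration_sec)."""
        flagged = []
        for syl_id, phone, dur in syllables:
            mean, std = REF_STATS.get(phone, (0.075, 0.025))
            z = (dur - mean) / std      # duration-lengthening score
            if z > z_threshold:         # unusually long vowel -> likely emphasized
                flagged.append(syl_id)
        return flagged

    print(emphasized_syllables([("syl1", "AA", 0.16), ("syl2", "IY", 0.07)]))

A real labeler would combine such duration evidence with the fundamental frequency features the abstract mentions before assigning ToBI labels.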
 


Wednesday November 4th
Alan Yuille
Smith-Kettlewell Eye Institute

Visual Search: Fundamental Bounds, Order Parameters, Phase Transitions, and Convergence Rates

A. L. Yuille and James M. Coughlan, Smith-Kettlewell Eye Institute

We consider the task of detecting a target in a cluttered
background. The problem is formulated in general, but we specialize to
the specific task of tracking a contour -- such as a road in an
aerial image. We determine order parameters, which depend on
statistical properties of the target and domain, that characterize the
difficulty of the task independently of the algorithm employed to
detect the target. For the road tracking problem, we show that there
is a phase transition at a critical value of the order parameter --
above this phase transition it is impossible to detect the target by
any algorithm.  Next we develop a theory of convergence for visual
search algorithms. This determines closely related order parameters
which determine the complexity of search and the accuracy of the
solution. Using the $A^{\ast}$ search strategy, we prove that the
expected number of nodes explored by the algorithm is linear in the
size of the target. Moreover, the expected amount of sorting time is
constant for each node.
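
A hedged sketch of the kind of formulation the abstract suggests (the precise definitions are the authors'): candidate paths are scored by summed log-likelihood ratios comparing on-target and off-target statistics, and the order parameter weighs the expected per-step evidence against the entropy of the background clutter,

$$ R(x_1,\ldots,x_N) = \sum_{i=1}^{N} \log\frac{P_{\mathrm{on}}(y_i)}{P_{\mathrm{off}}(y_i)} + \sum_{i=1}^{N} \log\frac{P_{G}(\Delta x_i)}{U(\Delta x_i)}, \qquad K \approx D(P_{\mathrm{on}}\,\|\,P_{\mathrm{off}}) + D(P_{G}\,\|\,U) - \log Q, $$

where $y_i$ are image measurements along the path, $\Delta x_i$ its local geometry, $Q$ the branching factor of candidate continuations, and $D(\cdot\|\cdot)$ the Kullback-Leibler divergence; the phase transition then corresponds to the order parameter crossing its critical value.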


Monday November 9th, 2pm
Deb Roy
MIT Media Lab
Grounded Language Learning

I will present a computational model of early language learning which
acquires a lexicon of words grounded in auditory and visual sensory
input. The central question it attempts to address is: How can a
learner acquire words from natural linguistic and non-linguistic
sensory input without prior knowledge of the units of the language,
the syntax of the language, or pre-existing semantic categories? The
model proposes a multidirectional inference process in which acoustic
and co-occurring visual observations serve as mutual constraints in
learning to segment speech, form visual categories, and establish
associations between acoustic forms and visual semantics. I will
present the current status of this effort including a working
implementation of the system which has been taught words corresponding
to colors and shapes of objects. The only input to the system consists
of connected multiword utterances recorded by a noise canceling
microphone, and corresponding images of objects taken from multiple
perspectives with a CCD camera mounted on a robotic platform. I will
also discuss applications of this work in building adaptive spoken
language interfaces.
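
As an illustrative aside only (this is not Roy's implementation), one simple way to realize the cross-modal association the abstract describes is to pair candidate acoustic forms with visual categories that co-occur more often than chance. All names, event data, and the threshold below are assumed.

    # Illustrative sketch: rank (acoustic form, visual category) pairs by
    # pointwise mutual information over observed co-occurrence events.
    import math
    from collections import Counter

    def lexical_pairs(events, min_mi_bits=0.3):
        """events: list of (set_of_acoustic_forms, set_of_visual_categories)."""
        n = len(events)
        a_count, v_count, joint = Counter(), Counter(), Counter()
        for forms, categories in events:
            for a in forms:
                a_count[a] += 1
            for v in categories:
                v_count[v] += 1
            for a in forms:
                for v in categories:
                    joint[(a, v)] += 1
        pairs = []
        for (a, v), c in joint.items():
            # pointwise mutual information of form/category co-occurrence
            pmi = math.log2((c / n) / ((a_count[a] / n) * (v_count[v] / n)))
            if pmi >= min_mi_bits:
                pairs.append((a, v, pmi))
        return sorted(pairs, key=lambda p: -p[2])

    print(lexical_pairs([({"red"}, {"RED"}), ({"red", "ball"}, {"RED", "SPHERE"}),
                         ({"ball"}, {"SPHERE"})]))

The model described in the talk goes further, jointly segmenting the speech and forming the visual categories rather than assuming them as inputs.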
 



Wednesday November 18th
Carlo Tomasi and Tong Zhang
Stanford University
Fast, Robust, and Consistent Camera Motion Estimation

Previous algorithms that recover the velocity of translation and rotation
of a camera from image velocities suffer from a combination of bias and
excessive variance in the results. We propose an estimator of camera motion
that uses robust norms and is statistically consistent when image noise is
isotropic. Consistency means that the estimated motion converges in probability
to the true value as the number of image points is increased. We develop an
algorithm based on reweighted Gauss-Newton iterations that computes motion
from 100 velocity measurements in about 50 milliseconds on a SPARC workstation.
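
For context, here is a generic illustration of the reweighted Gauss-Newton idea (iteratively reweighted least squares with a robust norm); it is not the authors' camera-motion parameterization, and the residual, Jacobian, and Huber scale below are assumptions for the sketch.

    # Illustrative sketch: Gauss-Newton with Huber reweighting of residuals.
    import numpy as np

    def huber_weights(r, scale=1.0):
        a = np.abs(r)
        w = np.ones_like(a)
        mask = a > scale
        w[mask] = scale / a[mask]          # down-weight large residuals
        return w

    def robust_gauss_newton(residual_fn, jacobian_fn, x0, iters=20, scale=1.0):
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            r = residual_fn(x)             # residuals at current estimate
            J = jacobian_fn(x)             # Jacobian of residuals w.r.t. parameters
            sw = np.sqrt(huber_weights(r, scale))
            step, *_ = np.linalg.lstsq(J * sw[:, None], -sw * r, rcond=None)
            x = x + step
            if np.linalg.norm(step) < 1e-9:
                break
        return x

    # Toy usage: fit a line y = a*x + b to data containing one outlier.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([0.1, 1.0, 2.1, 2.9, 40.0])      # last point is an outlier
    res = lambda p: p[0] * xs + p[1] - ys
    jac = lambda p: np.stack([xs, np.ones_like(xs)], axis=1)
    print(robust_gauss_newton(res, jac, [0.0, 0.0]))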



Monday November 23rd, 2pm
Yair Weiss
UC Berkeley
Human motion perception --- a Bayesian theory

The wealth of experimental data on human motion perception seems to suggest the existence of many specialized mechanisms, each of which is used for a small set of stimuli. Here, we present an alternative approach. We formulate a small number of assumptions and use the tools of Bayesian inference to find the most probable motion for a given scene.  In reviewing a large number of previously published phenomena, we find that the Bayesian estimator predicts a wide range of psychophysical results.  This suggests that the seemingly complex set of illusions arises from a single computational strategy that is optimal under reasonable assumptions.

Joint work with Ted Adelson
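
A hedged sketch of the general form such a Bayesian estimator can take (the specific likelihood and prior are the authors'): the posterior over image velocity combines a brightness-constancy likelihood with a prior favoring slow speeds,

$$ P(\mathbf{v}\mid I) \propto P(I\mid\mathbf{v})\,P(\mathbf{v}), \qquad P(I\mid\mathbf{v}) \propto \prod_{x,y}\exp\!\left(-\frac{(I_x v_x + I_y v_y + I_t)^2}{2\sigma_1^2}\right), \qquad P(\mathbf{v}) \propto \exp\!\left(-\frac{\|\mathbf{v}\|^2}{2\sigma_p^2}\right), $$

and the percept is taken to be the most probable velocity under this posterior. On this reading, effects such as slower perceived speed at low contrast follow because a weaker likelihood lets the slow-speed prior dominate.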


Wednesday December 9th
David Forsyth
UC Berkeley
 
Finding People and Animals in Large Collections of Images

Segmentation and recognition tend to be seen as separate activities in
theories of human and of machine vision. Building programs that can
handle interesting applications of object recognition --- for example,
recovering pictures that contain particular objects from poorly
structured collections --- requires object representations that can
represent objects at a reasonable level of abstraction and can help
segment objects from the image. Ideally, we should be able to learn
these representations from images. I have built a number of programs
that illustrate the issues that need to be dealt with. One can tell,
quite accurately, whether an image contains a naked or scantily-clad
person; another can tell whether an image contains a horse or
not. These programs have been tested on large collections of pictures
of diverse content. The representations can be learned from image
data.






Sigcomp talks are organized by Trevor Darrell; email trevor@interval.com with comments or updates. The Sigcomp talk series was originated by Michele Covell.