Interval Research Signal Computation Talk Series, Spring 1999





 
Sigcomp is currently on hiatus, and will return in the fall.



All talks are at 11am at Interval Research in the C104/"PageMill" conference room unless otherwise noted. Driving directions to Interval are available at http://www.interval.com/frameset.cgi?about/come/index.html. The talk series website is http://www.interval.com/~trevor/sigcomp-spring99.html.

Please email trevor@interval.com if you plan to attend so we can have a visitor's badge prepared for you when you arrive.

To receive Sigcomp talk announcements via email, you can subscribe to our mailing list by sending an email message to majordomo@interval.com with "subscribe signals-talk" in the body of the message.




Abstracts:
 

Wednesday, January 6th, 1999
J. Thomas Ngo and Neal A. Bhadkamkar
Interval Research Corporation

Adaptive Blind Separation of Audio Sources by a Physically Compact Device Using Second-Order Statistics

We describe an adaptive approach to blind source separation designed for use with a compact (1-cm) microphone array in a reverberant environment.  A first stage attempts separation assuming point sources in an echo-free environment.  A second stage reduces any remaining crosstalk due to reverberation.

The design of the first stage is of particular interest for two reasons.  First, in contrast with many adaptive techniques, it requires no gradient information.  It searches the space of possible time-delay combinations by efficiently maintaining a population of hypotheses.

Second, it operates without explicitly computing higher-order statistics, which many researchers have believed to be necessary for separation.  Instead, in the spirit of AMUSE (Tong et al., 1991), of Molgedey and Schuster (1994), or of Belouchrani and colleagues (1997), it enforces decorrelation at multiple delays to disambiguate the unmixing functions.
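
To make the second-order idea concrete, the sketch below separates an instantaneous (echo-free) mixture in the single-lag AMUSE spirit: whiten the mixtures, then diagonalize one time-lagged covariance of the whitened signals. It is an illustration under those simplifying assumptions, not the authors' two-stage method, which handles time-delayed mixing and multiple delays.

    import numpy as np

    def amuse_separate(x, lag=1):
        # x: (n_sensors, n_samples) array of instantaneous mixtures.
        x = x - x.mean(axis=1, keepdims=True)
        # Whiten using the zero-lag covariance.
        c0 = x @ x.T / x.shape[1]
        d, e = np.linalg.eigh(c0)
        w = e @ np.diag(1.0 / np.sqrt(d)) @ e.T
        z = w @ x
        # Symmetrized covariance at the chosen lag; its eigenvectors
        # give the rotation that undoes the remaining mixing.
        c1 = z[:, lag:] @ z[:, :-lag].T / (z.shape[1] - lag)
        c1 = 0.5 * (c1 + c1.T)
        _, v = np.linalg.eigh(c1)
        return v.T @ z  # source estimates, up to order and scale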




 
Wednesday, February 3rd
Michael J. Black
Xerox Palo Alto Research Center

Explaining Optical Flow Events with Parameterized Spatio-temporal Models

We propose a spatio-temporal representation for complex optical
flow events that generalizes traditional parameterized motion models
(e.g. affine) yet differs in significant ways.  First, the spatio-
temporal models may be non-linear or stochastic.  Second, these models
are event-specific in that they characterize a particular type of
object motion (e.g. sitting or walking).  The computational problem
involves choosing the appropriate model, phase, rate, spatial
position, and scale to account for the image variation.  The posterior
distribution over this parameter space conditioned on image
measurements is typically non-Gaussian.  The distribution is
represented using factored sampling and is predicted and updated over
time using the Condensation algorithm.  The resulting framework
automatically detects, localizes, and recognizes motion events.
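
As a hedged illustration of one Condensation iteration over such a parameter space (the dynamics and likelihood functions below are placeholders for the model-specific parts, not Black's implementation):

    import numpy as np

    def condensation_step(particles, weights, dynamics, likelihood, rng):
        # particles: (N, dim) hypotheses over model, phase, rate, etc.
        # Factored sampling: pick hypotheses in proportion to weight.
        n = len(particles)
        idx = rng.choice(n, size=n, p=weights)
        # Predict: push each chosen hypothesis through the (possibly
        # stochastic, possibly non-linear) temporal model.
        predicted = dynamics(particles[idx], rng)
        # Update: reweight by the likelihood of the image measurements.
        w = likelihood(predicted)
        return predicted, w / w.sum()

The weighted particle set keeps the full non-Gaussian posterior alive from frame to frame, rather than collapsing it to a single estimate.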



 
Wednesday, February 17th
Stan Birchfield
Stanford University

Depth and Motion Discontinuities

Depth and motion discontinuities arise wherever a light ray incident
on a camera sensor meets a discrete change in the depth or motion of
the surfaces in the world.  Because these discontinuities are directly
tied to the structure of the scene, they provide key information for
automatic image understanding.  For example, they tend to coincide
with occlusions and with the boundaries of objects, making them useful
for applications such as camera control, compression, and tracking.
And because they have simple and precise definitions, all the
subjective issues present in many computer vision problems are
avoided.

In the first part of this talk I present an algorithm for detecting
depth discontinuities from a stereo pair of images.  The algorithm
is fast and is shown to produce good results on difficult images
containing untextured, slanted surfaces.  Also discussed in detail
is a new measure of pixel dissimilarity, used by the algorithm,
that is insensitive to image sampling.
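
A minimal sketch of a sampling-insensitive dissimilarity in this spirit (details assumed for illustration): compare each pixel against the linearly interpolated neighborhood of its candidate match, in both directions, so sub-pixel shifts of the sampling grid do not inflate the cost.

    def dissim(left, right, xl, xr):
        # left, right: 1-D float scanlines; (xl, xr): candidate match.
        def one_way(a, b, xa, xb):
            # Intensities halfway toward each neighbor of the match pixel.
            lo = 0.5 * (b[xb] + b[max(xb - 1, 0)])
            hi = 0.5 * (b[xb] + b[min(xb + 1, len(b) - 1)])
            imin = min(lo, hi, b[xb])
            imax = max(lo, hi, b[xb])
            # Zero whenever a[xa] falls inside the interpolated interval.
            return max(0.0, a[xa] - imax, imin - a[xa])
        return min(one_way(left, right, xl, xr), one_way(right, left, xr, xl))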

Then some work aimed at detecting motion discontinuities from a
monocular image sequence is described, along with a discussion of
why this is a harder problem.  Also outlined are potential
improvements and remaining limitations of some recent maximum-flow
techniques that appear to be promising extensions for both the
stereo and motion work.

Finally, I describe an algorithm that uses discontinuities to track
people's heads.  The system is able to automatically control a
camera's pan, tilt, and zoom in order to keep a person's head centered
in the field of view at a desired size as the person moves around a
room.  Videos will be shown demonstrating the algorithm's ability to
handle situations impossible with previous methods, such as
full out-of-plane rotation with a dynamic background.



 
Wednesday, February 24th
Tanveer Syeda-Mahmood
IBM Almaden Research Center

Directions in Image Database Indexing

The fundamental issues in the design of multimedia databases revolve
around fast and robust automatic content indexing, i.e., the selection of multimedia data
containing an answer to a content query. The performance of content-based access is affected by
issues that impact the design of an entire multimedia database system, such as multimedia feature
extraction, data representation, organization, query formulation and search. In this talk I
will give an overview of our work addressing some of these issues in the context of
image databases. In particular, I will present a computational framework for the design of
image databases using the paradigm of visual scene recall, and give an overview of our work on
attentional representations for databases, query formulation and indexing for color
surfaces, flexible shape models for database organization, localization of object queries,
content-based selection of databases, and the more recent work on cross-modal indexing of
multimedia data.




 
Wednesday, March 17th, 4pm
** Note special time **
Matt Brand
MERL

Puppetry by manifolds

Manifold puppetry is a way of capturing the motor skills of highly
expressive people and making them available to the rest of us, at least
in animation.  We learn the manifold of a motor skill, then learn how to
map the cues of a puppeteer to the most probable trajectories over that
manifold.  Usually the mapping from puppeteer cues to puppet actions is
many-to-many; manifold puppetry makes optimal use of context to
disambiguate a sequence of cues and generate a natural-looking sequence
of actions.  I'll demonstrate this approach with the Voice Puppet -- a
voice-driven "face-syncing" system that generates expressive facial
animation from neck to hairline, including lip-syncing.  It
automatically models full facial dynamics and co-articulation phenomena,
can handle non-speech sounds, and has been used to animate a wide
variety of faces ranging from baby photos to 3D models of barn animals
to Mount Rushmore.
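
Work in this line commonly uses hidden Markov models; assuming one purely for illustration (not the talk's actual model), plain Viterbi decoding shows how whole-sequence context picks a single most probable action trajectory out of a many-to-many cue mapping:

    import numpy as np

    def viterbi(log_trans, log_obs, log_init):
        # log_trans: (S, S) state-transition log-probs; log_obs: (T, S)
        # per-frame cue log-likelihoods; log_init: (S,) initial log-probs.
        t_len, _ = log_obs.shape
        delta = log_init + log_obs[0]
        back = np.zeros(log_obs.shape, dtype=int)
        for t in range(1, t_len):
            scores = delta[:, None] + log_trans   # scores[i, j]: i -> j
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_obs[t]
        # Backtrack the single most probable action trajectory.
        path = [int(delta.argmax())]
        for t in range(t_len - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]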



 
Wednesday, April 7th

 David G. Stork
 Chief Scientist, Ricoh Silicon Valley
 Consulting Associate Professor of Electrical Engineering, Stanford University

The Open Mind Initiative

We propose the Open Mind Initiative to provide a framework
for large-scale collaborative efforts in building components of
"intelligent" systems that address common-sense reasoning,
document and language understanding, speech and character
recognition, and so on.  Based on the Open Source methodology, the
Open Mind Initiative allows domain specialists to contribute
algorithms, tool developers to provide software infrastructure and
tools, and non-specialists to contribute information to large
knowledge databases.  An important challenge is to make it easy
and rewarding for non-specialists to provide information.  We
review free software and open source approaches, including their
business and economic models, and past software projects of
particular relevance to Open Mind.  We then describe some of the
technical details associated with Open Mind projects, such
as ensuring data integrity and learning from heterogeneous
contributors, and conclude with general challenges and
opportunities. [1,2]

[1] "Character and Document Research in the Open Mind Initiative"
by David G. Stork, International Conference on Document Analysis
and Recognition (ICDAR99), 1999, in press.

[2] "The Open Mind Initiative" by David G. Stork, Communications
of the ACM (submitted) 1999.
 



 
Wednesday, April 21st

Tony Verma
Stanford University

A Perceptually Based Audio Signal Model with Application to
Scalable Compression

Audio delivery in network environments such as the Internet where
bandwidth is not guaranteed, packet loss is common and where users
connect to the network at various data rates demands scalable
compression techniques. Scalability allows each user to receive the
best possible audio quality given the current network condition.  In
addition, because the separation principle for source and channel
coding does not apply to lossy packet networks, an audio source coding
technique that explicitly considers channel characteristics is
desirable. These goals can be achieved by using a higher level
description for audio than the actual waveform. This talk will focus
on a method for extracting meaningful parameters from general digital
audio signals that takes into account the way humans perceive sound;
moreover, application of this parametric model to scalable audio
compression will be discussed.

The model consists of three major components: sines, transients and
noise. These underlying signal components are found during the
analysis stage of the model.  Quantizing and compressing the resulting
model parameters allows for efficient storage and transmission of the
original audio signal.  The talk will cover enhancements made to
current sine models. These enhancements allow explicit perceptual
information to be included in the sinusoidal model.  In addition, a
novel transient modeling technique will be covered.
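
As a hedged sketch of the sinusoidal part of such three-part models (illustrative only; the enhancements described in the talk add explicit perceptual information beyond this), per-frame sine parameters can be extracted by spectral peak picking:

    import numpy as np

    def sine_params(frame, sr, n_peaks=20):
        # One windowed analysis frame -> up to n_peaks (freq, amp) pairs.
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        # Local maxima: bins larger than both neighbors.
        idx = np.where((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1
        best = idx[np.argsort(spec[idx])[::-1][:n_peaks]]
        return [(float(freqs[i]), float(spec[i])) for i in np.sort(best)]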

The three part model provides an efficient, flexible and perceptually
accurate representation for audio signals.  It therefore is
appropriate for scalable compression over lossy packet networks.  The
efficiency of the model ensures high compression ratios.  Flexibility
simultaneously allows scalability and robustness to channel
characteristics such as packet losses because subsets of model
parameters represent the original signal with varying degrees of
fidelity.  Perceptual accuracy ensures that parameter subsets
reasonably represent the original signal while the complete parameter
set represents the original exactly in a perceptual sense.

Most current techniques for audio compression (e.g., MPEG audio layer
3 and AAC, Real Audio's G2, etc.) use a subband decomposition in
conjunction with psychoacoustic models to compress the audio
waveform itself.  No model of the signal is assumed.  These
compression techniques have been very successful for targeted fixed
bit rates; however, they cannot be scaled in large steps without
severe loss in quality.  This is evident in the case of Real Audio
where a database will store many versions of an audio signal at
various bitrates (e.g., 92Kbps, 64Kbps, 32Kbps, 20Kbps, and 16Kbps)
and quality.  Because using an underlying model allows meaningful
subsets of parameters to describe the original signal, one compressed
bitstream (e.g., 96Kbps) can be stored.  Embedded within this bitstream
are lower bitrate versions (e.g., 64Kbps, 32Kbps, 20Kbps, and 16Kbps)
that can be easily extracted.  Sound demos of the audio compression
scheme will be played.
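
The embedded-bitstream idea itself is simple to sketch in the abstract (the chunked, importance-ordered layout below is a hypothetical stand-in, not the talk's actual format): decoding at a lower rate means keeping a prefix of perceptually ordered parameter chunks.

    def extract_layer(chunks, budget_bits):
        # chunks: (size_in_bits, params) pairs ordered from most to
        # least perceptually important; keep the prefix within budget.
        kept, used = [], 0
        for bits, params in chunks:
            if used + bits > budget_bits:
                break
            kept.append(params)
            used += bits
        return kept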

Host: M. Slaney
 
 



 
Thursday, May 6th, 10am -- *note special day and time*

P. Anandan
Microsoft Research
Redmond, WA

From 2D images to 2.5D sprites:  A layered approach to visual scene modeling

A central problem in Vision is the decomposition of the visual information contained in one or more images of a scene into coherent elements. A recent trend in Computer Vision is the decomposition of a scene into a set of image layers based on coherence of 2D image motion.  The layered representation corresponds to the well-established method in Graphics and Animation of compositing images from a collection of images (or "sprites") that are layered from back to front.  However, except for the depth ordering naturally maintained in the layered decomposition, there has been little or no effort to model the 3D structure of the scene.

In this talk, I will describe a new approach for modeling the appearance and geometry of 3D scenes as a collection of 2.5D sprites.  Each sprite corresponds to a view-based representation of a portion of the scene whose disparity across a set of views can be approximately modeled as a planar surface at an arbitrary orientation.  Each sprite is described by the parameters of the plane, a color image that specifies the appearance of that portion of the scene, a per-pixel opacity map, and a per-pixel depth offset relative to the nominal plane. New views of the scene can be generated efficiently by rendering each individual layer from that view and combining the layer images in a back to front order.  Layers from different scenes can be combined into a new synthetic scene with realistic appearance and geometric effects for multimedia authoring applications.
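
A hedged data-structure sketch of the sprite description above, with back-to-front "over" compositing of layers already rendered for the new view (field names and array shapes are assumptions, not the talk's actual representation):

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Sprite:
        plane: np.ndarray    # nominal plane parameters
        color: np.ndarray    # (H, W, 3) appearance of this scene portion
        alpha: np.ndarray    # (H, W) per-pixel opacity in [0, 1]
        offset: np.ndarray   # (H, W) per-pixel depth offset from the plane

    def composite(rendered_layers):
        # rendered_layers: (color, alpha) pairs for the new view,
        # ordered back to front; standard "over" compositing.
        out = np.zeros_like(rendered_layers[0][0])
        for color, alpha in rendered_layers:
            a = alpha[..., None]
            out = a * color + (1.0 - a) * out
        return out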

I will describe our current semi-automated approach for layer extraction and show results from real images.  I will also describe a new Bayesian approach for automatic layer extraction and show preliminary results.  I will also discuss the domain of applicability of this approach to 3D scene modeling.

(Anandan will also be giving this talk at Berkeley on May 4th.)


Wednesday, May 12th

Luc Julia
Computer Human Interaction Center (CHIC!) - http://www.chic.sri.com
SRI International

CHIC! We build the interactive systems of tomorrow!

Over the past five years, a team at SRI International has focused on new
kinds of interaction between humans and computers, using speech, handwriting,
and drawing to extend more standard input devices such as mice and keyboards.

SRI International is the place where the computer mouse and the Internet were
first created. Despite this history, we want to change the way people interact
with computers and networked services; we want to move toward a new paradigm,
closer to our interactions in everyday life.

To generate new ideas and to evaluate and validate them, CHIC! creates
prototypes and conducts usability studies. Our programs, based on the best
basic research components available as well as commercial products, use
SRI's Open Agent Architecture (OAA), an infrastructure well suited to
rapid prototyping of complex systems that combine heterogeneous
information sources. Intelligent interfaces to this dynamic community of
services allow several modalities to be combined in a synergistic fashion,
thus empowering the user.

During this talk, we'll present as many demonstrations as possible to show
how different technologies were integrated and how information became
accessible to the end user through easy and natural user interfaces. To
conclude, we will sketch our vision for the systems of the future.
 
 
 





Click here for abstracts from the Fall 1998 Sigcomp Seminar Series





Sigcomp talks are organized by Trevor Darrell. Email me with comments or updates. The Sigcomp talk series was originated by Michele Covell.