\documentclass[10pt]{article}
\usepackage{oxygen2001}
\usepackage{epsf}
\usepackage{oxyapa}
\usepackage{graphicx}
\usepackage{times}
% If you're using the archaic, outmoded, ancient, grungy, deprecated,
% and generally looked-down-upon LaTeX 2.09 version, then you'll want
% something more like this:



\begin{document} 
\oxytitleblock{
\oxytitle{HMM based recognition of freehand sketches}
\oxyauthor{Tevfik Metin Sezgin}{mtsezgin@ai.mit.edu}
\oxyaddress{MIT Artificial Intelligence Laboratory, 200 Technology
Square, Cambridge MA, 02139 USA}
}
%\



%\
\section{The Problem}
We use sketches as a medium for expressing ideas and saving thoughts.
Sketching is especially common in early design as a means of communication,
documentation and as a tool for stimulating thought. Despite the increasing 
availability of pen based PDAs and PCs, we still can't interact with our 
devices via sketching as we do with people. As a group, we are building 
a generic multi-domain sketch recognition architecture to make computers sketch
literate. This sketch recognition
system will differ from existing architectures in many aspects, including a 
language for describing shapes, mechanisms for learning new shapes, and a 
blackboard based recognition architecture with top-down and bottom-up recognizers. 
Here we describe a part of this system that generates efficient bottom-up 
recognizers by compiling object descriptions.


\section{Motivation} 
As described in \cite{davis-abstract-2002}, current 
sketch recognition systems require users to hand-code individual recognizers 
as well as data structures for each object to be recognized. 
Hand-coding individual recognizers has a number of drawbacks: {\em(i)} 
writing recognizers and data structures is labor intensive and error prone,
{\em(ii)} extending or modifying existing recognizers requires 
knowing how they work, {\em(iii)} because recognizers may be written by
different programmers and may have different recognition algorithms, they
lack a unified approach to recognition, {\em(iv)} users usually sketch parts of
objects in a certain order and style, but current systems don't have a systemic
way of exploiting this information to improve recognition accuracy and speed.

\section{Previous Work} 
Current sketch recognition systems generally either have very limited
recognition, sidestep recognition to avoid problems induced by poor
recognition (\cite{gross96}, \cite{landay2000}), or depend heavily on other
modalities such as speech \cite{mcgeePUI2001-2}.

Previous work in our group has focused on sketch recognition 
\cite{alvarado_ijcai}. One drawback of this first approach was that adding new
recognizable objects required writing new recognizers and data structures.
This approach also suffered from the problems of hand coding mentioned above.

\section{Approach} 
We aim to solve the problems mentioned above by automatically generating 
recognizers from object descriptions. Objects are described in an object description
language that includes information about components that form the object and
constraints that must be satisfied in order to have a legal instance of 
an object \cite{hammond-abstract-2002}. Given an object description, our system 
generates the Java code for a recognizer that functions as a
knowledge source in the blackboard based architecture. This automatic code 
generation scheme removes the need to write a separate recognizer for each 
object in the domain.

Our system reads object descriptions with a parser written using javacc (Sun's Java 
compiler compiler) and builds an abstract syntax tree (AST). The AST contains
information about
components, about constraints of different types, and about declarations such as 
renaming statements that provide a mechanism for referring to objects with a 
different name. The AST is processed to generate Java 
code for individual recognizers. The generated code includes a series of
nested for-loops and conditional statements that cycle through the
strokes currently on the sketching surface, looking for a  permutation of the
strokes and that satisfy the constraints. If a
particular subset of the strokes in the sketching surface satisfies all
constraints, a recognition is signalled to the blackboard. If only a
subset of the constraints is satisfied, the blackboard is notified about a
partial recognition.

The recognition algorithm outlined above has worst-case exponential complexity. 
The exponential nature of sketch recognition task is also mentioned in 
\cite{mahoney-2002}. The root cause of this exponential complexity is the 
assumption that the strokes forming an object can be drawn in any order and even 
in an interspersed fashion, requiring that we be able to test all possible orders.
But this is not typical behavior:
As we found out in a user study, people sketch in a fairly predictable
order (e.g., when drawing a stick figure, head is drawn first, left arm/leg is 
drawn before the right arm/leg). We also observed that people typically finish
one object before they start drawing another. These key observations enable 
us to create a polynomial time recognition algorithm by building hidden markov
models (HMMs) to model different sketching styles. Training data for the HMMs
is generated by encoding the output of the early sketch processing toolkit
described in \cite{metin-ms-thesis}. We train a separate HMM for each object
class. We use these models to group strokes forming the same objects 
(segmentation) and find what kind of object they form (recognition) at the same
time. This recognition scheme is more efficient than the brute force
search method described above. Some of the results obtained using this method
can be seen in Fig.~\ref{in_out}.
\begin{figure}[ht]
  \centering{
    \vbox{
        \begin{tabular}{c}
          {\includegraphics[scale=0.3,clip=true]{segmentation_in.eps}} \\
          {\includegraphics[scale=0.3,clip=true]{segmentation_out.eps}} 
        \end{tabular}
         } 
        }
  \caption{A number of hand-sketched objects and the recognition results (on the right). Colors indicate object class. Note that our method can also detect different sketching styles within the same object class.}
  \label{in_out}
\end{figure}

Our system, along with the language described in \cite{hammond-abstract-2002},
helps separate the task of describing an object from how the task of
recognizing it, and brings a unified approach to recognition. We also address
the four problems mentioned before.  We have introduced a systematic way of
exploiting stroke order and drawing styles to produce robust and fast
recognition.

\section{Future Work}
We described two sketch recognition methods. The code generation method exhaustively 
checks all possible stroke orderings and makes no assumptions about drawing order.
On the other hand, the HMM based method utilizes users' sketching styles and runs in 
polynomial time. These two methods are complementary in that the if the user sketches
in a style captured by the HMMs, the HMM based method is ideal. If the user sketches
in a style that the HMMs haven't been trained on before, the generated code can handle
these cases. We are working on smart ways of combining these methods (as opposed to running 
them in separation one after the other), as well as improving the 
speed and accuracy of these methods individually. We are also investigating how 
hierarchical HMMs can be used in recognition and segmentation.
\nocite{}

\bibliography{main}
\bibliographystyle{oxyapa}


\end{document}

