distinctive touch

mas.622j final project
december 7th, 2004
by electronic max

presentation slides: [ppt] [pdf]


This project involved building a system called distinctive touch for lightweight authentication using gestures on touchscreen displays. Users were identified by their passdoodle, a gestural equivalent of a username and password consisting of one or more strokes drawn on the screen with a finger within a short span of time. The first time they used a kiosk, users each chose their own unique passdoodle and trained the system by demonstrating it several times. Their passdoodle then uniquely identified them on subsequent visits to the kiosk. Kiosk applications could use this to give users access to personalized content and services, such as content filtering or messaging. Unlike biometric or RFID-based schemes, distinctive touch protected user privacy by being entirely anonymous and user-initiated.


distinctive touch versus handwriting recognition

The basic goal of the system bore many similarities to the traditional handwriting recognition problem: reliably classifying a handwritten figure. However, there were a number of important differences:

rubine: specifying gestures by example

This project extended prior work by Dean Rubine, described in his SIGGRAPH '91 paper "Specifying Gestures by Example", in which he outlined a system for learning unistroke gestures called GRANDMA. GRANDMA consisted of a simple linear classifier over 13 feature extractors that measured geometric and timing properties of each gesture.
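To make the idea of feature extractors concrete, here is a sketch in Python (the project itself used Matlab) of a few of Rubine's geometric features; the function name is hypothetical and only an illustrative subset of the 13 features is shown:

```python
import numpy as np

def rubine_features(points):
    """Compute a small illustrative subset of Rubine's features for a
    unistroke gesture given as an (n, 2) sequence of (x, y) samples."""
    p = np.asarray(points, dtype=float)
    dx, dy = p[2] - p[0]                 # initial direction, as in Rubine's f1/f2
    norm = np.hypot(dx, dy)
    f1, f2 = dx / norm, dy / norm        # cosine and sine of the initial angle
    xmin, ymin = p.min(axis=0)
    xmax, ymax = p.max(axis=0)
    f3 = np.hypot(xmax - xmin, ymax - ymin)   # bounding-box diagonal length
    f5 = np.hypot(*(p[-1] - p[0]))            # distance from first to last point
    steps = np.diff(p, axis=0)
    f8 = np.sum(np.hypot(steps[:, 0], steps[:, 1]))  # total gesture path length
    return np.array([f1, f2, f3, f5, f8])
```

Each extractor maps a raw stroke to a fixed-length numeric vector, which is what makes the classifiers below applicable.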

Distinctive touch extended this work in the following ways:

project outline

This writeup is broken up into the following parts:



The first classifier chosen was a simple k-nearest-neighbors classifier with a Euclidean distance metric. Recognition results (leave-one-out cross-validation versus test performance) for k = 1 to 20, using each of the dt feature extractors individually, are shown in the following figure:

Recognition results using a combination of all 10 dt (13 rubine) feature extractors are shown in the following figure:

Maximum performance for the all-features-combined classifier: LOO-CV: 52.00%, Test: 48.00%.

Here we observe that, in the case of kNN, adding features can reduce overall performance. This drop was to be expected: the extra features dramatically increase the dimensionality of the space, which makes distances between previously close neighbors much larger.
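A minimal sketch of this classifier and validation procedure, in Python with hypothetical names (standing in for the project's Matlab implementation): k-nearest-neighbors under a Euclidean metric, evaluated by leave-one-out cross-validation:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=1):
    """Label x by majority vote among its k Euclidean nearest neighbors."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

def loo_cv_accuracy(X, y, k=1):
    """Leave-one-out cross-validation: classify each example using the
    remaining n-1 examples as the training set."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += knn_classify(X[mask], y[mask], X[i], k) == y[i]
    return hits / len(X)
```

Because kNN stores the training set verbatim, every added feature dimension directly reshapes the distance computation in `knn_classify`, which is why feature choice matters so much for this method.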

Maximum cross-validation performance was obtained with the combination of extractors f1fv, bboxfv2, f13fv, f9fv, distfv, and startendfv at k=1. Performance with these parameters was as follows: LOO-CV: 97.11%, Test: 88.44%.

fisher linear discriminant

The chosen implementation for FLD was a standard 1-dimensional binary discriminant that maximized the separability of 2 classes by projecting N-dimensional points onto a single optimal vector. Each test point projected on this vector was classified according to the optimal Bayes decision boundary between the projected means of the two classes. Since FLDs are binary discriminants, a One-Versus-All approach was taken to achieve multiclass classification. This involved finding C=45 different FLDs (one for each class). For each of these FLDs, all the training data belonging to that class were assigned to one class, while all the other training data were lumped into another class. Performance was evaluated as the ratio of correct labellings to total evaluations. As will be described in rejection, this measure of performance is somewhat misleading, and will be avoided in the future.


fisher linear discriminant performance w/ single feature extractors
feature extractor | leave-one-out performance | test performance

Furthermore, the two highest test-validation scores for combinations of 3 and 5 feature extractors with FLD are displayed in the following two tables, respectively.

fisher linear discriminant performance w/ 3 feature extractors
feature extractor | leave-one-out performance | test performance
bboxfv2, f12fv, startendfv | 0.9822 | 0.8489
f1fv, f9fv, startendfv | 0.9533 | 0.9733

fisher linear discriminant performance w/ 5 feature extractors
feature extractor | leave-one-out performance | test performance
f1fv, f12fv, distfv, startendfv, bboxfv2 | 0.9867 | 0.9156
f1fv, f9fv, f11fv, distfv, startendfv | 0.9778 | 0.9644

sequential-single-gaussian ML model

This approach involved making a model for each class using the labeled training data, where each model consisted of a sequence of states each corresponding to a single stroke in the gesture. Each state had an associated parametric probability distribution. The parameters for a model were then estimated by maximum-likelihood (ML) on a per-stroke basis using the training data for each class.

The reason for choosing this over an HMM is the following: given the initial assumption that stroke order must be preserved for a gesture to be considered part of a class, if states represented entire strokes, then the state sequence for each class would be completely known a priori.

For simplicity, I chose a single Gaussian per state, which made estimating the Gaussian parameters for each stroke trivial: they are simply the mean and covariance of the corresponding stroke data in that class. Computing the probability that an example belongs to a particular model then merely involved taking the product of the probabilities that the stroke sequence resulted from that model, using the estimated parameters at each state. If a model had a different number of strokes than were observed, that model was automatically rejected.
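A sketch of the estimation and scoring steps, in Python with hypothetical names; for brevity it uses diagonal Gaussians and per-stroke feature vectors, which may differ from the project's exact covariance structure:

```python
import numpy as np

def fit_model(examples):
    """Fit a per-stroke diagonal Gaussian model from training examples.
    Each example is a list of per-stroke feature vectors; all examples of
    a class are assumed to share the same stroke count and order."""
    n_strokes = len(examples[0])
    strokes = [np.array([ex[s] for ex in examples]) for s in range(n_strokes)]
    # small variance floor keeps the log-density finite for tight clusters
    return [(S.mean(axis=0), S.var(axis=0) + 1e-6) for S in strokes]

def log_likelihood(model, example):
    """Sum of per-stroke Gaussian log-densities; -inf (automatic
    rejection) when the stroke count does not match the model."""
    if len(example) != len(model):
        return -np.inf
    ll = 0.0
    for x, (mu, var) in zip(example, model):
        ll += -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return ll
```

Classification then amounts to evaluating `log_likelihood` under every class model and comparing the results, which is where the rejection question discussed below arises.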


While the strength of this model lies in the simplicity with which estimation can be performed, it is not without limitations. One difficulty arises from the strictness with which the model assumes that all training examples have exactly the same number of strokes, in the same order. Spurious strokes will completely throw off classification and training. Second, although the likelihoods are simple to compute, it is unclear, given a set of likelihoods, how the algorithm should determine which class an example belongs to. Choosing the maximum posterior likelihood (as a linear machine) would mean that the system acts "greedily" and never rejects a passdoodle. Depending on the application, installation, and preferred failure mode, this property may or may not be desirable. For interactive touchscreen displays in small (trusted) workgroups, such as for A/V systems in workgroup conference rooms, choosing the best match might be the right thing to do. On the other hand, if the information or privileges being protected were sensitive, it would be more appropriate to have a scheme that rejects passdoodles below a certain threshold likelihood. This is discussed further in rejection.


sequential single gaussian ML performance w/ single feature extractors
feature extractor | leave-one-out performance | test performance

The following table displays the leave-one-out and test performance of the best-performing combinations of 3, 4, and 5 feature extractors out of 10, compared with the performance of all 10 combined.

best-performing sequential single gaussian ML performance for multiple extractors
feature extractor | leave-one-out performance | test performance
f9fv, bboxfv2, startendfv | 0.9733 | 0.9156
f1fv, startendfv, f9fv, distfv | 0.9644 | 0.8889
f1fv, f9fv, f11fv, distfv, startendfv | 0.9778 | 0.9644
all combined: distfv, bboxfv2, f4fv, startendfv, f1fv, f9fv, f10fv, f11fv, f12fv, f13fv

support vector machines

For evaluating the performance of support vector machines, I used the OSU SVM Toolkit for Matlab by Ma, Zhao, and Ahalt. Just as in the fisher linear discriminant case, since SVMs are binary classifiers and not designed for multiclass classification, we took a One-Versus-All approach. This involved training a separate SVM for each class, assigning the data belonging to that class the label "1" while all other data was lumped into class "2". Once all the SVMs were trained, each was evaluated in turn on all test data. The score was calculated as the sum of correctly assigned labels divided by C*N (the number of classes times the size of the test set).
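The OSU toolkit itself is not reproduced here; as a stand-in, the following Python sketch illustrates the One-Versus-All scheme around a simple Pegasos-style sub-gradient-trained linear SVM. All names are hypothetical, and this is an illustration of the scheme rather than the project's implementation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Binary linear SVM (labels +1/-1) via Pegasos-style sub-gradient
    descent on the regularized hinge loss. A constant feature is appended
    so the bias is learned along with the weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in range(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1   # inside the margin?
            w *= (1 - eta * lam)                # decay from the regularizer
            if violated:                        # hinge-loss sub-gradient step
                w += eta * y[i] * Xb[i]
    return w

def train_one_vs_all(X, labels):
    """One SVM per class: that class's examples get +1, all others -1."""
    return {c: train_linear_svm(X, np.where(labels == c, 1.0, -1.0))
            for c in np.unique(labels)}

def predict(models, x):
    """Choose the class whose SVM yields the largest decision value."""
    xb = np.append(x, 1.0)
    return max(models, key=lambda c: models[c] @ xb)
```

Note that `predict` here greedily picks the largest decision value; the stricter accept/reject evaluation discussed under rejection instead checks the sign of each classifier's output separately.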

In practice, training SVMs repeatedly for leave-one-out validation took far too long. I have therefore omitted leave-one-out results here and report only test results.

best-performing support vector machine classifier
feature extractor | linear kernel test performance | quadratic kernel test performance
all combined: distfv, bboxfv2, f4fv, startendfv, f1fv, f9fv, f10fv, f11fv, f12fv, f13fv

which method is best?

Each classifier has advantages and disadvantages. The k-nearest-neighbor classifier was the simplest to implement, but it also proved the most sensitive to the choice of features, and thus would be less preferred. Sequential ML modeling strictly encoded the stroke order and number of strokes, and thus was perhaps too rigid in practice. The 1-dimensional fisher linear discriminant performed surprisingly well for its simplicity, but was in general outperformed by the more elaborate SVM. This might be due to the max-margin criterion being more stringent (a harsher regularizer) than the separability criterion of the FLD.


One of the original criteria overlooked when evaluating the classifiers above was how well each could be made to reject gestures that did not belong to any of the classes. In the versions of the algorithms above, both k-nearest-neighbors and the sequential ML models always greedily choose the closest class. This, unfortunately, means that these algorithms will never reject a passdoodle -- hardly a desirable property for an authentication algorithm!

In order to modify these algorithms to reject gestures, each must be given a threshold representing the maximum dissimilarity with any class to tolerate; the algorithm can then reject any gesture more dissimilar than this quantity. In k-nearest-neighbors, this threshold would be a maximal distance value, whereas with sequential ML it would be a minimum log probability. In practice, however, coming up with such threshold values is tricky. In particular, since the variance of the data differs from class to class, the maximum tolerable dissimilarity may depend on the variance of the data in each class. It may be feasible to use the Mahalanobis distance, but I have not tested how well this performs.
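Such variance-aware rejection might be sketched as follows, using the Mahalanobis distance to per-class models; the names are hypothetical and, as noted above, this approach was not tested against the project's data:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Distance from x to a class, scaled by that class's covariance."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify_with_rejection(x, classes, threshold):
    """classes maps a label to (mean, inverse covariance). Return the
    label of the nearest class by Mahalanobis distance, or None when even
    the nearest class is more dissimilar than the threshold allows."""
    best = min(classes, key=lambda c: mahalanobis(x, *classes[c]))
    return best if mahalanobis(x, *classes[best]) <= threshold else None
```

Because the covariance is per-class, a single global threshold behaves sensibly even when some passdoodles are drawn much more consistently than others.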

The FLD and SVM classifiers, by contrast, can reject gestures as-is; however, it is difficult to gain a sense during training of exactly how liberal these classifiers will be, even on the training set. To get a slightly better sense, the SVM classifiers were re-evaluated using a different performance criterion than before. In this scheme, a classification is considered correct only if all C=45 classifiers (SVMs) output the proper labels. That is, a trial gesture in the test set scores 1 only if exactly the one SVM classifier corresponding to the gesture's actual class accepts it, and all the remaining classifiers reject it. Performance under this new criterion is outlined below.
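This all-classifiers-correct criterion can be stated compactly; a sketch with hypothetical names, encoding accept as +1 and reject as -1:

```python
import numpy as np

def strict_score(decisions, true_labels):
    """Strict score for a one-vs-all classifier bank: `decisions` is an
    (n_trials, C) array of accept(+1)/reject(-1) outputs, one column per
    classifier. A trial counts as correct only when the classifier for
    its true class accepts AND every other classifier rejects."""
    n, C = decisions.shape
    correct = 0
    for i in range(n):
        expected = -np.ones(C)
        expected[true_labels[i]] = 1
        correct += np.array_equal(decisions[i], expected)
    return correct / n
```

A trial where two classifiers both accept, or where none do, scores zero under this criterion even if the greedy argmax rule would have labeled it correctly, which is why the numbers below are lower than the earlier tables.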

support vector machine performance under the all-classifiers-correct criterion
feature extractor | linear kernel | quadratic kernel | cubic kernel | quartic kernel
all combined: distfv, bboxfv2, f4fv, startendfv, f1fv, f9fv, f10fv, f11fv, f12fv, f13fv | 0.6978 | 0.6222 | no data | no data
f1fv, f9fv, bboxfv2, distfv, startendfv | 0.7244 | 0.8444 | 0.8311 | 0.7867
f1fv, f9fv, f11fv, bboxfv2, distfv | 0.6489 | 0.8400 | 0.8000 | 0.7511

An alternative might be to use thresholds on the outputs of the SVMs, or on distances from the decision boundary in the FLD, to reject gestures.
