We will use notation similar to that of the batch-learning draft.
Let $w$ denote a sentence. We represent each word and its context as a feature vector of a fixed dimension (the dimension is for you to specify). This feature vector should generalize to unseen words.
Each annotated sentence $w$ has an associated set of arcs, and the annotated corpus is the collection of all annotated sentences. A scoring function maps each arc to a real value, where the features used to compute this score depend on the distance between the two words joined by the arc. Each member of the set of feasible parse trees for sentence $w$ is a collection of arcs that forms a parse tree, and each such tree is assigned a score.
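Concretely, one natural instantiation of these definitions is a bilinear, arc-factored model; the parameter matrix $W$, the word-context feature map $\phi$, the arc score $s$, and the tree set $\mathcal{T}(w)$ are symbols introduced here for illustration:
\[
  s(i \to j) \;=\; \phi(w_i)^{\top} W \, \phi(w_j),
  \qquad
  \mathrm{score}(T) \;=\; \sum_{(i \to j) \in T} s(i \to j)
  \quad \text{for } T \in \mathcal{T}(w).
\]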
Assume we are given annotated sentences together with their dependency parses; we then try to solve the following optimization problem.
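A plausible concrete form of this problem, assuming the parameters form a matrix $W$ constrained in trace norm, with $w^{(k)}$ the $k$-th annotated sentence, $T^{(k)}$ its gold parse tree, $n$ the number of sentences, and $t$ the constraint radius (all of these symbols are introduced here for concreteness), is
\[
  \min_{W} \;\; \sum_{k=1}^{n} \ell\!\left(W;\, w^{(k)},\, T^{(k)}\right)
  \qquad \text{s.t.} \qquad \|W\|_{*} \le t,
\]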
where $\ell$ is a convex and differentiable loss function. We use a ``hard'' regularization constraint instead of a regularization term in the objective, and later we will apply an online SDP algorithm (Jaggi & Sulovsky, 2010; Hazan, 2008).
The regularization constraint can be re-written using a block matrix.
Jaggi & Sulovsky (2010) showed that the above constraint is equivalent to a constraint on a positive semi-definite matrix.
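The reformulation in question is the standard trace-norm identity, stated here with the illustrative symbols $W$, $t$, and $Z$ from above:
\[
  \|W\|_{*} \le t
  \quad\Longleftrightarrow\quad
  \exists\, A, B \ \text{symmetric such that}\quad
  Z = \begin{pmatrix} A & W \\ W^{\top} & B \end{pmatrix} \succeq 0
  \quad\text{and}\quad \operatorname{tr}(Z) \le 2t .
\]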
and therefore the original optimization problem can be cast as a semi-definite program (SDP) of the following form.
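Assuming Hazan's bounded-trace normalization (the symbol $f$ for the objective and the unit trace are choices made here for concreteness), the problem reads
\[
  \min_{Z} \; f(Z)
  \qquad \text{s.t.} \qquad Z \succeq 0, \quad \operatorname{tr}(Z) = 1,
\]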
where $f$ is the objective function.
For the semi-definite optimization problem shown above, Hazan (2008) gives a gradient-descent-type algorithm that reaches an $\varepsilon$-accurate solution after $O(1/\varepsilon)$ iterations. Another advantage of this algorithm is that the parameter matrix is maintained as a list of rank-1 components, so after $k$ iterations its rank is at most $k$.
The algorithm relies on a sub-routine that calculates an approximate largest singular value of a matrix to within a prescribed accuracy at each iteration, and its analysis depends on a constant called the modulus of convexity (or curvature constant) of the objective function $f$, which governs the number of iterations required in this application of the algorithm.
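A minimal sketch of an update of this type is given below in Python; the step-size schedule, the use of an exact symmetric eigensolver in place of the approximate sub-routine, and the toy quadratic objective are all assumptions made for illustration, not the algorithm as specified in the draft.
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import eigsh

def greedy_sdp(grad_f, dim, num_iters=200):
    """Greedy rank-1 updates over {Z : Z PSD, tr(Z) = 1}.

    grad_f: returns the gradient of the convex objective f at Z.
    Each iteration adds one rank-1 component, so rank(Z) <= num_iters.
    """
    # Arbitrary rank-1 feasible starting point.
    v = np.zeros(dim)
    v[0] = 1.0
    Z = np.outer(v, v)
    for k in range(1, num_iters + 1):
        G = grad_f(Z)
        # Largest eigenvector of -G: plays the role of the sub-routine above.
        _, vec = eigsh(-G, k=1, which='LA')
        v = vec[:, 0]
        alpha = 2.0 / (k + 2)  # Frank-Wolfe style step size
        Z = (1.0 - alpha) * Z + alpha * np.outer(v, v)  # stays PSD, trace 1
    return Z

# Toy usage: minimize f(Z) = 0.5 * ||Z - C||_F^2 over the spectrahedron.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
C = M @ M.T / 6.0
Z_hat = greedy_sdp(lambda Z: Z - C, dim=6)
print(np.trace(Z_hat), np.linalg.eigvalsh(Z_hat).min())
\end{verbatim}
Each iterate is a convex combination of rank-1 matrices, which is what allows the solution to be stored as a short list of vectors rather than a dense matrix.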