We use notation similar to that of the batch-learning draft.
Let $w = (w_1, \dots, w_n)$ denote a sentence. We represent each word $w_i$ and its context as a feature vector $\phi(w_i) \in \mathbb{R}^d$ of dimension $d$ (the choice of $d$ and of the features is left for you to specify). This feature vector should generalize to unseen words.
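As one concrete (purely illustrative) choice, hashed character $n$-gram features give a fixed-dimension vector that extends to words outside the training vocabulary; the function name `word_features` and the dimension 64 below are our assumptions, not part of the draft:

```python
import zlib
import numpy as np

def word_features(word, context, dim=64):
    """Hashed character n-gram features for a word and its context.

    Hashing character n-grams into a fixed number of buckets lets
    unseen words share features with seen ones, so the representation
    generalizes beyond the training vocabulary.
    """
    vec = np.zeros(dim)
    for tok in [word] + list(context):
        for n in (2, 3):
            for i in range(len(tok) - n + 1):
                # stable hash (crc32) so features are reproducible
                vec[zlib.crc32(tok[i:i + n].encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

Any other feature map with the same generalization property would serve equally well here.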
Each annotated sentence $w$ has an associated set of arcs, and we denote this set $E(w)$. Let $\mathcal{C} = \{(w^{(1)}, E(w^{(1)})), \dots, (w^{(m)}, E(w^{(m)}))\}$ be the annotated corpus. A function that maps an arc $(i, j)$ to a value is given by
$$s(i, j) = \phi(w_i)^\top W \, \phi(w_j) + v^\top \psi(i, j),$$
where $W \in \mathbb{R}^{d \times d}$ is a parameter matrix and $\psi(i, j)$ are features depending on the distance between word $w_i$ and word $w_j$.
Let $\mathcal{T}(w)$ be the set of feasible parse trees for sentence $w$. Each $T \in \mathcal{T}(w)$ is a collection of arcs that forms a parse tree, which is associated with the score $s(T) = \sum_{(i,j) \in T} s(i, j)$.
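A minimal sketch of arc and tree scoring, assuming an arc score that is bilinear in the word feature vectors plus a linear term in distance features (one plausible reading of the score function; the helper names `arc_score`, `tree_score`, and `dist_feat_fn` are ours):

```python
import numpy as np

def arc_score(phi_i, phi_j, W, dist_feats, v):
    """Score of arc i -> j: bilinear term in the word features
    plus a linear term in the distance features."""
    return float(phi_i @ W @ phi_j + v @ dist_feats)

def tree_score(tree, phis, W, v, dist_feat_fn):
    """Score of a parse tree = sum of the scores of its arcs."""
    return sum(arc_score(phis[i], phis[j], W, dist_feat_fn(i, j), v)
               for (i, j) in tree)
```

A parser would then select $\arg\max_{T} s(T)$ over the feasible trees, e.g. with a maximum-spanning-tree algorithm.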
Let $\mathcal{D} = \{(w^{(k)}, T^{(k)})\}_{k=1}^{m}$. Assume we are given $m$ sentences and their dependency parses $T^{(1)}, \dots, T^{(m)}$; we solve the following optimization problem:
$$\min_{\|W\|_* \le t/2} \; \sum_{k=1}^{m} \ell\big(W; w^{(k)}, T^{(k)}\big),$$
where $\ell$ is a convex and differentiable loss function. We use a ``hard'' regularization constraint instead of a regularization term in the objective, and later we apply an online SDP algorithm (Jaggi \& Sulovsky, 2010; Hazan, 2008).
The regularization constraint on $W$ can be rewritten using the block matrix
$$Z = \begin{pmatrix} A & W \\ W^\top & B \end{pmatrix}.$$
(Jaggi \& Sulovsky, 2010) showed that the above constraint is equivalent to a constraint using a positive semidefinite matrix: $\|W\|_* \le t/2$ holds if and only if there exist symmetric matrices $A$ and $B$ such that $Z \succeq 0$ and $\operatorname{tr}(Z) \le t$,
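The equivalence can be sanity-checked numerically: from the SVD $W = U \Sigma V^\top$, the choice $A = U \Sigma U^\top$, $B = V \Sigma V^\top$ gives a positive semidefinite block matrix whose trace is exactly twice the trace norm of $W$. A small NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# PSD witness built from the SVD: Z = [[U S U^T, W], [W^T, V S V^T]]
A = U @ np.diag(s) @ U.T
B = Vt.T @ np.diag(s) @ Vt
Z = np.block([[A, W], [W.T, B]])

eigvals = np.linalg.eigvalsh(Z)
trace_norm = s.sum()                              # ||W||_* = sum of singular values
assert eigvals.min() > -1e-9                      # Z is positive semidefinite
assert abs(np.trace(Z) - 2 * trace_norm) < 1e-9  # tr(Z) = 2 ||W||_*
```

The converse direction (any feasible $Z$ forces $\|W\|_* \le \operatorname{tr}(Z)/2$) is the content of the lemma cited above.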
and therefore the original optimization problem can be cast as a semidefinite programming (SDP) problem of the following form,
$$\min_{Z \succeq 0, \; \operatorname{tr}(Z) = 1} \hat{f}(Z),$$
where $\hat{f}$ is the objective function (after rescaling $Z$ by $t$).
For the semidefinite optimization problem shown above, (Hazan, 2008) gives a gradient-descent-type algorithm with an $O(1/\varepsilon)$ convergence rate. Another advantage of this algorithm is that the parameter matrix is maintained as a list of rank-one components, so the rank of the matrix is at most $k$ after $k$ iterations.
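A sketch of the iteration, using an exact eigendecomposition in place of the approximate subroutine and the standard Frank--Wolfe step size $\alpha_k = 2/(k+2)$ (both simplifying assumptions; the function name `hazan_sdp` is ours):

```python
import numpy as np

def hazan_sdp(grad, n, steps):
    """Hazan-style (Frank-Wolfe) iteration on {Z >= 0, tr(Z) = 1}.

    Each step adds one rank-one component v v^T, where v is the top
    eigenvector of -grad(Z); starting from a rank-one matrix, the
    iterate after k steps therefore has rank at most k + 1.
    """
    v0 = np.zeros(n)
    v0[0] = 1.0
    Z = np.outer(v0, v0)            # rank-one start, tr(Z) = 1
    components = [(1.0, v0)]        # list of (weight, vector) pieces
    for k in range(1, steps + 1):
        G = -grad(Z)
        # exact top eigenvector stands in for the approximate subroutine
        _, V = np.linalg.eigh(G)
        v = V[:, -1]
        alpha = 2.0 / (k + 2)
        Z = (1 - alpha) * Z + alpha * np.outer(v, v)
        components.append((alpha, v))
    return Z, components
```

Since each update is a convex combination of $Z$ and a unit-norm rank-one matrix, the iterates stay feasible ($Z \succeq 0$, $\operatorname{tr}(Z) = 1$) throughout.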
Here $\mathrm{ApproxEV}$ is a subroutine used to calculate an $\varepsilon$-approximate largest eigenvector of a symmetric matrix $M$, and $C_{\hat f}$ is a constant called the modulus of convexity (or curvature constant) of the function $\hat f$; in this application $\hat f$ inherits convexity and differentiability from the loss $\ell$, so the algorithm applies directly.