project:hillclimb_dp:draft
<WRAP clear></WRAP>
  
Note that (1) the algorithm initializes $y^{(0)}$ from a distribution $Q$; (2) it repeatedly checks each word in **//reverse DFS order//** (so a word that is far from the root comes first); (3) when changing the head of a word $m$, it keeps $y^{(t)}$ a valid tree at every step $t$; and (4) it iterates until no single change can increase the score.
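To make the procedure concrete, below is a minimal, self-contained sketch of one way such a hill-climbing pass could look. It is illustrative rather than the actual implementation: it assumes an arc-factored score (the score of a tree is the sum of per-arc scores), uses a uniformly random valid tree as a stand-in for the draw from $Q$, approximates the reverse DFS order by visiting deeper words first, and all names are made up for the example.

<code python>
import random


def is_tree(heads):
    # Every word must reach the artificial root (index 0) without a cycle.
    for m in range(1, len(heads)):
        seen, h = set(), m
        while h != 0:
            if h in seen:
                return False
            seen.add(h)
            h = heads[h]
    return True


def random_tree(n):
    # Rejection-sample a valid initial tree y^(0); a stand-in for a draw from Q.
    while True:
        heads = [0] + [random.randrange(n) for _ in range(1, n)]
        if is_tree(heads):
            return heads


def climb_order(heads):
    # Deepest words first, approximating the reverse DFS order in the text.
    def depth(m):
        d = 0
        while m != 0:
            m, d = heads[m], d + 1
        return d
    return sorted(range(1, len(heads)), key=depth, reverse=True)


def score(scores, heads):
    # Arc-factored score: S(x, y) = sum over words of scores[head][word].
    return sum(scores[heads[m]][m] for m in range(1, len(heads)))


def hill_climb(scores, n):
    heads = random_tree(n)                      # y^(0)
    improved = True
    while improved:                             # stop when no change helps
        improved = False
        for m in climb_order(heads):            # far-from-root words first
            best_h, best_s = heads[m], score(scores, heads)
            for h in range(n):                  # try every alternative head
                if h == m:
                    continue
                cand = heads[:m] + [h] + heads[m + 1:]
                if not is_tree(cand):           # keep y^(t) a valid tree
                    continue
                s = score(scores, cand)
                if s > best_s:
                    best_h, best_s = h, s
            if best_h != heads[m]:
                heads[m] = best_h
                improved = True
    return heads


# Toy usage: 3 words plus the artificial root (index 0).
random.seed(0)
scores = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
print(hill_climb(scores, 4))
</code>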
  
\\
$$
\sum_{i=1}^{N} \mathcal{L}(\hat{x_i},\hat{y_i}) + \|\theta\|_2^2
$$
If we are to minimize the loss that corresponds to condition (2), it would be:
$$
\mathcal{L}(\hat{x_i},\hat{y_i}) = \max_{y,y'} \left\lbrace S(\hat{x_i}, y') + 1 - S(\hat{x_i}, y) \right\rbrace \quad\quad\quad (2o)
$$
Unfortunately, it is not clear whether this loss (and the maximizing pair $y$ and $y'$) can be computed efficiently. Instead, we can use the standard structural learning loss function, which corresponds to condition (1):
$$
\mathcal{L}(\hat{x_i},\hat{y_i}) = \max_{\tilde{y}} \left\lbrace S(\hat{x_i}, \tilde{y}) + \text{err}(\tilde{y}) \right\rbrace - S(\hat{x_i}, \hat{y_i}) \quad\quad (1o)
$$
where $\tilde{y}$ is the maximizer above, i.e., the output of the decoding algorithm.
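For intuition about what (1o) asks for, here is a small, self-contained sketch that evaluates it for a single toy sentence. It is illustrative only, not the training code: it assumes an arc-factored score, uses the Hamming distance to the gold heads as $\text{err}$, and replaces the decoder by brute-force enumeration over all trees, which is feasible only for very short sentences.

<code python>
from itertools import product
import random


def is_tree(heads):
    # Every word must reach the artificial root (index 0) without a cycle.
    for m in range(1, len(heads)):
        seen, h = set(), m
        while h != 0:
            if h in seen:
                return False
            seen.add(h)
            h = heads[h]
    return True


def hinge_loss_1o(scores, gold_heads):
    """Loss (1o): max_y { S(x, y) + err(y) } - S(x, gold), by brute force."""
    n = len(gold_heads)

    def tree_score(heads):
        return sum(scores[heads[m]][m] for m in range(1, n))

    best = float("-inf")
    # Enumerate every head assignment for words 1..n-1 (word 0 is the root).
    for cand in product(range(n), repeat=n - 1):
        heads = (0,) + cand
        if not is_tree(heads):
            continue
        err = sum(heads[m] != gold_heads[m] for m in range(1, n))
        best = max(best, tree_score(heads) + err)
    return best - tree_score(gold_heads)


# Toy example: 3 words plus the artificial root (index 0).
random.seed(0)
scores = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
gold = (0, 0, 1, 1)  # gold[m] = head of word m; gold[0] is unused
print(hinge_loss_1o(scores, gold))
</code>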
The connection between the two loss functions is the following: condition (1) is a necessary condition for condition (2), and (1o) is a lower bound on the sum of the losses (2o) over each pair of consecutive trees $(y, y')$ on any "climbing" path from $\tilde{y}$ to $\hat{y_i}$. Thus, by minimizing (1o) we push this lower bound of (2o) toward zero. If the lower bound is **//tight//** (because $\text{err}(\tilde{y})$ is small and the path from $\tilde{y}$ to $\hat{y_i}$ is short), then minimizing (1o) effectively minimizes (2o) as well.
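To spell out the lower-bound claim, here is a sketch of the argument under two assumptions that are not stated explicitly above: each climbing move changes a single head, so the length $K$ of the path $\tilde{y} = y_0, y_1, \ldots, y_K = \hat{y_i}$ satisfies $K \ge \text{err}(\tilde{y})$, and each pair in (2o) is read as $y' = y_{k-1}$ and $y = y_k$. Telescoping the sum gives
$$
\sum_{k=1}^{K} \left\lbrace S(\hat{x_i}, y_{k-1}) + 1 - S(\hat{x_i}, y_k) \right\rbrace = S(\hat{x_i}, \tilde{y}) + K - S(\hat{x_i}, \hat{y_i}) \ge S(\hat{x_i}, \tilde{y}) + \text{err}(\tilde{y}) - S(\hat{x_i}, \hat{y_i})
$$
and the right-hand side equals (1o) when $\tilde{y}$ attains the maximum there; the slack is $K - \text{err}(\tilde{y})$, which vanishes on a shortest path.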