===== Justification of the Algorithm =====
Unfortunately, it is not clear whether there is an efficient way to compute this loss function by finding $y$ and $y'$. Instead, we could use the standard structural learning loss function, which corresponds to condition (1):
$$
\mathcal{L}(\hat{x_i},\hat{y_i}) = \max_{\tilde{y}}\left\lbrace S(\hat{x_i}, \tilde{y}) + \text{err}(\tilde{y})\right\rbrace - S(\hat{x_i}, \hat{y_i}) \quad\quad (1o)
$$
where $\tilde{y}$ is the output of the decoding algorithm.
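
To make (1o) concrete, here is a minimal Python sketch in which the output space is small enough to enumerate, so the max can be computed exactly. The helper names (`loss_1o`, `toy_score`, the Hamming-style `err`) are illustrative assumptions, not part of the draft; in the actual parser the max would come from running the (cost-augmented) decoding algorithm rather than from enumeration.
<code python>
from itertools import product

def err(y_tilde, y_gold):
    # Hamming-style error: number of positions where the candidate
    # output disagrees with the gold output.
    return sum(a != b for a, b in zip(y_tilde, y_gold))

def loss_1o(x, y_gold, score, candidates):
    # Loss (1o): max over candidate outputs of the cost-augmented
    # score S(x, y) + err(y), minus the score of the gold output.
    augmented_max = max(score(x, y) + err(y, y_gold) for y in candidates)
    return augmented_max - score(x, y_gold)

# Toy usage: outputs are binary vectors of length 3 and the score is a
# simple linear function, standing in for the structured score S.
def toy_score(x, y):
    return sum(w * v for w, v in zip(x, y))

x = [0.5, -1.0, 2.0]
y_gold = (1, 0, 1)
candidates = list(product([0, 1], repeat=3))
# Non-negative by construction, since y_gold is a candidate with err = 0.
print(loss_1o(x, y_gold, toy_score, candidates))
</code>
Since the gold output itself is among the candidates with zero error, the loss is always non-negative, and it is zero exactly when the gold output wins the cost-augmented max.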
The two loss functions are connected as follows: condition (1) is a necessary condition for condition (2), and (1o) is a lower bound on the sum of the losses (2o) over each pair of consecutive outputs $(y, y')$ on any "climbing" path from $\tilde{y}$ to $\hat{y}$. Thus, by minimizing (1o) we push this lower bound on (2o) toward zero. If the lower bound is **//tight//** (because $\text{err}(\tilde{y})$ is small and the path from $\tilde{y}$ to $\hat{y}$ is short), then minimizing (1o) also drives (2o) close to zero.