# Iterative Reweighted Least Squares

I was shown this technique by Anat Levin, who pointed me to the paper "User assisted separation of reflections from a single image using a sparsity prior" by Anat Levin and Yair Weiss as an example of how the method is used. I subsequently "reverse-engineered" the method a bit to get an idea of what it does.

IRLS minimizes the cost function:

%% C(x) = \sum_i f_i((Ax)_i - b_i) = \sum f(Ax-b)

where my notation is:

* lowercase variables are vectors by default
* `%f(u) = \left[ f_1(u_1), f_2(u_2), \cdots \right]^T%`
* the functions `%\{f_i\}%` are scalar potential functions that are smallest at 0
* an unindexed sum is performed over the elements of its vector argument

The intuitive explanation of the method is that, at each iteration, each potential function is approximated by a parabola through the origin that has the correct slope at the current value of `%x%`.

## Derivation

Taking the derivative of the cost function:

%% C'(x) = A^T f'(Ax-b)

where the derivative of `%f%` is taken element-wise (`%(f'(u))_i = f'_i(u_i)%`). If all the potentials were parabolas, all the `%\{f'_i\}%` would be linear and we could solve for `%x%` by least squares. Taking into account that `%f'_i(0)=0%`, we can approximate the functions `%\{f'_i\}%` as linear based on the current value of `%x%` (denoted `%x_0%`), as follows:

%% C'(x) \approx A^T \left[ (Ax-b) \ast \frac{f'(Ax_0-b)}{Ax_0-b}\right]

where `%\ast%` denotes element-wise multiplication and the fraction bar denotes element-wise division. Collecting the weights into a diagonal matrix `%D%`, we want the derivative of the cost function to vanish:

%% C'(x) \approx A^T D (Ax-b) = 0
%% \Rightarrow \quad A^T D A x = A^T D b, \quad
%% D = \text{diag}\left(\frac{f'(Ax_0-b)}{Ax_0-b}\right)

This is the least-squares problem we solve at each iteration to get the new value `%x%` from the old value `%x_0%`.

## Example: Reconstruction with a sparse wavelet prior

Suppose we wish to minimize the following cost function for `%x%`:

%% C(x) = \frac{1}{2}\sum(y-Tx)^2 + \lambda \sum \lvert Wx \rvert^\alpha

Here we want to recover an image `%x%` that has been transformed by a known transformation `%T%` to yield a noisy observation `%y%`; `%\lambda%` is some positive scalar weight, `%W%` takes `%x%` into a wavelet basis, and the absolute value and exponentiation are performed element-wise. We define `%A%`, `%b%`, and `%f%` for this problem:

%% A = \left[ \begin{array}{c} T \\ W \end{array} \right], \quad
%% b = \left[ \begin{array}{c} y \\ 0 \end{array} \right]

%% f(Ax-b) = f\left(\left[\begin{array}{c} Tx-y \\ Wx \end{array} \right] \right) =
%% \left[ \begin{array}{c} (Tx-y)^2/2 \\ \lambda\lvert Wx \rvert^\alpha \end{array} \right]

and calculate the weights:

%% \frac{f'(Ax_0-b)}{Ax_0-b} =
%% \frac{\left[ \begin{array}{c} Tx_0-y \\ \lambda\alpha\,\text{sign}(Wx_0)\lvert Wx_0 \rvert^{\alpha-1} \end{array} \right]}{\left[\begin{array}{c} Tx_0-y \\ Wx_0 \end{array} \right]} =
%% \left[ \begin{array}{c} \mathbf{1} \\ \lambda\alpha\lvert Wx_0 \rvert^{\alpha-2} \end{array} \right]

where `%\mathbf{1}%` is a vector consisting of the appropriate number of ones. Writing out the diagonal matrix for the weights:

%% D = \left[\begin{array}{cc} I & 0 \\ 0 & \Psi \end{array}\right], \quad
%% \Psi = \text{diag}\left(\lambda\alpha\lvert Wx_0\rvert^{\alpha-2}\right)

Remember that at each iteration we solve:

%% A^T D A x = A^T D b

where `%D%` incorporates information from the previous iteration. In this example, the equation becomes:

%% (T^T T + W^T \Psi W)x = T^T y
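To make the iteration concrete, here is a minimal NumPy sketch of this solve. It assumes `%T%` and `%W%` are given as dense arrays; the function name `sparse_prior_irls`, the initial guess, and the default parameters are my own choices for illustration, not anything from the paper:

```python
import numpy as np

def sparse_prior_irls(T, W, y, lam=0.1, alpha=0.8, n_iter=10, eps=1e-4):
    """Minimize 0.5*sum((y - T@x)**2) + lam*sum(abs(W@x)**alpha) by IRLS.

    Each iteration solves (T^T T + W^T Psi W) x = T^T y with
    Psi = diag(lam * alpha * |W x0|^(alpha - 2)), x0 the previous iterate.
    """
    Tty = T.T @ y
    x = np.linalg.lstsq(T, y, rcond=None)[0]    # data-only initial guess
    for _ in range(n_iter):
        u = np.maximum(np.abs(W @ x), eps)      # floor before negative powers
        psi = lam * alpha * u ** (alpha - 2.0)  # diagonal of Psi
        lhs = T.T @ T + W.T @ (psi[:, None] * W)
        x = np.linalg.solve(lhs, Tty)
    return x
```

And a toy run on a 1D deblurring problem where, purely to keep the sketch short, a first-difference operator stands in for the wavelet transform `%W%` (a piecewise-constant signal is sparse under it):

```python
rng = np.random.default_rng(0)
n = 64
T = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # 3-tap moving-average blur
T /= T.sum(axis=1, keepdims=True)
W = np.eye(n) - np.eye(n, k=1)                    # first differences
x_true = np.zeros(n)
x_true[20:40] = 1.0                               # piecewise-constant signal
y = T @ x_true + 0.01 * rng.standard_normal(n)
x_hat = sparse_prior_irls(T, W, y, lam=0.05, alpha=0.8)
# rough check: x_hat should be close to x_true
```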
This works well when implemented. Some tips:

* Don't raise small or zero numbers to negative powers; set a lower bound (e.g. `1e-4`) on `%\lvert Wx_0 \rvert%` before exponentiating, as in the sketch above
* If your prior is nonlinear, pay attention to the units of `%x%`
* Convergence happens after a small number of iterations, typically fewer than 10
* For image reconstruction with a convex prior (e.g. L1), the result is quite close to the exact global optimum and looks at least as good visually
* Use sparse matrices when possible, of course

-----

[Back](index.html) [View texdown source](itweightls.text)