# Iterative Reweighted Least Squares

I was shown this technique by Anat Levin, who pointed me to the paper "User assisted separation of reflections from a single image using a sparsity prior" by Anat Levin and Yair Weiss as an example of how the method is used. I subsequently "reverse-engineered" the method a bit to get an idea of what it does.

IRLS minimizes the cost function:

%% C(x) = \sum_i f_i((Ax)_i - b_i) = \sum f(Ax-b)

where my notation is:

* lowercase variables are vectors by default
* `%f(u) = \left[ f_1(u_1), f_2(u_2), \cdots \right]^T%`
* the functions `%\{f_i\}%` are scalar potential functions that are smallest at 0
* an unindexed sum is performed over the elements of its vector argument

The intuitive explanation of the method is that, at each iteration, each potential function is approximated by a parabola through the origin that has the correct slope at the current value of `%x%`.

## Derivation

Taking the derivative of the cost function:

%% C'(x) = A^T f'(Ax-b)

where the derivative of `%f%` is taken element-wise (`%(f'(u))_i = f'_i(u_i)%`). If all the potentials were parabolas, all the `%\{f'_i\}%` would be linear and we could solve for `%x%` by least squares. Taking into account that `%f'_i(0)=0%`, we can approximate the functions `%\{f'_i\}%` as linear based on the current value of `%x%` (denoted `%x_0%`), as follows:

%% C'(x) \approx A^T \left[ (Ax-b) \ast \frac{f'(Ax_0-b)}{Ax_0-b}\right]

where `%\ast%` denotes element-wise multiplication and the fraction bar denotes element-wise division. Collecting the weights into a diagonal matrix `%D%`, we want the derivative of the cost function to vanish:

%% C'(x) \approx A^T D (Ax-b) = 0
%% \Rightarrow \quad A^T D A x = A^T D b, \quad
%% D = \text{diag}\left(\frac{f'(Ax_0-b)}{Ax_0-b}\right)

This is the least-squares problem we solve at each iteration to get the new value `%x%` from the old value `%x_0%`.

## Example: Reconstruction with a sparse wavelet prior

Suppose we wish to minimize the following cost function for `%x%`:

%% C(x) = \frac{1}{2}\sum(y-Tx)^2 + \lambda \sum \lvert Wx \rvert^\alpha

Here we want to recover an image `%x%` that has been transformed by a known transformation `%T%` to yield a noisy observation `%y%`; `%\lambda%` is some positive scalar weight, `%W%` takes `%x%` into a wavelet basis, and the absolute value and exponentiation are performed element-wise. We define `%A%`, `%b%`, and `%f%` for this problem:

%% A = \left[ \begin{array}{c} T \\ W \end{array} \right], \quad
%% b = \left[ \begin{array}{c} y \\ 0 \end{array} \right]

%% f(Ax-b) = f\left(\left[\begin{array}{c} Tx-y \\ Wx \end{array} \right] \right) =
%% \left[ \begin{array}{c} (Tx-y)^2/2 \\ \lambda\lvert Wx \rvert^\alpha \end{array} \right]

and calculate the weights:

%% \frac{f'(Ax_0-b)}{Ax_0-b} =
%% \frac{\left[ \begin{array}{c} Tx_0-y \\ \lambda\alpha\,\text{sign}(Wx_0)\lvert Wx_0 \rvert^{\alpha-1} \end{array} \right]}{\left[\begin{array}{c} Tx_0-y \\ Wx_0 \end{array} \right]} =
%% \left[ \begin{array}{c} \mathbf{1} \\ \lambda\alpha\lvert Wx_0 \rvert^{\alpha-2} \end{array} \right]

where `%\mathbf{1}%` is a vector consisting of the appropriate number of ones. Writing out the diagonal matrix for the weights:

%% D = \left[\begin{array}{cc} I & 0 \\ 0 & \Psi \end{array}\right], \quad
%% \Psi = \text{diag}\left(\lambda\alpha\lvert Wx_0\rvert^{\alpha-2}\right)

Remember that at each iteration we solve:

%% A^T D A x = A^T D b

where `%D%` incorporates information from the previous iteration. In this example, the equation becomes:

%% (T^T T + W^T \Psi W)x = T^T y
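To make the iteration concrete, here is a minimal NumPy sketch of this solve. It assumes `%T%` and `%W%` are given as dense arrays; the function name `sparse_prior_irls`, the initial guess, and the default parameters are my own choices for illustration, not anything from the paper:

```python
import numpy as np

def sparse_prior_irls(T, W, y, lam=0.1, alpha=0.8, n_iter=10, eps=1e-4):
    """Minimize 0.5*sum((y - T@x)**2) + lam*sum(abs(W@x)**alpha) by IRLS.

    Each iteration solves (T^T T + W^T Psi W) x = T^T y with
    Psi = diag(lam * alpha * |W x0|^(alpha - 2)), x0 the previous iterate.
    """
    Tty = T.T @ y
    x = np.linalg.lstsq(T, y, rcond=None)[0]    # data-only initial guess
    for _ in range(n_iter):
        u = np.maximum(np.abs(W @ x), eps)      # floor before negative powers
        psi = lam * alpha * u ** (alpha - 2.0)  # diagonal of Psi
        lhs = T.T @ T + W.T @ (psi[:, None] * W)
        x = np.linalg.solve(lhs, Tty)
    return x
```

And a toy run on a 1D deblurring problem where, purely to keep the sketch short, a first-difference operator stands in for the wavelet transform `%W%` (a piecewise-constant signal is sparse under it):

```python
rng = np.random.default_rng(0)
n = 64
T = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # 3-tap moving-average blur
T /= T.sum(axis=1, keepdims=True)
W = np.eye(n) - np.eye(n, k=1)                    # first differences
x_true = np.zeros(n)
x_true[20:40] = 1.0                               # piecewise-constant signal
y = T @ x_true + 0.01 * rng.standard_normal(n)
x_hat = sparse_prior_irls(T, W, y, lam=0.05, alpha=0.8)
# rough check: x_hat should be close to x_true
```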
This works well when implemented. Some tips:

* Don't raise small or zero numbers to negative powers; set a lower bound (e.g. `1e-4`) on `%\lvert Wx_0 \rvert%` before exponentiating, as in the sketch above
* If your prior is nonlinear, pay attention to the units of `%x%`
* Convergence happens after a small number of iterations, typically fewer than 10
* For image reconstruction with a convex prior (e.g. L1), the result is quite close to the exact global optimum and looks at least as good visually
* Use sparse matrices when possible, of course

-----

[Back](index.html) [View texdown source](itweightls.text)