Review of least-squares polynomial fitting
------------------------------------------

We consider a discrete 1-D "patch" of image intensity, I(x), of size 2w+1,
where x is an integer in [-w,w].

From calculus we know that I(x) can be decomposed into the following Taylor
series expansion about x=0:

    I(x) = I(0) + x*I'(0) + x^2/2*I''(0) + ...

We'll take the first (n+1) terms of the Taylor series to get an n-th degree
approximation of the full series:

    I(x) = I(0) + x*I'(0) + x^2/2*I''(0) + ... + x^n/n!*I^(n)(0)

In other words, we're approximating the function I(x) using an n-th degree
polynomial. The polynomial coefficients are determined by the value and
derivatives of the intensity at the origin, I(0), I'(0), ..., I^(n)(0), but
we don't actually know these in advance. Fitting them to the data will give
us the polynomial we're looking for.

[ Note: why not just calculate these derivatives directly from I(x)?
  Because the image is given at discrete points only, we can't say much
  about the derivatives without assuming a particular underlying model for
  smoothness ... but that very model (we assume a polynomial) is what we're
  trying to fit! ]

[ Note: we are re-estimating I(0), even though its value is already given!
  The curve we fit doesn't necessarily pass through this data point --
  rather, we find the best-fit polynomial that minimizes inconsistency with
  *all* of the data jointly. ]

For a given value of x, we can therefore write:

    I(x) = [ 1  x  1/2*x^2  ...  1/n!*x^n ] * [ I(0)     ]
                                              [ I'(0)    ]
                                              [ ...      ]
                                              [ I^(n)(0) ]

But we have 2w+1 different values of x, each giving one such equation with
the same unknown vector d. So we can stack them all together into a linear
system:

       I      =       X       *      d
    (2w+1)x1    (2w+1)x(n+1)     (n+1)x1
      known         known        unknown

This is just a linear system of the form A*x=b. Solving the equation X*d=I
in the least-squares sense (e.g. using the pseudoinverse or SVD) gives the
vector d minimizing ||X*d - I||^2.


Example with Gaussian weighting added
-------------------------------------

Let's take a concrete example:

    n = 2   (2nd-order approximation, i.e. fitting a quadratic)
    w = 4   (9x1 pixel image patch)

In this case X is

    [ 1  -4  16/2 ]
    [ 1  -3   9/2 ]
    [ 1  -2   4/2 ]
    [ 1  -1   1/2 ]
    [ 1   0   0   ]
    [ 1   1   1/2 ]
    [ 1   2   4/2 ]
    [ 1   3   9/2 ]
    [ 1   4  16/2 ]

and the unknown d is

    [ d_0 ]
    [ d_1 ]
    [ d_2 ]

Suppose that I is

    [ 2  3  4  1.1  0  0.9  4  5  4 ]^T

(draw on coord axes)

Note: near the origin it fits I(x)=x^2 almost perfectly, which would
correspond to d = [0 0 2]^T. But solving X*d=I takes *all* the data into
account, and so gives us something quite different:

    \hat{d} = [ 1.7576 ]
              [ 0.2300 ]
              [ 0.2727 ]

(sketch it, a slight curve closer to a horizontal line)

Now consider a weighted version of this, to favor the center pixels more.
Define the weights according to:

    \Omega(x) = exp(-x^2)    (un-normalized Gaussian, centered at the origin)

For this example, x = -4,...,4 and \Omega(x) is

    [ 0.0000  0.0001  0.0183  0.3679  1.0000  0.3679  0.0183  0.0001  0.0000 ]^T

(The extreme values aren't zero, just very small!)

Then the new linear system we need to solve is:

    W*I = W*X*d

where W is the (2w+1)x(2w+1) diagonal matrix formed by placing all the
weights, [ \Omega(-w)  \Omega(-w+1)  ...  \Omega(w) ], along the diagonal.

The least-squares solution to this (weighted) system gives the vector d
minimizing the norm

    ||W*I - W*X*d||^2 = ||W*(I - X*d)||^2

So the W matrix just has the effect of scaling the regular error, i.e. the
difference between the image I and the best-fit polynomial reconstruction
X*d, differently for each pixel. Pixels with higher weight (for this
example, pixels near the center) will contribute relatively more to the
total error.
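For reference, the minimizer of ||W*(I - X*d)||^2 can also be characterized
by the weighted normal equations (standard weighted least squares, stated
here without derivation):

    (X^T * W^T * W * X) * \hat{d} = X^T * W^T * W * I

(and since W is diagonal, W^T*W is just W^2). The solution returned by the
backslash solve (W*X)\(W*I) in the appendix satisfies exactly this system;
Matlab just uses a more numerically stable method than forming it
explicitly.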
For this example, solving the weighted equation gives:

    \hat{d} = [  0.0000 ]
              [ -0.0990 ]
              [  2.0000 ]

which is very close to I(x)=x^2, just as we'd expect!


More generally ...
------------------

Minimizing a quadratic error functional leads to a linear system of
equations. We sketched out the proof of this earlier using matrices, but
here's a more basic derivation. Let's start from a completely *general*
quadratic function defined for an n-dimensional vector x:

    E(x) = x^T*A*x + b^T*x + c
         = \sum_i \sum_j A_{ij}*x_i*x_j + \sum_i b_i*x_i + c

Note how this function includes all the possible quadratic terms (x_i^2),
cross terms (x_i*x_j, for i \neq j), and linear terms (x_i), as well as a
constant term.

Differentiating with respect to a particular component x_k, we obtain

    dE(x)/dx_k = \sum_i [ A_{ik}*x_i ] + \sum_i [ A_{ki}*x_i ] + b_k

The first term dots the k-th column of A with x, the second term dots the
k-th row of A with x, and the last term selects the k-th element of b.
Verify that this indeed does the right thing for the x_k^2 term!

The derivative is zero at extreme values, and a quadratic function has only
one such value (assuming A + A^T is invertible). Therefore we can find the
extremum by solving:

    dE(x)/dx_k = 0

    [ A_{1k} ... A_{nk} ]*x + [ A_{k1} ... A_{kn} ]*x + b_k = 0

    ( [ A_{1k} ... A_{nk} ] + [ A_{k1} ... A_{kn} ] )*x = -b_k

which is a simple linear equation in x. But the whole argument above
applies to any component x_k, so we get n different linear equations, one
per component of x. This is an (n x n) linear system, which can be written
simply as:

    (A + A^T)*x = -b

When the quadratic function satisfies E(x) >= 0, e.g. when it is an error
function of the form E(x) = ||x - y||^2, the extremum we get from solving
the linear system above will always be a *minimum* (can you see why?).


APPENDIX: Matlab source code
----------------------------

xx = [-4:4]                          % sample locations x = -w..w, with w = 4
X  = [xx.^0 ; xx.^1 ; xx.^2/2]'      % design matrix, rows [1  x  x^2/2]
I  = [2 3 4 1.1 0 0.9 4 5 4]'        % the 9x1 image patch
d1 = X\I                             % unweighted least-squares fit

Omega = exp(-xx.^2)                  % Gaussian weights \Omega(x)
W  = diag(Omega)                     % diagonal weighting matrix
d2 = (W*X)\(W*I)                     % weighted least-squares fit

xx_fine = linspace(-4,4,100);        % dense grid for plotting the fits
X_fine  = [xx_fine.^0 ; xx_fine.^1 ; xx_fine.^2/2]';
Omega_fine = exp(-xx_fine.^2);

figure(1); clf; hold on;
h = plot(xx,I,'.-'); set(h,'MarkerSize',20)
plot(xx_fine,X_fine*d1,'r-')
plot(xx_fine,9*Omega_fine,'k--')     % weighting function, scaled up for display
plot(xx_fine,X_fine*d2,'g-')
axis([-4 4 0 10])
legend('1D image patch','quadratic fit','weighting function, \Omega','weighted quadratic fit')
print -dpng polyfit
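The general result above can also be sanity-checked numerically against the
concrete example: writing the unweighted error E(d) = ||X*d - I||^2 in the
general quadratic form gives A = X^T*X and b = -2*X^T*I (the constant
c = I^T*I doesn't affect the minimizer), so solving (A + A^T)*d = -b should
reproduce the unweighted fit d1. A minimal sketch, meant to be run after
the script above (the names A_q, b_q, d_check are just illustrative):

A_q = X'*X;                          % quadratic term:  d'*(X'*X)*d
b_q = -2*(X'*I);                     % linear term:     b_q'*d = -2*I'*X*d
d_check = (A_q + A_q') \ (-b_q);     % solve (A + A^T)*d = -b
disp(max(abs(d_check - d1)))         % should be ~0 (up to round-off)

Since A_q is symmetric here, (A_q + A_q')\(-b_q) reduces to the familiar
normal equations (X'*X)*d = X'*I.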