DERIVATION 1: PRINCIPAL COMPONENT ANALYSIS (PCA)
------------------------------------------------

(This basically follows the derivation on slides 21-24 of lecture 16.)

Z = [Z_1 Z_2 ... Z_M]   (M x M) contains the mean-subtracted faces, in its cols
B = [B_1 B_2 ... B_M]   (M x M) contains the basis for the face data, in its cols

(For this derivation we take the square case, with as many faces as
pixels, so that B is invertible; Derivation 2 handles the realistic
M >> N case.)

As claimed in lecture, the "optimal" basis B for PCA can be expressed as

    Z = B*Y,

i.e. the mean-subtracted data can be reconstructed as a linear
combination of the new basis vectors, where the rows of Y are
decorrelated, i.e. Y*Y^T is diagonal.

Define A = B^{-1}, so we have Y = A*Z.  Then

    Y*Y^T = (A*Z)*(A*Z)^T
          = A*Z*Z^T*A^T      [transpose rule]
          = A*(Z*Z^T)*A^T    [regroup]

So we want Y*Y^T = A*(Z*Z^T)*A^T to be diagonal.

Examine the eigenvector decomposition of (Z*Z^T):

    (Z*Z^T) * e_i = \lambda_i * e_i                               [def]
    (Z*Z^T) * [e_1 ... e_M]
        = [e_1 ... e_M] * diag(\lambda_1,...,\lambda_M)  [stack the eqns]
    (Z*Z^T) * E = E*L                                [matrix notation]

Left-multiply this by E^T:

    E^T * (Z*Z^T) * E = E^T * E * L
                      = (E^T*E) * L   [regroup]
                      = I * L         [eigenvecs are orthonormal,
                                       E*E^T = E^T*E = I]
                      = L             [eigenvalue matrix L is diagonal]

Observe that setting A = E^T gives us

    E^T*(Z*Z^T)*E = A*(Z*Z^T)*A^T   [A = E^T]
                  = (A*Z)*(A*Z)^T   [transpose rule]
                  = Y*Y^T           [Y = A*Z = E^T*Z]
                  = L               [see above; note L is diagonal]

and so satisfies our requirement that Y*Y^T be diagonal.

Therefore we can compute B, an "optimal" basis for Z, in terms of E,
the eigenvectors of (Z*Z^T).  In particular:

    B = A^{-1}       [def]
      = (E^T)^{-1}   [A = E^T]
      = E            [eigenvecs are orthonormal, E*E^T = E^T*E = I]

(A NumPy sanity check of this derivation appears at the end of these
notes.)


DERIVATION 2: PRACTICAL PCA, USING Z^T*Z INSTEAD OF Z*Z^T
---------------------------------------------------------

Remember that Z is (M x N), with M >> N, e.g.

    M = 75,000 pixels per face
    N = 200 faces in the database

Since Z*Z^T is a huge matrix (e.g. 75,000 x 75,000 for normal images),
in practice we do the eigenvector computation on the much smaller
Z^T*Z (e.g. 200 x 200) instead, and then obtain the top N << M
eigenvectors of Z*Z^T as linear combinations of those eigenvectors.

[ Also see the Matlab script recognition_demo.zip, which implements
  this step... but there aren't many comments in the code. ]

Definition of eigenvectors for the *much* smaller Z^T*Z (N x N):

    (Z^T*Z) * e_i = \lambda_i * e_i                               [def]
    (Z^T*Z) * [e_1 ... e_N]
        = [e_1 ... e_N] * diag(\lambda_1,...,\lambda_N)  [stack the eqns]
    (Z^T*Z) * E = E*L                                [matrix notation]

Note: E and L are also (N x N).

Now a little algebra...

    Z*(Z^T*Z)*E = Z*E*L           [left-multiply by Z]
    (Z*Z^T) * (Z*E) = (Z*E) * L   [regroup]

Define e'_i = Z*e_i and E' = Z*E (M x N), and observe how this form
gives us N eigenvector equations for the *huge* matrix Z*Z^T (M x M):

    (Z*Z^T) * E' = E' * L                                  [E' = Z*E]
    (Z*Z^T) * [e'_1 ... e'_N]
        = [e'_1 ... e'_N] * diag(\lambda_1,...,\lambda_N)
                                                   [matrix notation]
    (Z*Z^T) * e'_i = \lambda_i * e'_i                  [unstack eqns]

So to obtain N eigenvectors, E', for Z*Z^T, simply find the
eigenvectors, E, of the *much* smaller Z^T*Z (all N of them), and then
transform them according to E' = Z*E.  Note that each e'_i = Z*e_i has
length sqrt(\lambda_i), since ||Z*e_i||^2 = e_i^T*(Z^T*Z)*e_i =
\lambda_i, so normalize the e'_i if an orthonormal basis is needed.
(See the second NumPy sketch at the end of these notes.)

This smaller set of N (<< M) eigenvectors actually covers all the
"interesting"/non-degenerate eigenvectors available.  Why?  We know
that rank(Z) <= min(M,N) = N, and it's also the case that
rank(Z*Z^T) = rank(Z^T*Z) = rank(Z).  So the eigenvector decomposition
of Z*Z^T has at most N non-zero \lambda_i's, i.e. we've covered them
all.
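

NUMERICAL SANITY CHECKS (NumPy)
-------------------------------

The following sketches are illustrative only: they verify the algebra
above on small random matrices standing in for real face data.  The
sizes, seeds, and variable names are arbitrary choices, not part of the
lecture material.

First, a minimal check of Derivation 1: with A = E^T built from the
eigenvectors of Z*Z^T, the rows of Y = A*Z are decorrelated (Y*Y^T is
diagonal, with the eigenvalues on the diagonal), and B = E reconstructs
Z exactly.

  import numpy as np

  rng = np.random.default_rng(0)
  M = 6                                    # tiny stand-in for #pixels
  Z = rng.standard_normal((M, M))          # square case, as in Derivation 1
  Z = Z - Z.mean(axis=1, keepdims=True)    # subtract the mean "face"

  # Eigen-decomposition of the symmetric matrix Z*Z^T.
  lam, E = np.linalg.eigh(Z @ Z.T)         # columns of E are the e_i

  # With A = E^T, Y = A*Z should satisfy Y*Y^T = L (diagonal).
  Y = E.T @ Z
  YYt = Y @ Y.T
  print(np.allclose(YYt, np.diag(lam)))    # True: Y*Y^T = L
  print(np.allclose(Z, E @ Y))             # True: Z = B*Y with B = E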
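
Second, a sketch of the Derivation 2 trick, with M and N kept small so
the result can be checked directly against Z*Z^T (which we could not
afford to form at M = 75,000):

  import numpy as np

  rng = np.random.default_rng(1)
  M, N = 500, 20                           # stand-ins for 75,000 and 200
  Z = rng.standard_normal((M, N))
  Z = Z - Z.mean(axis=1, keepdims=True)    # mean-subtracted data

  # Step 1: eigenvectors of the *much* smaller (N x N) matrix Z^T*Z.
  lam, E = np.linalg.eigh(Z.T @ Z)

  # Step 2: map them to eigenvectors of Z*Z^T via E' = Z*E.
  E_prime = Z @ E                          # (M x N)

  # Each column satisfies (Z*Z^T)*e'_i = \lambda_i*e'_i:
  ZZt = Z @ Z.T
  print(np.allclose(ZZt @ E_prime, E_prime * lam))   # True

  # Columns of E' have length sqrt(\lambda_i); normalize the ones with
  # non-negligible eigenvalues to get an orthonormal basis.  (Mean
  # subtraction makes one eigenvalue essentially zero, so drop it.)
  keep = lam > 1e-8 * lam.max()
  E_orth = E_prime[:, keep] / np.sqrt(lam[keep])
  print(np.allclose(E_orth.T @ E_orth, np.eye(keep.sum())))   # True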
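
Finally, the rank claim at the end of Derivation 2, checked numerically
(again with arbitrary small sizes):

  import numpy as np

  rng = np.random.default_rng(2)
  M, N = 300, 10
  Z = rng.standard_normal((M, N))

  # rank(Z*Z^T) = rank(Z^T*Z) = rank(Z) <= min(M,N) = N, so the huge
  # (M x M) matrix Z*Z^T has at most N non-zero eigenvalues.
  print(np.linalg.matrix_rank(Z))          # 10
  print(np.linalg.matrix_rank(Z.T @ Z))    # 10
  print(np.linalg.matrix_rank(Z @ Z.T))    # 10, despite being 300 x 300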