DERIVATION 1: PRINCIPAL COMPONENT ANALYSIS (PCA)
------------------------------------------------

(This basically follows the derivation on slides 21-24 of lecture 16.)

Z = [Z_1 Z_2 ... Z_M]   (M x M) contains the mean-subtracted faces, in its cols
B = [B_1 B_2 ... B_M]   (M x M) contains the basis for the face data, in its cols

(For this derivation we take the square case, with as many faces as
pixels, so that B is invertible; Derivation 2 handles the realistic
M >> N case.)

As claimed in lecture, the "optimal" basis B for PCA can be expressed as

    Z = B*Y,

i.e. the mean-subtracted data can be reconstructed as a linear
combination of the new basis vectors, where the rows of Y are
decorrelated, i.e. Y*Y^T is diagonal.

Define A = B^{-1}, so we have Y = A*Z.  Then

    Y*Y^T = (A*Z)*(A*Z)^T
          = A*Z*Z^T*A^T      [transpose rule]
          = A*(Z*Z^T)*A^T    [regroup]

So we want Y*Y^T = A*(Z*Z^T)*A^T to be diagonal.

Examine the eigenvector decomposition of (Z*Z^T):

    (Z*Z^T) * e_i = \lambda_i * e_i                               [def]
    (Z*Z^T) * [e_1 ... e_M]
        = [e_1 ... e_M] * diag(\lambda_1,...,\lambda_M)  [stack the eqns]
    (Z*Z^T) * E = E*L                                [matrix notation]

Left-multiply this by E^T:

    E^T * (Z*Z^T) * E = E^T * E * L
                      = (E^T*E) * L   [regroup]
                      = I * L         [eigenvecs are orthonormal,
                                       E*E^T = E^T*E = I]
                      = L             [eigenvalue matrix L is diagonal]

Observe that setting A = E^T gives us

    E^T*(Z*Z^T)*E = A*(Z*Z^T)*A^T   [A = E^T]
                  = (A*Z)*(A*Z)^T   [transpose rule]
                  = Y*Y^T           [Y = A*Z = E^T*Z]
                  = L               [see above; note L is diagonal]

and so satisfies our requirement that Y*Y^T be diagonal.

Therefore we can compute B, an "optimal" basis for Z, in terms of E,
the eigenvectors of (Z*Z^T).  In particular:

    B = A^{-1}       [def]
      = (E^T)^{-1}   [A = E^T]
      = E            [eigenvecs are orthonormal, E*E^T = E^T*E = I]

(A NumPy sanity check of this derivation appears at the end of these
notes.)


DERIVATION 2: PRACTICAL PCA, USING Z^T*Z INSTEAD OF Z*Z^T
---------------------------------------------------------

Remember that Z is (M x N), with M >> N, e.g.

    M = 75,000 pixels per face
    N = 200 faces in the database

Since Z*Z^T is a huge matrix (e.g. 75,000 x 75,000 for normal images),
in practice we do the eigenvector computation on the much smaller
Z^T*Z (e.g. 200 x 200) instead, and then obtain the top N << M
eigenvectors of Z*Z^T as linear combinations of those eigenvectors.

[ Also see the Matlab script recognition_demo.zip, which implements
  this step... but there aren't many comments in the code. ]

Definition of eigenvectors for the *much* smaller Z^T*Z (N x N):

    (Z^T*Z) * e_i = \lambda_i * e_i                               [def]
    (Z^T*Z) * [e_1 ... e_N]
        = [e_1 ... e_N] * diag(\lambda_1,...,\lambda_N)  [stack the eqns]
    (Z^T*Z) * E = E*L                                [matrix notation]

Note: E and L are also (N x N).

Now a little algebra...

    Z*(Z^T*Z)*E = Z*E*L           [left-multiply by Z]
    (Z*Z^T) * (Z*E) = (Z*E) * L   [regroup]

Define e'_i = Z*e_i and E' = Z*E (M x N), and observe how this form
gives us N eigenvector equations for the *huge* matrix Z*Z^T (M x M):

    (Z*Z^T) * E' = E' * L                                  [E' = Z*E]
    (Z*Z^T) * [e'_1 ... e'_N]
        = [e'_1 ... e'_N] * diag(\lambda_1,...,\lambda_N)
                                                   [matrix notation]
    (Z*Z^T) * e'_i = \lambda_i * e'_i                  [unstack eqns]

So to obtain N eigenvectors, E', for Z*Z^T, simply find the
eigenvectors, E, of the *much* smaller Z^T*Z (all N of them), and then
transform them according to E' = Z*E.  Note that each e'_i = Z*e_i has
length sqrt(\lambda_i), since ||Z*e_i||^2 = e_i^T*(Z^T*Z)*e_i =
\lambda_i, so normalize the e'_i if an orthonormal basis is needed.
(See the second NumPy sketch at the end of these notes.)

This smaller set of N (<< M) eigenvectors actually covers all the
"interesting"/non-degenerate eigenvectors available.  Why?  We know
that rank(Z) <= min(M,N) = N, and it's also the case that
rank(Z*Z^T) = rank(Z^T*Z) = rank(Z).  So the eigenvector decomposition
of Z*Z^T has at most N non-zero \lambda_i's, i.e. we've covered them
all.
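

NUMERICAL SANITY CHECKS (NumPy)
-------------------------------

The following sketches are illustrative only: they verify the algebra
above on small random matrices standing in for real face data.  The
sizes, seeds, and variable names are arbitrary choices, not part of the
lecture material.

First, a minimal check of Derivation 1: with A = E^T built from the
eigenvectors of Z*Z^T, the rows of Y = A*Z are decorrelated (Y*Y^T is
diagonal, with the eigenvalues on the diagonal), and B = E reconstructs
Z exactly.

  import numpy as np

  rng = np.random.default_rng(0)
  M = 6                                    # tiny stand-in for #pixels
  Z = rng.standard_normal((M, M))          # square case, as in Derivation 1
  Z = Z - Z.mean(axis=1, keepdims=True)    # subtract the mean "face"

  # Eigen-decomposition of the symmetric matrix Z*Z^T.
  lam, E = np.linalg.eigh(Z @ Z.T)         # columns of E are the e_i

  # With A = E^T, Y = A*Z should satisfy Y*Y^T = L (diagonal).
  Y = E.T @ Z
  YYt = Y @ Y.T
  print(np.allclose(YYt, np.diag(lam)))    # True: Y*Y^T = L
  print(np.allclose(Z, E @ Y))             # True: Z = B*Y with B = E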
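
Second, a sketch of the Derivation 2 trick, with M and N kept small so
the result can be checked directly against Z*Z^T (which we could not
afford to form at M = 75,000):

  import numpy as np

  rng = np.random.default_rng(1)
  M, N = 500, 20                           # stand-ins for 75,000 and 200
  Z = rng.standard_normal((M, N))
  Z = Z - Z.mean(axis=1, keepdims=True)    # mean-subtracted data

  # Step 1: eigenvectors of the *much* smaller (N x N) matrix Z^T*Z.
  lam, E = np.linalg.eigh(Z.T @ Z)

  # Step 2: map them to eigenvectors of Z*Z^T via E' = Z*E.
  E_prime = Z @ E                          # (M x N)

  # Each column satisfies (Z*Z^T)*e'_i = \lambda_i*e'_i:
  ZZt = Z @ Z.T
  print(np.allclose(ZZt @ E_prime, E_prime * lam))   # True

  # Columns of E' have length sqrt(\lambda_i); normalize the ones with
  # non-negligible eigenvalues to get an orthonormal basis.  (Mean
  # subtraction makes one eigenvalue essentially zero, so drop it.)
  keep = lam > 1e-8 * lam.max()
  E_orth = E_prime[:, keep] / np.sqrt(lam[keep])
  print(np.allclose(E_orth.T @ E_orth, np.eye(keep.sum())))   # True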
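
Finally, the rank claim at the end of Derivation 2, checked numerically
(again with arbitrary small sizes):

  import numpy as np

  rng = np.random.default_rng(2)
  M, N = 300, 10
  Z = rng.standard_normal((M, N))

  # rank(Z*Z^T) = rank(Z^T*Z) = rank(Z) <= min(M,N) = N, so the huge
  # (M x M) matrix Z*Z^T has at most N non-zero eigenvalues.
  print(np.linalg.matrix_rank(Z))          # 10
  print(np.linalg.matrix_rank(Z.T @ Z))    # 10
  print(np.linalg.matrix_rank(Z @ Z.T))    # 10, despite being 300 x 300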