An eigenvector of a square matrix A is a non-zero vector v such that
Av=λv
for some scalar λ, called the eigenvalue corresponding to v. The eigendecomposition of A is then given by:
$$A = V \operatorname{diag}(\lambda) V^{-1}$$
where V is the matrix whose columns are the eigenvectors of A, and diag(λ) is the diagonal matrix of eigenvalues.
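As a quick sanity check, here is a minimal NumPy sketch (illustrative; the matrix and variable names are mine, not from the notes) that eigendecomposes a small diagonalizable matrix and reconstructs it from $V$, $\operatorname{diag}(\lambda)$, and $V^{-1}$:

```python
import numpy as np

# A small diagonalizable (but not symmetric) matrix, chosen for illustration
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix V whose columns are eigenvectors
eigvals, V = np.linalg.eig(A)

# Reconstruct A = V diag(λ) V^{-1}
A_rebuilt = V @ np.diag(eigvals) @ np.linalg.inv(V)
assert np.allclose(A_rebuilt, A)
```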
Not all matrices have an eigendecomposition. An n×n matrix is diagonalizable if and only if it has n linearly independent eigenvectors. For real symmetric matrices, however, the eigendecomposition always exists, and it can be written using only real-valued eigenvectors and eigenvalues:
$$A = Q \Lambda Q^T$$
where Q is an orthogonal matrix whose columns are the eigenvectors of A, and Λ is a diagonal matrix of eigenvalues.
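For the symmetric case, a small sketch (again just an assumed toy example) using NumPy's symmetric eigensolver shows that $Q$ is orthogonal and $A = Q \Lambda Q^T$:

```python
import numpy as np

# A real symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric/Hermitian matrices: it returns real
# eigenvalues (in ascending order) and orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

# Check the reconstruction A = Q Λ Q^T and the orthogonality Q^T Q = I
assert np.allclose(Q @ Lam @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(2))
```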
Solving for Eigenvectors and Eigenvalues
Given a square matrix A, we can solve for its eigenvectors and eigenvalues by rearranging the eigenvalue equation for non-zero v:
Av=λv⟹Av−λv=0⟹(A−λI)v=0
Since we are looking for non-zero v, the matrix A−λI must be singular, giving us the characteristic polynomial:
det(A−λI)=0
Solving this equation gives us the eigenvalues λ of A. Substituting these back into the original equation, we can solve for the eigenvectors v.
Example
Consider the matrix
$$A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$$
The characteristic equation is
$$\det(A - \lambda I) = \det\begin{bmatrix} 3-\lambda & 1 \\ 0 & 2-\lambda \end{bmatrix} = (3-\lambda)(2-\lambda) = 0$$
because the formula for the determinant of a 2×2 matrix is ad−bc. Solving this gives us the eigenvalues λ=3,2. Substituting these back into the original equation, we can solve for the eigenvectors:
For λ = 3:
$$(A - 3I)v = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \implies v_2 = 0$$
so we can take the eigenvector $(v_1, v_2)^T = (1, 0)^T$. Similarly, for λ = 2, solving $(A - 2I)v = 0$ gives $v_1 = -v_2$, e.g. the eigenvector $(1, -1)^T$.
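The hand computation above can be checked numerically; this is a small sketch using np.linalg.eig (eigenvectors come back normalized to unit length, so they match the ones above only up to scale and sign):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Eigenvalues come back in no guaranteed order; eigenvectors are unit-length columns
eigvals, V = np.linalg.eig(A)
print(eigvals)   # 3 and 2
print(V)         # columns proportional to (1, 0) and (1, -1)

# Each column v satisfies A v = λ v
for lam, v in zip(eigvals, V.T):
    assert np.allclose(A @ v, lam * v)
```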
Matrix Inversion via Eigendecomposition
You can compute the inverse of a matrix by computing its eigendecomposition, inverting the eigenvalues, and then transforming back. Inverting the eigenvalues is equivalent to taking the reciprocal of each eigenvalue, which is easy to do for a diagonal matrix.
If a matrix A can be eigendecomposed and if none of its eigenvalues are zero, then A is invertible and its inverse is given by:
$$A^{-1} = V \operatorname{diag}(\lambda)^{-1} V^{-1}$$
If A is a symmetric matrix, the eigenvector matrix can be chosen to be the orthogonal matrix Q above, so $Q^{-1} = Q^T$ and no explicit matrix inversion is needed for that factor. Furthermore, because $\operatorname{diag}(\lambda)$ is a diagonal matrix, its inverse is obtained simply by taking the reciprocal of each diagonal element:
$$\left[\operatorname{diag}(\lambda)^{-1}\right]_{ii} = \frac{1}{\lambda_i}$$
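A minimal sketch of this inversion route, assuming a symmetric full-rank toy matrix so the $Q^T$ shortcut applies:

```python
import numpy as np

# Symmetric matrix with no zero eigenvalues
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eigh(A)

# Invert the diagonal factor by taking reciprocals of the eigenvalues,
# then transform back: A^{-1} = Q diag(1/λ) Q^T
A_inv = Q @ np.diag(1.0 / eigvals) @ Q.T

assert np.allclose(A_inv, np.linalg.inv(A))
```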
Eigenvalue Facts
The product of the eigenvalues of a matrix is equal to its determinant: $\det(A) = \prod_{i=1}^{N_\lambda} \lambda_i^{n_i}$, where $N_\lambda$ is the number of distinct eigenvalues and $n_i$ is the algebraic multiplicity of the eigenvalue $\lambda_i$.
The sum of the eigenvalues of a matrix is equal to its trace: $\operatorname{tr}(A) = \sum_{i=1}^{N_\lambda} n_i \lambda_i$.
If the eigenvalues of $A$ are $\lambda_i$, then the eigenvalues of $A^{-1}$ are simply $\lambda_i^{-1}$.
Eigenvector Facts
The eigenvectors of A−1 are the same as the eigenvectors of A.
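These eigenvalue and eigenvector facts are easy to verify numerically; a short sketch on an assumed toy matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, V = np.linalg.eig(A)

# Product of eigenvalues = determinant; sum of eigenvalues = trace
assert np.isclose(np.prod(eigvals), np.linalg.det(A))
assert np.isclose(np.sum(eigvals), np.trace(A))

# A^{-1} has eigenvalues 1/λ_i and shares the eigenvectors of A
A_inv = np.linalg.inv(A)
inv_vals = np.linalg.eigvals(A_inv)
assert np.allclose(np.sort(inv_vals), np.sort(1.0 / eigvals))
for lam, v in zip(eigvals, V.T):
    assert np.allclose(A_inv @ v, (1.0 / lam) * v)
```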
Singular Value Decomposition (SVD)
Since not all matrices have an eigendecomposition, we can use the singular value decomposition (SVD) to factor any matrix $A \in \mathbb{R}^{m \times n}$ into a product of three matrices:
$$A = U D V^T$$
where:
$U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix whose columns are the eigenvectors of $AA^T$, called the left singular vectors
$D \in \mathbb{R}^{m \times n}$ is a (rectangular) diagonal matrix whose diagonal elements are the singular values of $A$. The non-zero singular values are the square roots of the non-zero eigenvalues of $A^T A$ (equivalently, of $AA^T$)
$V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix whose columns are the eigenvectors of $A^T A$, called the right singular vectors
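A short NumPy sketch of the decomposition and its relationship to the eigenvalues of $A^T A$ (random toy data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))          # a non-square matrix, m=4, n=3

# full_matrices=True gives U (m×m) and V^T (n×n); s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m×n "diagonal" matrix D from the singular values
D = np.zeros((4, 3))
D[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ D @ Vt, A)

# Non-zero singular values are square roots of the eigenvalues of A^T A
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(np.clip(eig_AtA, 0.0, None)))
```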
The Moore-Penrose Pseudoinverse
Since matrix inversion is not defined for non-square matrices, we can use the SVD to compute the Moore-Penrose pseudoinverse of a matrix A:
$$A^+ = V D^+ U^T$$
where $D^+$ is the pseudoinverse of $D$, obtained by taking the reciprocal of the non-zero singular values and transposing the resulting matrix. Among all matrices $X$, $A^+$ minimizes the Frobenius norm of the residual error $\|AX - I\|_F$.
For a system $Ax = b$, the least-squares solution is given by $x = A^+ b$, i.e. the $x$ that minimizes $\|Ax - b\|_2$.
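Here is a hedged sketch of the same construction in NumPy: build $A^+$ from the (thin) SVD and compare it against np.linalg.pinv and np.linalg.lstsq on an assumed toy overdetermined system:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))      # tall matrix: more equations than unknowns
b = rng.normal(size=5)

# Pseudoinverse via the SVD: A^+ = V D^+ U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD is enough here
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # reciprocal of each non-zero singular value

assert np.allclose(A_pinv, np.linalg.pinv(A))

# Least-squares solution of Ax = b
x = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```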
Principal Component Analysis (PCA)
PCA is a technique for dimensionality reduction that involves decomposing a dataset into its principal components. Given a dataset X∈Rn×d, the goal is to find the orthogonal matrix W∈Rd×d that projects the data onto a lower-dimensional subspace such that the variance of the projected data is maximized.
In other words, PCA learns an orthogonal linear projection that aligns with the directions of maximum variance in the data.
Computing PCA
There are several ways to compute Principal Component Analysis, but the most common method is to use the eigendecomposition of the covariance matrix of the data. Given a dataset X∈Rn×d, the steps are as follows:
Center the data: subtract the mean of each feature from the dataset, xc = x - x.mean(dim=0)
Compute the covariance matrix of the centered data: $C = \frac{1}{n} X_c^T X_c$ (or $\frac{1}{n-1} X_c^T X_c$ for the unbiased estimate)
Compute the eigenvectors and eigenvalues of C: $C = W \Lambda W^T$
The eigenvectors of C are the principal components of the data, and the eigenvalues are the variances of the data along each principal component.
In other words, the principal components can be obtained in two equivalent ways:
Compute the eigendecomposition of the covariance matrix C
Compute the right singular vectors of the centered data matrix $X_c$ (because the covariance matrix is proportional to $X_c^T X_c$, and the right singular vectors of $X_c$ are the eigenvectors of $X_c^T X_c$)
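The two routes can be checked against each other; below is a minimal sketch on synthetic data (NumPy rather than the PyTorch-style snippet above, and the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with different variances along each axis
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.2])

# 1. Center the data
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# 2a. Eigendecomposition of the covariance matrix (eigh: C is symmetric)
C = Xc.T @ Xc / n
eigvals, W = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, W = eigvals[order], W[:, order]

# 2b. Equivalent route: right singular vectors of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(eigvals, s**2 / n)        # variances from singular values
assert np.allclose(np.abs(W), np.abs(Vt.T))  # same directions, up to sign

# 3. Project onto the first k principal components
k = 2
Z = Xc @ W[:, :k]
```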
Least Squares and the Normal Equation
Given a training set, define the design matrix X to be the $n \times d$ matrix (actually $n \times (d+1)$ if we include the bias term) whose rows are the input vectors $x_i$:
$$X = \begin{bmatrix} \text{—} & (x_1)^T & \text{—} \\ \text{—} & (x_2)^T & \text{—} \\ & \vdots & \\ \text{—} & (x_n)^T & \text{—} \end{bmatrix}$$
Also, let y be the n-dimensional vector of target values from the training set, and w be the d-dimensional vector of weights.
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
The least squares objective is to minimize the sum of squared errors:
$$L(w) = \frac{1}{2} \sum_{i=1}^{n} \left(y_i - w^T x_i\right)^2 = \frac{1}{2} \left\| y - Xw \right\|^2$$
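A quick numerical check (toy random data, bias column omitted) that the summation form and the vectorized norm form of the objective agree:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # design matrix
w = rng.normal(size=4)
y = rng.normal(size=50)

# Sum-of-squared-errors form vs. vectorized norm form of L(w)
loss_sum = 0.5 * np.sum((y - X @ w) ** 2)
loss_norm = 0.5 * np.linalg.norm(y - X @ w) ** 2
assert np.isclose(loss_sum, loss_norm)
```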
To minimize L(w), we first compute the gradient with respect to w: