The Intuitive Guide to Singular Value Decomposition
Forget memorizing formulas. Let's discover SVD from the ground up, as if we were inventing it ourselves.
The Quest: What Does a Matrix Really Do?
Imagine you have a matrix, let's call it A. We know it's a collection of numbers, and we know it can 'act' on a vector to produce a new vector ($$Ax = b$$). But what is the fundamental action of this transformation? Is it a stretch? A squish? A rotation? A flip? Usually, it's a messy combination of all of these.
Our goal is to find a way to describe this messy action as a sequence of three clean, simple, and fundamental operations. This is the core motivation behind Singular Value Decomposition (SVD). We want to decompose the complex action of A into:
- Step 1: A pure rotation (and maybe a flip).
- Step 2: A pure scaling along the principal axes.
- Step 3: Another pure rotation.
Analogy: The Master Chef's Recipe
Think of a matrix A as a complex cooking recipe. Applying it to ingredients (a vector) gives you a dish. SVD is like breaking down that complex recipe into three universal cooking steps:
- 1. Preparation (Rotate): First, you arrange your ingredients on the cutting board in a very specific, optimal orientation.
- 2. Cooking (Scale): Then, you apply a simple action: stretch some ingredients, shrink others. You don't change their orientation, you just change their size.
- 3. Plating (Rotate): Finally, you arrange the cooked ingredients on the plate in their final beautiful presentation.
Any complex recipe (any matrix) can be described by these three simple steps. SVD finds the exact 'preparation' rotation, the 'cooking' scalings, and the 'plating' rotation for any given matrix.
A Geometric Adventure: From Circle to Ellipse
Let's get visual. A linear transformation, like our matrix A, maps points from an input space to an output space. A beautiful way to see what it does is to take all the vectors of length 1 in the input space (which form a unit circle), and see where A sends them. It turns out, A will always transform that unit circle into an ellipse (or a line segment if the matrix collapses a dimension).
This gives us our first big clue! An ellipse has a major axis (its longest radius) and a minor axis (its shortest radius). These axes are perpendicular to each other.
- The directions of the ellipse's axes in the output space are our final orientation. Let's call them $$u_1, u_2, ...$$
- The lengths of these axes are our scaling factors. Let's call them $$σ_1, σ_2, ...$$ These are the famous singular values.
- There must have been some original vectors in the input unit circle that became these axes. These original vectors were also perpendicular! Let's call these special input directions $$v_1, v_2, ...$$
Key Insight: The Orthogonal-to-Orthogonal Mapping
SVD is all about finding a set of special, orthogonal (perpendicular) input vectors V that get mapped by matrix A to another set of orthogonal output vectors U, scaled by the singular values Σ.
$$ A v_i = \sigma_i u_i $$
This equation is the heart of SVD. It says: "Taking the special input direction $$v_i$$, applying the transformation A, is the same as taking the special output direction $$u_i$$ and just stretching it by $$σ_i$$!"
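If you have Python and NumPy handy, here is a minimal sketch of this relationship (the matrix A is just an arbitrary example): np.linalg.svd hands back the $$u_i$$, $$σ_i$$, and $$v_i$$, and we can check that $$A v_i = \sigma_i u_i$$ holds for each pair.

```python
import numpy as np

# An arbitrary example matrix; any real matrix would work here.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# np.linalg.svd returns U, the singular values, and V transposed (Vt).
U, s, Vt = np.linalg.svd(A)

# Check the heart of SVD: A v_i = sigma_i * u_i for every singular pair.
for i in range(len(s)):
    v_i = Vt[i]      # i-th special input direction (row of Vt = column of V)
    u_i = U[:, i]    # i-th special output direction
    assert np.allclose(A @ v_i, s[i] * u_i)

print("A v_i = sigma_i u_i holds for every i.")
```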
Inventing the Method: How to Find V, Σ, and U
Okay, we have our goal. But how do we find these magical vectors and scaling factors for an arbitrary matrix A? We need an algebraic method.
Step 1: Finding the Input Rotations (V) and Scalings (Σ)
We're looking for something that depends only on the input space and the scaling, not the final rotation. The problem is that A mixes scaling and rotating. How can we isolate the scaling and input rotation? We need to cancel out that final rotation (the 'plating' step, U).
Here's the trick: consider the matrix $$A^T A$$. Let's see what happens if we apply it to one of our special input vectors, $$v_i$$:
$$ A^T A v_i = A^T (\sigma_i u_i) = \sigma_i (A^T u_i) $$
This doesn't seem helpful yet. But what is $$A^T$$? If $$A = U \Sigma V^T$$, then $$A^T = (U \Sigma V^T)^T = V \Sigma^T U^T = V \Sigma U^T$$ (since Σ is diagonal). Applying this to $$u_i$$, the factor $$U^T u_i$$ picks out the i-th standard basis vector (the columns of U are orthonormal), so $$A^T u_i = \sigma_i v_i$$. Let's plug that back in:
$$ A^T A v_i = \sigma_i (A^T u_i) = \sigma_i (\sigma_i v_i) = \sigma_i^2 v_i $$
Let's rewrite that and stare at it:
$$ (A^T A) v_i = \sigma_i^2 v_i $$
This is incredible! This is the standard eigenvector-eigenvalue equation! This tells us everything we need:
- The special input vectors we were looking for, the columns of V, are simply the eigenvectors of the symmetric matrix $$A^T A$$.
- The squared scaling factors, $$σ_i^2$$, are the eigenvalues of $$A^T A$$.
- Therefore, our singular values ($$σ_i$$) are the square roots of the eigenvalues of $$A^T A$$.
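As a quick numerical sanity check of those three facts (assuming NumPy, and reusing the same example matrix as above), we can feed $$A^T A$$ to a symmetric eigensolver and compare the square roots of its eigenvalues against the singular values NumPy reports:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])          # arbitrary example matrix

# Eigendecomposition of the symmetric matrix A^T A.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

# eigh returns eigenvalues in ascending order; flip to the usual descending order.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

# Singular values are the square roots of the eigenvalues of A^T A.
sigma = np.sqrt(eigvals)
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
print(sigma)   # approximately [6.708, 2.236], i.e. 3*sqrt(5) and sqrt(5)
```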
Step 2: Finding the Output Rotations (U)
This is the easy part! We already have the core relationship $$A v_i = \sigma_i u_i$$, and we've just found $$v_i$$ and $$σ_i$$, so we can rearrange it to solve for the output vectors $$u_i$$:
$$ u_i = \frac{1}{\sigma_i} A v_i $$
So, for every non-zero singular value, we can find the corresponding output direction vector. These vectors will form the columns of our matrix U.
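Continuing the sketch from Step 1 (same example matrix, NumPy assumed, and assuming all singular values are non-zero), each column of U comes from $$u_i = \frac{1}{\sigma_i} A v_i$$, and the three pieces multiply back to A:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])          # same example matrix as above

# Step 1 recap: V and the singular values from the eigendecomposition of A^T A.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
sigma, V = np.sqrt(eigvals[order]), eigvecs[:, order]

# Step 2: each column of U is A v_i scaled by 1/sigma_i (all sigmas are non-zero here).
U = np.column_stack([(A @ V[:, i]) / sigma[i] for i in range(len(sigma))])

# The three pieces multiply back to A.
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```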
Step 3: Assembling the Decomposition
Now we have all the pieces. We arrange them into matrices:
- U: A matrix whose columns are the output vectors $$u_i$$. This matrix represents the final rotation.
- Σ (Sigma): A diagonal matrix containing the singular values $$σ_i$$ in descending order. This matrix represents the pure scaling.
- V: A matrix whose columns are the input vectors $$v_i$$. Its transpose, $$V^T$$, represents the initial rotation.
This gives us the final, celebrated formula for Singular Value Decomposition:
The SVD Formula: A = UΣVᵀ
This formula states that any matrix A can be factored into an orthogonal matrix U, a diagonal matrix Σ, and the transpose of an orthogonal matrix V.
- $$V^T$$: Rotates the input space so that the special input directions $$v_i$$ line up with the coordinate axes.
- $$ \Sigma $$: Scales the rotated vectors along each axis by the singular values $$σ_i$$.
- $$ U $$: Rotates the scaled vectors into their final orientation, sending each coordinate axis to the output direction $$u_i$$.
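Here is a small sketch of that pipeline in action (an arbitrary matrix and vector, NumPy assumed): we apply the three factors one at a time and confirm the result matches multiplying by A directly.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
x = np.array([1.0, 2.0])           # an arbitrary input vector

U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x                     # rotate: express x in the v_i coordinate system
step2 = np.diag(s) @ step1         # scale: stretch each coordinate by sigma_i
step3 = U @ step2                  # rotate: carry the result onto the u_i directions

assert np.allclose(step3, A @ x)   # identical to applying A directly
```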
Solved Problems: Let's Get Our Hands Dirty
Theory is great, but the only way to truly understand SVD is to do it. Let's solve a few problems step-by-step.
Problem 1: A Simple 2x2 Matrix
Find the SVD of $$ A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} $$.
Step 1: Calculate $$A^T A$$ to find V and Σ.
$$ A^T = \begin{pmatrix} 3 & 4 \\ 0 & 5 \end{pmatrix} $$ $$ A^T A = \begin{pmatrix} 3 & 4 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix} $$
Step 2: Find eigenvalues and eigenvectors of $$A^T A$$.
The characteristic equation is $$ (25-\lambda)^2 - 20^2 = 0 $$, which gives $$ (25-\lambda-20)(25-\lambda+20) = 0 $$, so $$ (5-\lambda)(45-\lambda) = 0 $$.
The eigenvalues are $$λ_1 = 45$$ and $$λ_2 = 5$$.
The singular values are the square roots: $$σ_1 = \sqrt{45} = 3\sqrt{5}$$ and $$σ_2 = \sqrt{5}$$. So, $$ \Sigma = \begin{pmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{pmatrix} $$.
For $$λ_1=45$$, the eigenvector is $$ v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$. For $$λ_2=5$$, the eigenvector is $$ v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix} $$.
So, $$ V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} $$.
Step 3: Calculate U using $$ u_i = \frac{1}{\sigma_i} A v_i $$.
$$ u_1 = \frac{1}{3\sqrt{5}} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3\sqrt{10}} \begin{pmatrix} 3 \\ 9 \end{pmatrix} = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 \\ 3 \end{pmatrix} $$ $$ u_2 = \frac{1}{\sqrt{5}} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{10}} \begin{pmatrix} -3 \\ 1 \end{pmatrix} $$
So, $$ U = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix} $$.
Final Result:
$$ A = \underbrace{\frac{1}{\sqrt{10}} \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix}}_U \underbrace{\begin{pmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{pmatrix}}_\Sigma \underbrace{\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}}_{V^T} $$
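A quick way to sanity-check the hand computation (NumPy assumed): a numerical SVD may flip the signs of paired columns in U and V, so rather than comparing factors entry by entry, we confirm that our factors reproduce A and that the singular values agree.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# The hand-computed factors from Problem 1.
U = np.array([[1.0, -3.0],
              [3.0,  1.0]]) / np.sqrt(10)
Sigma = np.diag([3 * np.sqrt(5), np.sqrt(5)])
Vt = np.array([[ 1.0, 1.0],
               [-1.0, 1.0]]) / np.sqrt(2)

assert np.allclose(U @ Sigma @ Vt, A)   # the factors reproduce A exactly
assert np.allclose(np.diag(Sigma), np.linalg.svd(A, compute_uv=False))
```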
Problem 2: A Non-Square (3x2) Matrix
Find the SVD of $$ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} $$. This matrix maps from 2D to 3D.
$$ A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} $$
Eigenvalues of $$A^T A$$ are $$λ_1 = 3, λ_2 = 1$$. So, singular values are $$σ_1 = \sqrt{3}, σ_2 = 1$$.
Eigenvectors are $$ v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$ and $$ v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} $$. So, $$ V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$.
Now find U. $$ u_1 = \frac{1}{\sqrt{3}} A v_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{6}} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} $$.
$$ u_2 = \frac{1}{1} A v_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} $$.
U needs to be 3x3, but we only have two columns, $$u_1$$ and $$u_2$$, so we need a third vector $$u_3$$ orthogonal to both. We can find it with the cross product $$ u_1 \times u_2 $$ (or cross the unnormalized versions and normalize the result), or by solving for a vector in the null space of $$[u_1\ u_2]^T$$. Either sign of the result works; a valid choice is $$ u_3 = \frac{1}{\sqrt{3}} \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} $$.
Final Result:
$$ A = \underbrace{\begin{pmatrix} 2/\sqrt{6} & 0 & -1/\sqrt{3} \\ 1/\sqrt{6} & -1/\sqrt{2} & 1/\sqrt{3} \\ 1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \end{pmatrix}}_U \underbrace{\begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}}_\Sigma \underbrace{\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{V^T} $$
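For the non-square case it is worth seeing how the shapes work out numerically (NumPy assumed): with full_matrices=True we get a 3x3 U and a 2x2 $$V^T$$, and we have to build the 3x2 Σ ourselves.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)     # (3, 3) (2,) (2, 2)

# Build the 3x2 Sigma with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)
print(s)                              # approximately [1.732, 1.0], i.e. sqrt(3) and 1
```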
Problem 3: A Rank-Deficient Matrix
Find the SVD of $$ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} $$. The columns are dependent, so the rank is 1.
$$ A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix} $$
Eigenvalues of $$A^T A$$ are $$λ_1 = 4, λ_2 = 0$$. Singular values are $$σ_1 = 2, σ_2 = 0$$. The zero singular value confirms the rank is 1!
Eigenvectors are $$ v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$ and $$ v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} $$.
Now find U. $$ u_1 = \frac{1}{2} A v_1 = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{2\sqrt{2}} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$.
For $$σ_2=0$$, we can't divide. We just need a vector orthogonal to $$u_1$$. Let's pick $$ u_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} $$.
Final Result:
$$ A = \underbrace{\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_U \underbrace{\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}}_\Sigma \underbrace{\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{V^T} $$
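A quick numerical confirmation of the rank story (NumPy assumed): the second singular value comes out as zero up to floating-point noise, and the rank estimate agrees.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
print(s)                          # approximately [2., 0.]
print(np.linalg.matrix_rank(A))   # 1 -- only one non-zero singular value
```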
Problem 4: Matrix Approximation (Conceptual)
Let's use the result from Problem 1. How can we find the best rank-1 approximation of $$ A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} $$?
The Eckart-Young theorem states that the best rank-k approximation of a matrix is found by keeping the k largest singular values in Σ and setting the rest to zero.
From Problem 1, we have $$ \sigma_1 = 3\sqrt{5} $$ and $$ \sigma_2 = \sqrt{5} $$. For a rank-1 approximation, we keep only $$σ_1$$ and set $$σ_2=0$$.
$$ \Sigma' = \begin{pmatrix} 3\sqrt{5} & 0 \\ 0 & 0 \end{pmatrix} $$
The rank-1 approximation $$ A_1 $$ is $$ U \Sigma' V^T $$:
$$ A_1 = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} 3\sqrt{5} & 0 \\ 0 & 0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} $$
$$ A_1 = \frac{3\sqrt{5}}{\sqrt{20}} \begin{pmatrix} 1 \\ 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \end{pmatrix} = \frac{3\sqrt{5}}{2\sqrt{5}} \begin{pmatrix} 1 & 1 \\ 3 & 3 \end{pmatrix} = \begin{pmatrix} 1.5 & 1.5 \\ 4.5 & 4.5 \end{pmatrix} $$
This matrix $$A_1$$ is the closest rank-1 matrix to the original matrix A. This principle is the foundation of data compression techniques like PCA and image compression.
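This truncation is a one-liner in code. The sketch below (NumPy assumed) zeroes out the smaller singular value and reproduces the rank-1 matrix we just computed by hand:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)

# Keep only the largest singular value: the best rank-1 approximation (Eckart-Young).
k = 1
A1 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(A1)                          # [[1.5, 1.5], [4.5, 4.5]]
print(np.linalg.matrix_rank(A1))   # 1
```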
Problem 5: Finding the Pseudoinverse
The pseudoinverse $$ A^+ $$ is a generalization of the matrix inverse. It's incredibly useful for solving systems of linear equations that don't have a unique solution. SVD gives us a direct way to calculate it: $$ A^+ = V \Sigma^+ U^T $$, where $$ \Sigma^+ $$ is formed by taking the reciprocal of each non-zero singular value in Σ, leaving the zeros in place, and transposing the result (the transpose only matters when Σ is not square).
Let's find the pseudoinverse of the rank-1 matrix from Problem 3: $$ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} $$.
We had: $$ U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$, $$ \Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} $$, $$ V^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$.
First, find $$ \Sigma^+ $$. We take the reciprocal of non-zero elements: $$ \Sigma^+ = \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \end{pmatrix} $$.
Now, calculate $$ A^+ = V \Sigma^+ U^T $$. Note that $$V=U$$ in this specific case.
$$ A^+ = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$
$$ A^+ = \frac{1}{2} \begin{pmatrix} 1/2 & 0 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} = \begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix} $$
This is the Moore-Penrose pseudoinverse of A.
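NumPy exposes this recipe directly as np.linalg.pinv, which is itself computed via the SVD. The sketch below checks it against our hand-computed result and also rebuilds $$ A^+ = V \Sigma^+ U^T $$ manually:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# numpy's pseudoinverse (computed via the SVD under the hood).
A_pinv = np.linalg.pinv(A)
print(A_pinv)                      # [[0.25, 0.25], [0.25, 0.25]]

# The same result assembled by hand from the SVD factors: A+ = V Sigma+ U^T.
U, s, Vt = np.linalg.svd(A)
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])   # invert only non-zero sigmas
A_pinv_manual = Vt.T @ np.diag(s_plus) @ U.T

assert np.allclose(A_pinv_manual, A_pinv)
```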
Conclusion: You've Reinvented SVD
You didn't just learn a formula. You started with a fundamental question—"How can we simplify the action of a matrix?"—and through logic and geometric intuition, you arrived at the answer. You discovered that every linear transformation can be broken down into a rotation, a scaling, and another rotation.
You figured out that the key was the $$ A^T A $$ trick, which isolates the input rotation and scaling factors. From there, finding the output rotation was a simple final step. This is Singular Value Decomposition. It's not a random, complicated formula; it's the natural, elegant, and fundamental description of what matrices do. From data compression to solving impossible equations, SVD's power comes from this profound simplicity.