Introduction to Linear Algebra, Chapter 7: The Singular Value Decomposition (SVD)

[TOC]
Notes on Chapter 7 of "Introduction to Linear Algebra": The Singular Value Decomposition (SVD).

7.1 Image Processing by Linear Algebra

Section overview

  1. An image is a large matrix of grayscale values, one for each pixel and color.
  2. When nearby pixels are correlated (not random) the image can be compressed.
  3. The SVD separates any matrix $A$ into rank one pieces $ \boldsymbol{u} \boldsymbol{v}^T = $(column)(row).
  4. The columns and rows are eigenvectors of symmetric matrices $AA^T$ and $A^T A$.

The theme of this section in two sentences:
The singular value theorem for $A$ is the eigenvalue theorem for $A^T A$ and $AA^T$.
The Singular Value Decomposition (SVD) separates any matrix into simple pieces (each rank-one piece needs only $m + n$ numbers instead of $m \times n$).
The example application is image processing.

Low Rank Images (Examples)

We start with low-rank matrices: by factoring a matrix into (column)(row) form we can reduce the amount of data to transmit.
For rank 1, consider the ideal case below:

$$
\text { Don’t send } A=\left[\begin{array}{llllll}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1
\end{array}\right] \quad \text { Send this } A=\left[\begin{array}{l}
1 \\
1 \\
1 \\
1 \\
1 \\
1
\end{array}\right]\left[\begin{array}{lllll}
1 & 1 & 1 & 1 & 1 & 1
\end{array}\right]
$$

If the column vector on the left can be agreed upon in advance, the compression ratio improves even further. This is the distinction between preselected bases (the Fourier basis allows speed-up from the FFT) and adaptive bases determined by the image.
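
A minimal numpy sketch of this rank-1 saving, using the 6×6 all-ones example above (the variable names are my own):

```python
import numpy as np

# Rank-1 image: 36 grayscale values, but only 6 + 6 numbers need to be sent.
col = np.ones((6, 1))            # the column vector
row = np.ones((1, 6))            # the row vector
A = col @ row                    # the full 6x6 matrix of ones

print(A.size)                    # 36 numbers in the full image
print(col.size + row.size)       # 12 numbers in the (column)(row) form
print(np.allclose(A, col @ row)) # True: the outer product reconstructs A
```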

Another rank-1 example:
$$
\text { Don’t send } A=\left[\begin{array}{llllll}
a & a & c & c & e & e \\
a & a & c & c & e & e \\
a & a & c & c & e & e \\
a & a & c & c & e & e \\
a & a & c & c & e & e \\
a & a & c & c & e & e
\end{array}\right] \quad \text { Send } A=\left[\begin{array}{l}
1 \\
1 \\
1 \\
1 \\
1 \\
1
\end{array}\right]\left[\begin{array}{llllll}
a & a & c & c & e & e
\end{array}\right]
$$

For rank 2:

$$
A=\left[\begin{array}{ll}
1 & 0 \\
1 & 1
\end{array}\right] \text { is equal to } A=\left[\begin{array}{l}
1 \\
1
\end{array}\right]\left[\begin{array}{ll}
1 & 1
\end{array}\right]-\left[\begin{array}{l}
1 \\
0
\end{array}\right]\left[\begin{array}{ll}
0 & 1
\end{array}\right]
$$

$A=\boldsymbol{u_1} \boldsymbol{v_1}^T + \boldsymbol{u_2} \boldsymbol{v_2}^T$.

The splitting above is not quite the SVD: in the SVD the rank-one pieces are orthogonal.
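
A small numpy check (my own sketch) that the true SVD pieces of this $2 \times 2$ matrix are orthogonal, unlike the split above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)                    # A = U diag(s) V^T

piece1 = s[0] * np.outer(U[:, 0], Vt[0, :])    # sigma_1 u_1 v_1^T
piece2 = s[1] * np.outer(U[:, 1], Vt[1, :])    # sigma_2 u_2 v_2^T

print(np.allclose(A, piece1 + piece2))         # True: the pieces add back to A
print(np.isclose(U[:, 0] @ U[:, 1], 0.0))      # the u's are orthogonal
print(np.isclose(Vt[0, :] @ Vt[1, :], 0.0))    # the v's are orthogonal
```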

Eigenvectors for the SVD

Drawbacks of using the eigenvectors of $A$ directly:

  1. They are generally not orthogonal.
  2. They give only one set of vectors (columns only), and we want two sets ($\boldsymbol{u}$’s and $\boldsymbol{v}$’s, i.e., both row and column directions).
  3. $A \boldsymbol{x} = \lambda \boldsymbol{x}$ requires $A$ to be a square matrix.

The change of viewpoint: use the eigenvectors $\boldsymbol{u}$ of $AA^T$ and the eigenvectors $\boldsymbol{v}$ of $A^TA$.

There is still a problem: real-world images are essentially full rank, so the decomposition produces many $\boldsymbol{u},\boldsymbol{v}$ pairs and by itself saves nothing. The idea is to discard the $\boldsymbol{u},\boldsymbol{v}$ pairs that contribute little and keep only the important ones (for image compression, keep only those whose removal does not visibly degrade the image; in the book's words: visual quality can be preserved even with a big reduction in the rank). The weight that measures each pair's importance is its singular value:
$A = \sigma_1 \boldsymbol{u_1} \boldsymbol{v_1}^T + \sigma_2 \boldsymbol{u_2} \boldsymbol{v_2}^T + \cdots$.

We keep the terms with large $\sigma$’s and discard those with small $\sigma$’s.

To summarize:

The $\boldsymbol{u}$’s from the SVD are called left singular vectors (unit eigenvectors of $A A^{\mathrm{T}}$). The $\boldsymbol{v}$’s are right singular vectors (unit eigenvectors of $A^{\mathrm{T}} A$). The $\sigma$’s are singular values, square roots of the equal eigenvalues of $A A^{\mathrm{T}}$ and $A^{\mathrm{T}} A$:

$$ A A^{\mathrm{T}} \boldsymbol{\boldsymbol{u}}_{i}=\sigma_{i}^{2} \boldsymbol{\boldsymbol{u}}_{i} \quad A^{\mathrm{T}} A \boldsymbol{\boldsymbol{v}}_{i}=\sigma_{i}^{2} \boldsymbol{\boldsymbol{v}}_{i} \quad A \boldsymbol{\boldsymbol{v}}_{i}=\sigma_{i} \boldsymbol{\boldsymbol{u}}_{i} $$
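
A quick numerical check of these three relations (a numpy sketch of my own, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))                    # any rectangular matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, v1, sigma1 = U[:, 0], Vt[0, :], s[0]

print(np.allclose(A @ A.T @ u1, sigma1**2 * u1))   # A A^T u_1 = sigma_1^2 u_1
print(np.allclose(A.T @ A @ v1, sigma1**2 * v1))   # A^T A v_1 = sigma_1^2 v_1
print(np.allclose(A @ v1, sigma1 * u1))            # A v_1 = sigma_1 u_1
```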

7.2 Bases and Matrices in the SVD

Section overview

  1. The SVD produces orthonormal bases of $\boldsymbol{v}$’s and $\boldsymbol{u}$’s for the four fundamental subspaces.
  2. Using those bases, $A$ becomes a diagonal matrix $\Sigma$ and $A \boldsymbol{v}_{i}=\sigma_{i} \boldsymbol{u}_{i}$: $\sigma_{i}=$ singular value.
  3. The two-bases diagonalization $A = U \Sigma V^{\mathrm{T}}$ often has more information than $A=X \Lambda X^{-1}$.
  4. $U \Sigma V^{\mathrm{T}}$ separates $A$ into rank-1 matrices $\sigma_{1} \boldsymbol{u}_{1} \boldsymbol{v}_{1}^{\mathrm{T}}+\cdots+\sigma_{r} \boldsymbol{u}_{r} \boldsymbol{v}_{r}^{\mathrm{T}}$. $\sigma_{1} \boldsymbol{u}_{1} \boldsymbol{v}_{1}^{\mathrm{T}}$ is the largest!

The book builds up the SVD step by step:

  1. Vector form
    The $\boldsymbol{u}$’s are in $\textbf{R}^m$ and the $\boldsymbol{v}$’s are in $\textbf{R}^n$. They will be the columns of an $m \times m$ matrix $U$ and an $n \times n$ matrix $V$.
    The $\boldsymbol{u}$’s and $\boldsymbol{v}$’s give bases for the four fundamental subspaces:
$$ \begin{array}{ll} \boldsymbol{\boldsymbol{u}}_{1}, \ldots, \boldsymbol{\boldsymbol{u}}_{r} & \text { is an orthonormal basis for the column space } \\\\ \boldsymbol{\boldsymbol{u}}_{r+1}, \ldots, \boldsymbol{\boldsymbol{u}}_{m} & \text { is an orthonormal basis for the left nullspace } \boldsymbol{N}\left(A^{\mathrm{T}}\right) \\\\ \boldsymbol{\boldsymbol{v}}_{1}, \ldots, \boldsymbol{\boldsymbol{v}}_{r} & \text { is an orthonormal basis for the row space } \\\\ \boldsymbol{\boldsymbol{v}}_{r+1}, \ldots, \boldsymbol{\boldsymbol{v}}_{n} & \text { is an orthonormal basis for the nullspace } \boldsymbol{N}(A) \end{array} $$
  2. Reduced matrix form
    Since the $\boldsymbol{u}$’s are orthonormal, the matrix $U_r$ with those $r$ columns has $U_r^TU_r = I$. Since the $\boldsymbol{v}$’s are orthonormal, the matrix $V_r$ has $V_r^TV_r = I$. Then the equation $AV_r = U_r \varSigma_r$ holds:
$$ \begin{gathered} (m \text { by } n)(n \text { by } r) \\\\ A V_{\boldsymbol{r}} = U_{\boldsymbol{r}} \Sigma_{\boldsymbol{r}} \\\\ (m \text { by } r)(r \text { by } r) \end{gathered} \quad A\left[\begin{array}{ll} & & \\\\ \boldsymbol{\boldsymbol{v}}_{1} & \cdot \cdot & \boldsymbol{\boldsymbol{v}}_{r} \\\\ & & \end{array}\right]=\left[\begin{array}{lll} & & \\\\ \boldsymbol{\boldsymbol{u}}_{1} & \cdot \cdot & \boldsymbol{\boldsymbol{u}}_{r} \\\\ & & \end{array}\right]\left[\begin{array}{lll} \sigma_{1} & & &\\\\ & \cdot & & \\\\ & & \cdot & \\\\ & & & \sigma_{r} \end{array}\right] $$
  3. Full matrix form
$$ \begin{gathered} (m \text { by } n)(n \text { by } n) \\\\ A V = U \Sigma \\\\ (m \text { by } m)(m \text { by } n) \end{gathered} \quad A\left[\begin{array}{lllll} \boldsymbol{v}_{1} & \cdots & \boldsymbol{v}_{r} & \cdots & \boldsymbol{v}_{n} \end{array}\right]=\left[\begin{array}{lllll} \boldsymbol{u}_{1} & \cdots & \boldsymbol{u}_{r} & \cdots & \boldsymbol{u}_{m} \end{array}\right]\left[\begin{array}{cccc} \sigma_{1} & & & \\\\ & \ddots & & \\\\ & & \sigma_{r} & \\\\ & & & \end{array}\right] $$

Note: $V^{-1} = V^T$.
The general form of the Singular Value Decomposition:

$$ A=U \Sigma V^{\mathrm{T}}=\boldsymbol{u}_{1} \sigma_{1} \boldsymbol{v}_{1}^{\mathrm{T}}+\cdots+\boldsymbol{u}_{r} \sigma_{r} \boldsymbol{v}_{r}^{\mathrm{T}} $$
  4. Ordered form
    This gives the $r$ rank-one pieces of $A$ in order of importance. The ordering is $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$.
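
A numpy sketch of the reduced form and the ordered rank-one expansion (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Reduced SVD: U_r is 5x3, Sigma_r is 3x3, V_r is 3x3; numpy sorts sigma_1 >= sigma_2 >= ...
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A @ Vt.T, U * s))     # A V_r = U_r Sigma_r
print(np.allclose(A, (U * s) @ Vt))     # A = U_r Sigma_r V_r^T

# The sum of the rank-one pieces sigma_i u_i v_i^T rebuilds A
pieces = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, pieces))
```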

Singular values vs. eigenvalues
When is $A= U \varSigma V^T$ (singular values) the same as $X \Lambda X^{- 1}$ (eigenvalues)?
$A$ must be a positive semidefinite (or definite) symmetric matrix.

Proof of the SVD

$$
A^{\mathrm{T}} A=\left(U \Sigma V^{\mathrm{T}}\right)^{\mathrm{T}}\left(U \Sigma V^{\mathrm{T}}\right)=V \Sigma^{\mathrm{T}} U^{\mathrm{T}} U \Sigma V^{\mathrm{T}}=V \Sigma^{\mathbf{T}} \Sigma V^{\mathbf{T}}
$$

Therefore $\left(\Sigma^{\mathrm{T}} \Sigma\right)$ must be the eigenvalue matrix of $\left(A^{\mathrm{T}} A\right)$: each $\sigma^{2}$ is an eigenvalue $\lambda\left(A^{\mathrm{T}} A\right)$.

$$ \begin{array}{cc} \text { Key step } \\\\ i \neq j & \boldsymbol{u}_{i}^{\mathrm{T}} \boldsymbol{u}_{j}=\left(\frac{A \boldsymbol{v}_{i}}{\sigma_{i}}\right)^{\mathrm{T}}\left(\frac{A \boldsymbol{v}_{j}}{\sigma_{j}}\right)=\frac{\boldsymbol{v}_{i}^{\mathrm{T}} A^{\mathrm{T}} A \boldsymbol{v}_{j}}{\sigma_{i} \sigma_{j}}=\frac{\sigma_{j}^{2}}{\sigma_{i} \sigma_{j}} \boldsymbol{v}_{i}^{\mathrm{T}} \boldsymbol{v}_{j}=\text { zero } \end{array} $$

This proves the orthogonality of the $\boldsymbol{u}_i$ (using that $V$ is an orthogonal matrix, so $\boldsymbol{v}_{i}^{\mathrm{T}} \boldsymbol{v}_{j}=0$ for $i \neq j$).
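
A numpy sketch of this construction (my own illustration): eigenvectors of $A^TA$ give the $\boldsymbol{v}$'s, and then $\boldsymbol{u}_i = A\boldsymbol{v}_i/\sigma_i$ come out orthonormal.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Eigen-decomposition of the symmetric matrix A^T A gives the v's and the sigma^2's
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]            # sort eigenvalues in decreasing order
sigma = np.sqrt(lam)

U = (A @ V) / sigma                       # u_i = A v_i / sigma_i, column by column

print(np.allclose(U.T @ U, np.eye(4)))              # the u's are orthonormal
print(np.allclose(A, U @ np.diag(sigma) @ V.T))     # and A = U Sigma V^T
```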

Singular Value Stability versus Eigenvalue Instability

Make a small change to $A=\left[\begin{array}{llll}
0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0
\end{array}\right]$, changing it to $A=\left[\begin{array}{cccc}
0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
\frac{1}{60,000} & 0 & 0 & 0
\end{array}\right]$

That change of only $1 / 60{,}000$ produces a much bigger jump in the eigenvalues of $A$: from $\lambda=0,0,0,0$ to $\lambda=\frac{1}{10}, \frac{i}{10}, \frac{-1}{10}, \frac{-i}{10}$.

Even a tiny change in $A$ has a large effect on the eigenvalues: they jump from 0 onto a circle of radius $\frac{1}{10}$. The singular values, by contrast, merely gain one extra value $\sigma_4=1/60000$, which is tiny.
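
A numerical check of this contrast (a sketch of my own, not from the book):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)

B = A.copy()
B[3, 0] = 1 / 60000                          # the tiny perturbation in the corner

print(np.linalg.eigvals(A))                  # all zeros
print(np.abs(np.linalg.eigvals(B)))          # all jump to about 0.1
print(np.linalg.svd(A, compute_uv=False))    # sigma = 3, 2, 1, 0
print(np.linalg.svd(B, compute_uv=False))    # almost unchanged, plus sigma_4 = 1/60000
```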

Singular Vectors of $A$ and Eigenvectors of $S = A^T A$

The eigen-decomposition of a symmetric matrix and the SVD of a general matrix have the same form:

Symmetric $S$: $S=Q \Lambda Q^{\mathrm{T}}=\lambda_{1} \boldsymbol{q}_{1} \boldsymbol{q}_{1}^{\mathrm{T}}+\lambda_{2} \boldsymbol{q}_{2} \boldsymbol{q}_{2}^{\mathrm{T}}+\cdots+\lambda_{r} \boldsymbol{q}_{r} \boldsymbol{q}_{r}^{\mathrm{T}}$
Any matrix $A$: $A=U \Sigma V^{\mathrm{T}}=\sigma_{1} \boldsymbol{u}_{1} \boldsymbol{v}_{1}^{\mathrm{T}}+\sigma_{2} \boldsymbol{u}_{2} \boldsymbol{v}_{2}^{\mathrm{T}}+\cdots+\sigma_{r} \boldsymbol{u}_{r} \boldsymbol{v}_{r}^{\mathrm{T}}$

The $\lambda_i$ are the eigenvalues of $S$; the $\sigma_i^2$ are the eigenvalues of $A^T A$ (and of $AA^T$).

Next, the reason for ordering the eigenvalues (how the ordered values are used):

$\lambda_{1}=$ maximum ratio $\frac{\boldsymbol{x}^{\mathrm{T}} S \boldsymbol{x}}{\boldsymbol{x}^{\mathrm{T}} \boldsymbol{x}}$. The winning vector is $\boldsymbol{x} = \boldsymbol{q}_{1}$ with $S \boldsymbol{q}_{1}=\lambda_{1} \boldsymbol{q}_{1}$.
Compare with the largest singular value $\sigma_{1}$ of $A$. It solves this problem: $\sigma_{1}=$ maximum ratio $\frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|}$. The winning vector is $\boldsymbol{x}=\boldsymbol{v}_{1}$ with $A \boldsymbol{v}_{1}=\sigma_{1} \boldsymbol{u}_{1}$.

Derivation via the Rayleigh quotient $r(\boldsymbol{x})$:

The derivatives of $r(\boldsymbol{x})=\frac{\boldsymbol{x}^{\mathrm{T}} S \boldsymbol{x}}{\boldsymbol{x}^{\mathrm{T}}\boldsymbol{x}}$ are zero when $S \boldsymbol{x}=r(\boldsymbol{x}) \boldsymbol{x}$.

So the winning $\boldsymbol{x}$ is an eigenvector of $S$. The maximum ratio $r (\boldsymbol{x})$ is the largest eigenvalue $\lambda_1$ of $S$.

Maximizing $\frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|}$ also maximizes $\left(\frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|}\right)^{2}=\frac{\boldsymbol{x}^{\mathrm{T}} A^{\mathrm{T}} A \boldsymbol{x}}{\boldsymbol{x}^{\mathrm{T}} \boldsymbol{x}}=\frac{\boldsymbol{x}^{\mathrm{T}} S \boldsymbol{x}}{\boldsymbol{x}^{\mathrm{T}} \boldsymbol{x}}$.

So the winning $\boldsymbol{x} = \boldsymbol{v_1}$ is the same as the top eigenvector $q_1$ of $S = A^T A$​.

$\lambda_{2}=$ maximum ratio $\frac{\boldsymbol{x}^{\mathrm{T}} S \boldsymbol{x}}{\boldsymbol{x}^{\mathrm{T}} \boldsymbol{x}}$ among all $\boldsymbol{x}$'s with $\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{x}=0$. $\boldsymbol{x}=\boldsymbol{q}_{2}$ will win.
$\sigma_{2}=$ maximum ratio $\frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|}$ among all $\boldsymbol{x}$'s with $\boldsymbol{v}_{1}^{\mathrm{T}} \boldsymbol{x}=0$. $\boldsymbol{x}=\boldsymbol{v}_{2}$ will win.

The derivation is omitted here; a quick numerical check is sketched below.
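
A numpy sketch (my own) that the ratio $\|A\boldsymbol{x}\|/\|\boldsymbol{x}\|$ never exceeds $\sigma_1$ and that $\boldsymbol{x}=\boldsymbol{v}_1$ attains it:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
sigma1, v1 = s[0], Vt[0, :]

# Random trial vectors never beat sigma_1 ...
trials = rng.standard_normal((3, 10000))
ratios = np.linalg.norm(A @ trials, axis=0) / np.linalg.norm(trials, axis=0)
print(ratios.max() <= sigma1 + 1e-12)

# ... and the winning vector x = v_1 attains it exactly
print(np.isclose(np.linalg.norm(A @ v1), sigma1))
```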

Computing the Eigenvalues of $S$ and Singular Values of $A$

The inefficient approach: form $S=A^TA$, compute its eigenvalues $\lambda_i$, and take square roots to get $\sigma_i$.

The idea for a better algorithm: without changing the eigenvalues and singular values, introduce zeros into $A$ and $S$ (reduce them to forms with many zero entries).

$$
\left(Q_{1}^{\mathrm{T}} A Q_{2}\right)^{\mathrm{T}}\left(Q_{1}^{\mathrm{T}} A Q_{2}\right)=Q_{2}^{\mathrm{T}} A^{\mathrm{T}} A Q_{2}=Q_{2}^{\mathrm{T}} S Q_{2} \text { gives the same } \sigma(A) \text { and } \lambda(S)
$$
$Q_1^T A Q_2$ = bidiagonal matrix.
$(\text{bidiagonal})^T (\text{bidiagonal}) = \text{tridiagonal}$.

The final step still amounts to solving $\det(S - \lambda I) = 0$, which is hard to do directly; in practice LAPACK is used to compute accurate approximate solutions.
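
A small numpy experiment (my own sketch, not the book's algorithm) showing why forming $A^TA$ first is the lower-quality route: squaring the condition number destroys the small singular value.

```python
import numpy as np

# An ill-conditioned 2x2 matrix: the true singular values are about 1.414 and 7.1e-11
eps = 1e-10
A = np.array([[1.0, 1.0],
              [0.0, eps]])

# Better route: singular values computed directly from A
print(np.linalg.svd(A, compute_uv=False))

# Lower-quality route: form S = A^T A, take its eigenvalues, then square roots.
# Rounding 1 + eps^2 to 1 inside S already loses all digits of the small sigma.
S = A.T @ A
lam = np.linalg.eigvalsh(S)[::-1]            # eigenvalues, largest first
print(np.sqrt(np.clip(lam, 0, None)))
```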

7.3 Principal Component Analysis (PCA by the SVD)

Section overview

  1. Data often comes in a matrix: $n$ samples and $m$ measurements per sample.
  2. Center each row of the matrix $A$ by subtracting the mean from each measurement.
  3. The SVD finds combinations of the data that contain the most information.
  4. Largest singular value $\sigma_1$ $\leftrightarrow$ greatest variance $\leftrightarrow$ most information in $\boldsymbol{u_1}$.

This section introduces applications of the SVD in statistics and data analysis.

The data matrix $A_0$ has $n$ columns and $m$ rows. Graphically, the columns of $A_0$ are $n$ points in $R^m$. The question is how to describe the point set in fewer dimensions, e.g., to describe a 2-dimensional point set by a single line. (The book's figure shows data points in $A$ often close to a line in $R^2$ or a subspace in $R^m$; its annotations for the $2 \times n$ case follow.)

$A$ is $2 \times n$ (large nullspace)
$A A^{\mathrm{T}}$ is $2 \times 2$ (small matrix)
$A^{\mathrm{T}} A$ is $n \times n$ (large matrix)
Two singular values $\sigma_{1}>\sigma_{2}>0$

Center the data: compute the mean $\mu_i$ of each row and subtract $\mu_i$ from every entry of that row, giving the centered matrix $A$.

sample covariance matrix: $S=\frac{AA^T}{n-1}$.

The SVD of $A$ (centered data) shows the dominant direction in the scatter plot.

The first singular vector $\boldsymbol{u_1}$ points in the dominant direction. The second singular vector $\boldsymbol{u_2}$ is perpendicular to $\boldsymbol{u_1}$. The second singular value $\sigma_2$ measures the spread across the dominant line.
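
A minimal PCA sketch in numpy (my own illustration of these steps, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)

# n = 200 samples, m = 2 measurements per sample; points lie roughly along a line
n = 200
t = rng.standard_normal(n)
A0 = np.vstack([3 * t + 0.3 * rng.standard_normal(n) + 5,
                1 * t + 0.3 * rng.standard_normal(n) - 2])

A = A0 - A0.mean(axis=1, keepdims=True)      # center each row
S = (A @ A.T) / (n - 1)                      # sample covariance matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U[:, 0])                   # u_1: dominant direction, close to (3,1)/sqrt(10) up to sign
print(s**2 / (n - 1))            # variances along u_1 and u_2 (eigenvalues of S)
print(np.linalg.eigvalsh(S)[::-1])   # the same numbers computed from S directly
```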

The Essentials of Principal Component Analysis (PCA)

PCA provides a way to understand complex data. Given many samples (each containing measurements in many dimensions), we want to find which direction carries the most information, or the $R$ most important directions.

To do this we center the data matrix, form the sample covariance matrix, and compute its eigenvalues and eigenvectors; the top $R$ of them give what we need.

The book gives several examples:

  1. Regression analysis: Perpendicular Least Squares.
  2. Different dimensions are measured in different units, so correlation replaces covariance: The Sample Correlation Matrix.
  3. Analyzing gene pairs across Europe reveals the spatial structure of genetic variation: Genetic Variation in Europe.
  4. Face recognition: Eigenfaces.
  5. Automatic model building: a very complex time-varying system is hard to model directly, so we first collect data, decompose frame-by-frame snapshots into dominant and minor components, and thereby speed up model construction: Model Order Reduction.
  6. Google ranks web pages using links into/out of each page together with importance weights: Searching the Web.

7.4 The Geometry of the SVD

Section overview

  1. A typical square matrix $A= U \varSigma V^T$ factors into (rotation) (stretching) (rotation).
  2. The geometry shows how $A$ transforms vectors $\boldsymbol{x}$ on a circle to vectors $A \boldsymbol{x}$ on an ellipse.
  3. The norm of $A$ is $||A|| = \sigma_1$. This singular value is its maximum growth factor $||A \boldsymbol{x}|| / ||\boldsymbol{x}||$.
  4. Polar decomposition factors $A$ into $QS$: rotation $Q = UV^T$ times stretching $S= V \varSigma V^T$.
  5. The pseudoinverse $A^+ = V \varSigma^+ U^T$ brings $A \boldsymbol{x}$ in the column space back to $\boldsymbol{x}$ in the row space.

The three factors of the SVD are (orthogonal) × (diagonal) × (orthogonal); in geometric terms this is (rotation) × (stretching) × (rotation).

This picture will guide us to three neat ideas in the algebra of matrices:

  1. The norm $||A||$ of a matrix: its maximum growth factor.
  2. The polar decomposition $A = QS$: orthogonal $Q$ times positive definite $S$.
  3. The pseudoinverse $A^+$: the best inverse when the matrix $A$ is not invertible.

The Norm of a Matrix

The norm $||A||$ is the largest ratio $\frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|} \quad \|A\|=\max _{\boldsymbol{x} \neq 0} \frac{\|A \boldsymbol{x}\|}{\|\boldsymbol{x}\|}=\sigma_{1}$

MATLAB’s norm() function, applied to a vector:
$\text{norm}(\boldsymbol{x}) = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$.
Applied to a matrix:
$\text{norm}(A) = ||A|| = \max_{\boldsymbol{x} \neq 0} ||A \boldsymbol{x}||/ ||\boldsymbol{x}|| = \sigma_1$.
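
The numpy analogues are np.linalg.norm(x) for vectors and np.linalg.norm(A, 2) for the matrix norm; a small check (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

print(np.isclose(np.linalg.norm(x), np.sqrt(np.sum(x**2))))   # vector norm
sigma = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))             # matrix 2-norm = sigma_1
```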

Two inequalities for the norm:
Triangle inequality: $||A + B|| \leq ||A|| + ||B||$.
Product inequality: $||AB|| \leq ||A|| \, ||B||$.

These follow from the vector-norm inequality and the definition of the matrix norm:
For vectors:

$||(A + B)\boldsymbol{x}|| \leq ||A \boldsymbol{x}|| + || B \boldsymbol{x} || \leq ||A|| \, ||\boldsymbol{x}|| + ||B|| \, ||\boldsymbol{x}||$.

Dividing by $||\boldsymbol{x}||$ (for every $\boldsymbol{x} \neq 0$) gives $||A + B|| \leq ||A|| + ||B||$.

For the second inequality: $|| AB \boldsymbol{x} || \leq ||A|| \, ||B \boldsymbol{x}|| \leq ||A|| \, ||B|| \, ||\boldsymbol{x}||$. Dividing by $||\boldsymbol{x}||$ for every $\boldsymbol{x} \neq 0$ gives $||AB|| \leq ||A|| \, ||B||$.

An interpretation of the Schwarz inequality $|\boldsymbol{v}^T \boldsymbol{u}| \leq ||\boldsymbol{u}|| \, ||\boldsymbol{v}||$ through the rank-one matrix $A = \boldsymbol{u} \boldsymbol{v}^T$:

Eigenvector: $A \boldsymbol{u}=\left(\boldsymbol{u} \boldsymbol{v}^{\mathrm{T}}\right) \boldsymbol{u}=\boldsymbol{u}\left(\boldsymbol{v}^{\mathrm{T}} \boldsymbol{u}\right)=\lambda_{1} \boldsymbol{u}$. So $\lambda_{1}=\boldsymbol{v}^{\mathrm{T}} \boldsymbol{u}$.
Singular vector: $A^{\mathrm{T}} A \boldsymbol{v}=\left(\boldsymbol{v} \boldsymbol{u}^{\mathrm{T}}\right)\left(\boldsymbol{u} \boldsymbol{v}^{\mathrm{T}}\right) \boldsymbol{v}=\boldsymbol{v}\left(\boldsymbol{u}^{\mathrm{T}} \boldsymbol{u}\right)\left(\boldsymbol{v}^{\mathrm{T}} \boldsymbol{v}\right)=\sigma_{1}^{2} \boldsymbol{v}$. So $\sigma_{1}=\|\boldsymbol{u}\| \, \|\boldsymbol{v}\|$.
For the eigenvector $\boldsymbol{x}=\boldsymbol{u}$, $||A \boldsymbol{u}||/ ||\boldsymbol{u}|| = ||\lambda_1 \boldsymbol{u}||/ ||\boldsymbol{u}|| = |\lambda_1| \leq \sigma_1$, so $|\boldsymbol{v}^T \boldsymbol{u}| = |\lambda_1| \leq \sigma_1 = ||\boldsymbol{u}|| \, ||\boldsymbol{v}||$, which is exactly the Schwarz inequality.

For the second pair the inequality reverses: $|\lambda_2| \geq \sigma_2$, because $|\det A| = |\lambda_1 \lambda_2 | = \sigma_1 \sigma_2$.

$$ \text { The closest rank } \boldsymbol{k} \text { matrix to } \boldsymbol{A} \text { is } \boldsymbol{A}_{\boldsymbol{k}}=\sigma_{1} \boldsymbol{\boldsymbol{u}}_{1} \boldsymbol{\boldsymbol{v}}_{1}^{\mathrm{T}}+\cdots+\sigma_{k} \boldsymbol{\boldsymbol{u}}_{k} \boldsymbol{\boldsymbol{v}}_{k}^{\mathrm{T}} $$

The meaning of this formula: all the $\boldsymbol{u}$’s and $\boldsymbol{v}$’s give orthonormal bases for the four fundamental subspaces, and the first $k$ of them give the best rank-$k$ approximation of $A$.

The accompanying theorem (Eckart-Young-Mirsky):
$$
\|A-B\| \geq\left\|A-A_{k}\right\|=\sigma_{k+1} \text { for all matrices } B \text { of rank } k .
$$
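
A numpy sketch (my own) of this statement: the rank-$k$ truncation $A_k$ achieves error exactly $\sigma_{k+1}$, and a random rank-$k$ competitor does no better.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]                   # best rank-k approximation
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))    # error equals sigma_{k+1}

# A random rank-k competitor B cannot do better
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
print(np.linalg.norm(A - B, 2) >= s[k])
```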

Polar Decomposition $A = Q S$

The polar decomposition is easiest to understand from the polar form of a complex number:
$x+yi = r \cos \theta + i r \sin \theta = r(\cos \theta+ i \sin \theta) = r e^{i \theta}$
Viewing this complex number as a $1 \times 1$ matrix, $e^{i \theta}$ plays the role of the orthogonal matrix $Q$, and $r \geq 0$ plays the role of the positive semidefinite matrix $S$.
Extending to $n \times n$ matrices gives the polar decomposition.

Definition:
Every real square matrix can be factored into $A = Q S$, where $Q$ is orthogonal and $S$ is symmetric positive semidefinite. If $A$ is invertible, $S$ is positive definite.

$A=U \varSigma V^T = (UV^T)(V \varSigma V^T) = (Q) (S)$

$Q$ and $S$ represent rotation and stretching respectively.

Why $S$ is positive definite when $A$ is invertible:
$S^2 = V \varSigma^2 V^T = A^TA$, so $S$ is the symmetric square root of $A^TA$; when $A$ is invertible, $A^TA$ is positive definite and therefore so is $S$.

There is also a reverse factorization:
$A=KQ$ with $K=U \varSigma U^T$ (and the same $Q = UV^T$).
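
A numpy sketch of both polar forms (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                            # orthogonal factor Q = U V^T
S = Vt.T @ np.diag(s) @ Vt            # symmetric positive semidefinite S = V Sigma V^T
K = U @ np.diag(s) @ U.T              # the reverse form's factor K = U Sigma U^T

print(np.allclose(A, Q @ S))          # A = Q S
print(np.allclose(A, K @ Q))          # A = K Q
print(np.allclose(Q.T @ Q, np.eye(3)))            # Q is orthogonal
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))    # S has nonnegative eigenvalues
```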

The geometric meaning of the polar decomposition:
The eigenvalues of $S$ give the stretching factors. The eigenvectors of $S$ give the stretching directions (the principal axes of the ellipse). The orthogonal matrix $Q$ includes both rotations $U$ and $V^T$.

$Q = UV^T$ is the nearest orthogonal matrix to $A$. This $Q$ makes the norm $||Q - A||$ as small as possible.

It follows that:
The nearest singular matrix $A_0$ to $A$ comes by changing the smallest $\sigma_{min}$ to zero.

In practice:
In computational practice we often do knock out a very small $\sigma$. Working with singular matrices is better than coming too close to zero and not noticing.

The Pseudoinverse $A^+$

It is computed as follows:

$$ \begin{array}{c} \text { Pseudoinverse of } A \\\\ A^{+}=V \Sigma^{+} U^{\mathbf{T}} \end{array}=\left[\begin{array}{c} \boldsymbol{\boldsymbol{v}}_{1} \cdots \boldsymbol{\boldsymbol{v}}_{r} \cdots \boldsymbol{\boldsymbol{v}}_{n} \\\\ n \text { by } n \end{array}\right]\left[\begin{array}{ccc} \sigma_{1}^{-1} & & \\\\ & \ddots & \\\\ & & \sigma_{r}^{-1} \\\\ & n \text { by } m & \end{array}\right]\left[\begin{array}{c} \boldsymbol{\boldsymbol{u}}_{1} \cdots \boldsymbol{\boldsymbol{u}}_{r} \cdots \boldsymbol{\boldsymbol{u}}_{m} \\\\ m \text { by } m \end{array}\right]^{\mathrm{T}} $$

The pseudoinverse $A^+$ is an $n \times m$ matrix. If $A^{-1}$ exists (which requires $r = m = n$), then $A^+$ is the same as $A^{-1}$.

$$ A^{+} \boldsymbol{\boldsymbol{u}}_{i}=\frac{1}{\sigma_{i}} \boldsymbol{\boldsymbol{v}}_{i} \quad \text { for } i \leq r \quad \text { and } \quad A^{+} \boldsymbol{\boldsymbol{u}}_{i}=\mathbf{0} \quad \text { for } i>r $$

The product $\varSigma^+ \varSigma$ is as near to the identity as we can get. It is a projection matrix.

$$
\varSigma^+ \varSigma = \left[\begin{array}{cc}
I & 0 \\
0 & 0
\end{array}\right]
$$

By analogy with $AA^{-1} =A^{-1} A=I$:

$A A^{+}=$projection matrix onto the column space of $A$
$A^{+} A=$ projection matrix onto the row space of $A$
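
A numpy check of these projection properties (my own sketch, using np.linalg.pinv):

```python
import numpy as np

rng = np.random.default_rng(8)
# A 4x3 matrix of rank 2: not invertible, but it has a pseudoinverse
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

A_plus = np.linalg.pinv(A)                  # A^+ = V Sigma^+ U^T

P_col = A @ A_plus                          # projection onto the column space of A
P_row = A_plus @ A                          # projection onto the row space of A

print(np.allclose(P_col @ P_col, P_col))    # projections satisfy P^2 = P
print(np.allclose(P_row @ P_row, P_row))
print(np.allclose(P_col @ A, A))            # projecting the columns of A changes nothing
print(np.allclose(A @ P_row, A))            # projecting the rows of A changes nothing
```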

Application: Least Squares with Dependent Columns

$$ A \boldsymbol{\boldsymbol{x}}=\left[\begin{array}{ll} 1 & 1 \\\\ 1 & 1 \end{array}\right]\left[\begin{array}{l} x_{1} \\\\ x_{2} \end{array}\right]=\left[\begin{array}{l} 3 \\\\ 1 \end{array}\right]=\boldsymbol{b} \quad A^{\mathrm{T}} A \widehat{\boldsymbol{\boldsymbol{x}}}=\left[\begin{array}{ll} 2 & 2 \\\\ 2 & 2 \end{array}\right]\left[\begin{array}{l} \widehat{\boldsymbol{x}}_{1} \\\\ \widehat{\boldsymbol{x}}_{2} \end{array}\right]=\left[\begin{array}{l} 4 \\\\ 4 \end{array}\right]=A^{\mathrm{T}} \boldsymbol{b} $$

This is the example shown in Section 4.3. The equations above have infinitely many solutions, and the pseudoinverse gives us a way to choose a “best solution” $\boldsymbol{x}^+ =A^+ \boldsymbol{b}$.
Any vector $\boldsymbol{x} = (1 + c, 1 - c)$ will solve the normal equations $A^T A \hat{\boldsymbol{x}}=A^T \boldsymbol{b}$. The purpose of the pseudoinverse is to choose the one solution $\hat{\boldsymbol{x}} = \boldsymbol{x}^+$ that lies in the row space of $A$ (the minimum-norm solution).
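
A numpy check of this example (my own sketch): the pseudoinverse picks $\boldsymbol{x}^+ = (1, 1)$, the minimum-norm choice among all solutions $(1+c, 1-c)$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([3.0, 1.0])

x_plus = np.linalg.pinv(A) @ b
print(x_plus)                                # [1. 1.]

# Every (1+c, 1-c) solves the normal equations, but x_plus has the smallest norm
for c in (0.0, 0.5, 2.0):
    x = np.array([1 + c, 1 - c])
    print(np.allclose(A.T @ A @ x, A.T @ b), np.linalg.norm(x))
```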
