Introduction to Linear Algebra - Chapter 4 - Orthogonality

[TOC]

Chapter 4 of *Introduction to Linear Algebra*: Orthogonality.

4.1 Orthogonality of all 4 subspaces

Section summary

  1. Orthogonal vectors have $\boldsymbol{v}^{\mathrm{T}} \boldsymbol{w}=0 .$ Then $|\boldsymbol{v}|^{2}+|\boldsymbol{w}|^{2}=|\boldsymbol{v}+\boldsymbol{w}|^{2}=|\boldsymbol{v}-\boldsymbol{w}|^{2}$
  2. Subspaces $\boldsymbol{V}$ and $\boldsymbol{W}$ are orthogonal when $\boldsymbol{v}^{\mathrm{T}} \boldsymbol{w}=0$ for every $\boldsymbol{v}$ in $\boldsymbol{V}$ and every $\boldsymbol{w}$ in $\boldsymbol{W}$.
  3. The row space of $A$ is orthogonal to the nullspace. The column space is orthogonal to $N\left(A^{\mathrm{T}}\right)$.
  4. One pair of dimensions adds to $r+(n-r)=n .$ The other pair has $r+(m-r)=m .$
  5. Row space and nullspace are orthogonal complements: Every $\boldsymbol{x}$ in $\mathbf{R}^{n}$ splits into $\boldsymbol{x}_{\mathbf{r o w}}+\boldsymbol{x}_{\mathbf{n u l l}}$.
  6. Suppose a space $\boldsymbol{S}$ has dimension $d .$ Then every basis for $\boldsymbol{S}$ consists of $d$ vectors.
  7. If $d$ vectors in $\boldsymbol{S}$ are independent, they span $\boldsymbol{S} .$ If $d$ vectors span $\boldsymbol{S}$, they are independent.

A matrix multiplies a vector: $A$ times $x$.
At the first level this is only numbers.
At the second level $A \boldsymbol{x}$ is a combination of column vectors.
The third level shows subspaces.

The row space and nullspace are orthogonal subspaces inside $\textbf{R}^n$.

Orthogonal subspaces
$\boldsymbol{v}^T \boldsymbol{w} = 0$ for all $\boldsymbol{v}$ in $V$ and all $\boldsymbol{w}$ in $W$.

Example:

Every vector $\boldsymbol{x}$ in the nullspace is perpendicular to every row of $A$, because $A \boldsymbol{x} = 0$.
The nullspace $N(A)$ and the row space $C(A^T )$ are orthogonal subspaces of $\textbf{R}^n$.
$$
\text { Nullspace orthogonal to row space } \quad \boldsymbol{x}^{\mathrm{T}}\left(A^{\mathrm{T}} \boldsymbol{y}\right)=(A \boldsymbol{x})^{\mathrm{T}} \boldsymbol{y}=0^{\mathrm{T}} \boldsymbol{y}=0 \text {. }
$$

Every vector $y$ in the nullspace of $A^T$ is perpendicular to every column of $A$.
The left nullspace $N(A^T )$ and the column space $C(A)$ are orthogonal in $\textbf{R}^m$.
Proof:

$$
C(A) \perp N\left(A^{\mathrm{T}}\right) \quad A^{\mathrm{T}} \boldsymbol{y}=\left[\begin{array}{c}
(\text { column } \mathbf{1})^{\mathrm{T}} \\
\cdots \\
(\text { column } \boldsymbol{n})^{\mathrm{T}}
\end{array}\right]\boldsymbol{y}=\left[\begin{array}{l}
0 \\
\cdot \\
0
\end{array}\right]
$$

The Big Picture: two subspaces in $\textbf{R}^n$ and two subspaces in $\textbf{R}^m$:

Why orthogonal complements are introduced:

The fundamental subspaces are more than just orthogonal (in pairs). Their dimensions are also right. Two lines could be perpendicular in $\textbf{R}^3$, but those lines could not be the row space and nullspace of a 3 by 3 matrix. The lines have dimensions 1 and 1, adding to 2. But the correct dimensions $r$ and $n - r$ must add to $n = 3$.

The orthogonal complement of a subspace $V$ contains every vector that is perpendicular to $V$. This orthogonal subspace is denoted by $V^{\bot}$ (pronounced "V perp").
By this definition, the nullspace is the orthogonal complement of the row space.

Fundamental Theorem of Linear Algebra, Part 2
$N(A)$ is the orthogonal complement of the row space $C\left(A^{\mathrm{T}}\right)$ (in $\mathbf{R}^{n}$).
$N\left(A^{\mathrm{T}}\right)$ is the orthogonal complement of the column space $C(A)$ (in $\mathbf{R}^{m}$).

The point of “complements” is that every $\boldsymbol{x}$ can be split into a row space component $\boldsymbol{x}_r$ and a nullspace component $\boldsymbol{x}_n$. When $A$ multiplies $\boldsymbol{x} = \boldsymbol{x}_r + \boldsymbol{x}_n$, Figure below shows what happens to $A\boldsymbol{x}= A\boldsymbol{x}_r + A\boldsymbol{x}_n$:
The nullspace component goes to zero: $A\boldsymbol{x}_n = 0$.
The row space component goes to the column space: $A\boldsymbol{x}_r = A\boldsymbol{x}$.

There is an $r \times r$ (rank $r$) invertible matrix hiding inside $A$, if we throw away the two nullspaces. From the row space to the column space, $A$ is invertible; its inverse in that setting is the "pseudoinverse".

Singular Value Decomposition
Every matrix can be diagonalized, when we choose the right bases for $R^n$ and $R^m$.

If a vector $\boldsymbol{v}$ is orthogonal to itself then $\boldsymbol{v}$ is the zero vector.

Visualizing the orthogonality of the row space and the nullspace:

Row space of $A$ = plane. Nullspace = orthogonal line. Dimensions $2 + 1 = 3$.

Extending the idea from vectors to matrices

Any $n$ independent vectors in $\textbf{R}^n$ must span $\textbf{R}^n$. So they are a basis.
Any $n$ vectors that span $\textbf{R}^n$ must be independent. So they are a basis.

If the $n$ columns of $A$ are independent, they span $\textbf{R}^n$. So $A \boldsymbol{x} = \boldsymbol{b}$ is solvable.
If the $n$ columns span $\textbf{R}^n$ , they are independent. So $A \boldsymbol{x} = \boldsymbol{b}$ has only one solution.
$A$ is invertible

4.2 Projections

Section summary

  1. The projection of a vector $\boldsymbol{b}$ onto the line through $\boldsymbol{a}$ is the closest point $\boldsymbol{p}=\boldsymbol{a}\left(\boldsymbol{a}^{\mathrm{T}} \boldsymbol{b} / \boldsymbol{a}^{\mathrm{T}} \boldsymbol{a}\right)$.
  2. The error $\boldsymbol{e}=\boldsymbol{b}-\boldsymbol{p}$ is perpendicular to $\boldsymbol{a}$ : Right triangle $\boldsymbol{b} \boldsymbol{p} \boldsymbol{e}$ has $|\boldsymbol{p}|^{2}+|\boldsymbol{e}|^{2}=|\boldsymbol{b}|^{2} .$
  3. The projection of $\boldsymbol{b}$ onto a subspace $S$ is the closest vector $\boldsymbol{p}$ in $S ; \boldsymbol{b}-\boldsymbol{p}$ is orthogonal to $S$.
  4. $A^{\mathrm{T}} A$ is invertible (and symmetric) only if $A$ has independent columns: $\boldsymbol{N}\left(A^{\mathrm{T}} A\right)=\boldsymbol{N}(A) .$
  5. Then the projection of $\boldsymbol{b}$ onto the column space of $A$ is the vector $\boldsymbol{p}=A\left(A^{\mathrm{T}} A\right)^{-1} A^{\mathrm{T}} \boldsymbol{b}$.
  6. The projection matrix onto $\boldsymbol{C}(A)$ is $P=A\left(A^{\mathrm{T}} A\right)^{-1} A^{\mathrm{T}} .$ It has $\boldsymbol{p}=P \boldsymbol{b}$ and $P^{2}=P=P^{\mathrm{T}}$.

We can see that the projections of $\boldsymbol{b}$ onto the line $[0 : 0 : z]^T$ and onto the plane $[x : y : 0]$ are $P_1 \boldsymbol{b}$ and $P_2 \boldsymbol{b}$ respectively.

Since the line and the plane are orthogonal complements of each other:
The vectors give $P_1 \boldsymbol{b} + P_2 \boldsymbol{b} = \boldsymbol{b}$. The matrices give $P_1 + P_2 = I$.

Our problem is to project any $\boldsymbol{b}$ onto the column space of any $m \times n$ matrix.

Projection Onto a Line

Diagram of the projection computation:

Projecting $\boldsymbol{b}$ onto $\boldsymbol{a}$ with error $\boldsymbol{e} = \boldsymbol{b} -\widehat{\boldsymbol{x}} \boldsymbol{a}$
$\boldsymbol{a} \cdot(\boldsymbol{b}-\widehat{\boldsymbol{x}} \boldsymbol{a})=0 \quad \text { or } \quad \boldsymbol{a} \cdot \boldsymbol{b}-\widehat{\boldsymbol{x}} \boldsymbol{a} \cdot \boldsymbol{a}=0$
$$
\widehat{x}=\frac{\boldsymbol{a} \cdot \boldsymbol{b}}{\boldsymbol{a} \cdot \boldsymbol{a}}=\frac{\boldsymbol{a} ^{\mathrm{T}} \boldsymbol{b}}{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{a}}
$$
The length of the projection:
$$
\boldsymbol{p}=\frac{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{b}}{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{a}} \boldsymbol{a} \quad \text { has length } \quad|\boldsymbol{p}|=\frac{|\boldsymbol{a}||\boldsymbol{b}| \cos \theta}{|\boldsymbol{a}|^{2}}|\boldsymbol{a}|=|\boldsymbol{b}| \cos \theta
$$

The Projection Matrix $P$

$\boldsymbol{p}=\boldsymbol{a} \widehat{\boldsymbol{x}}=\boldsymbol{a} \frac{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{b}}{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{a}}=P \boldsymbol{b} \quad$ when the matrix is $\quad P=\frac{\boldsymbol{a} \boldsymbol{a}^{\mathrm{T}}}{\boldsymbol{a}^{\mathrm{T}} \boldsymbol{a}}$
$P$ is a column times a row! The projection matrix $P$ is $m \times m$, but its rank is one.
With the projection matrix $P$, we can compute the projection of any $m \times 1$ vector $\boldsymbol{b}$.

Projecting a second time doesn’t change anything!
$P^2=P$.

$(I - P) \boldsymbol{b}$ equals $\boldsymbol{b}- \boldsymbol{p}$ which is $\boldsymbol{e}$ in the left nullspace.

Projection Onto a Subspace

Problem statement:

Start with $n$ vectors $\boldsymbol{a}_1, … , \boldsymbol{a}_n$ in $R^m$. Assume that these $\boldsymbol{a}$’s are linearly independent.

Problem: Find the combination $\boldsymbol{p} = \hat{\boldsymbol{x}}_1 \boldsymbol{a}_1 + \cdots + \hat{\boldsymbol{x}}_n \boldsymbol{a}_n$ closest to a given vector $\boldsymbol{b}$. $$ \begin{gathered} \boldsymbol{a}_{1}^{\mathrm{T}}(\boldsymbol{b}-A \widehat{\boldsymbol{x}})=0 \\\\ \vdots \\\\ \boldsymbol{a}_{n}^{\mathrm{T}}(\boldsymbol{b}-A \widehat{\boldsymbol{x}})=0 \end{gathered} \quad \text { or } \quad\left[\begin{array}{c} -\boldsymbol{a}_{1}^{\mathrm{T}}- \\\\ \vdots \\\\ -\boldsymbol{a}_{n}^{\mathrm{T}}- \end{array}\right]\left[\begin{array}{c} \boldsymbol{b}-A \widehat{\boldsymbol{x}} \end{array}\right]=\left[\begin{array}{c} \mathbf{0} \end{array}\right] $$

The left side holds because $\boldsymbol{b} - A\hat{\boldsymbol{x}}$ is the error, which is perpendicular to every $\boldsymbol{a}_i$.

This gives the following steps for computing the projection:

1. The combination $\boldsymbol{p}=\widehat{\boldsymbol{x}}_{1} \boldsymbol{a}_{1}+\cdots+\widehat{\boldsymbol{x}}_{n} \boldsymbol{a}_{n}=A \widehat{\boldsymbol{x}}$ that is closest to $\boldsymbol{b}$ comes from $\widehat{\boldsymbol{x}}$ :

Find $\widehat{\boldsymbol{x}}(n \times 1)$

$$
A^{\mathrm{T}}(\boldsymbol{b}-A \widehat{\boldsymbol{x}})=\mathbf{0} \quad \text{or} \quad A^{\mathrm{T}} A \widehat{\boldsymbol{x}}=A^{\mathrm{T}} \boldsymbol{b}.
$$

2. This symmetric matrix $A^{\mathrm{T}} A$ is $n$ by $n$. It is invertible if the $\boldsymbol{a}$'s are independent. The solution is $\widehat{\boldsymbol{x}}=\left(A^{\mathrm{T}} A\right)^{-1} A^{\mathrm{T}} \boldsymbol{b}$. The projection of $\boldsymbol{b}$ onto the subspace is $\boldsymbol{p}$ :

Find $\boldsymbol{p}(m \times 1)$
$$
\boldsymbol{p}=A \widehat{\boldsymbol{x}}=A\left(A^{\mathrm{T}} A\right)^{-1} A^{\mathrm{T}} \boldsymbol{b} .
$$
3. The next formula picks out the projection matrix that is multiplying $\boldsymbol{b}$ above:

Find $P(m \times m)$
$$
P=A\left(A^{\mathrm{T}} A\right)^{-1} A^{\mathrm{T}} .
$$
The equation $A^{\mathrm{T}}(\boldsymbol{b} - A\widehat{\boldsymbol{x}}) = \mathbf{0}$ is called the normal equation of linear algebra. Its elegance lies in the following:

  1. Our subspace is the column space of $A$.
  2. The error vector $\boldsymbol{b}-A \widehat{\boldsymbol{x}}$ is perpendicular to that column space.
  3. Therefore $\boldsymbol{b}-A \widehat{\boldsymbol{x}}$ is in the nullspace of $A^{\mathrm{T}}$ ! This means $A^{\mathrm{T}}(\boldsymbol{b}-A \widehat{\boldsymbol{x}})=\mathbf{0}$.

What about when $A$ is not square?
$A^T A$ is invertible if and only if $A$ has linearly independent columns.

Proof:
$A^{\mathrm{T}}A$ has the same nullspace as $A$. If $A^{\mathrm{T}} A \boldsymbol{x} = \mathbf{0}$, multiply on the left by $\boldsymbol{x}^{\mathrm{T}}$:
$$
\left(\boldsymbol{x}^{\mathrm{T}}\right) A^{\mathrm{T}} A \boldsymbol{x}=0 \quad \text { or } \quad(A \boldsymbol{x})^{\mathrm{T}}(A \boldsymbol{x})=0 \quad \text { or } \quad|A \boldsymbol{x}|^{2}=0
$$

The last equality forces $A\boldsymbol{x} = \mathbf{0}$, so $\boldsymbol{x}$ is also in the nullspace of $A$ (the reverse inclusion is immediate). When $A$ has independent columns, $A^T A$ is square, symmetric, and invertible.

4.3 Least Squares Approximations

Section summary

  1. Solving $A^{\mathrm{T}} A \widehat{\boldsymbol{x}}=A^{\mathrm{T}} \boldsymbol{b}$ gives the projection $\boldsymbol{p}=A \widehat{\boldsymbol{x}}$ of $\boldsymbol{b}$ onto the column space of $A$.
  2. When $A \boldsymbol{x}=\boldsymbol{b}$ has no solution, $\widehat{\boldsymbol{x}}$ is the “least-squares solution” : $|\boldsymbol{b}-A \widehat{\boldsymbol{x}}|^{2}=\operatorname{minimum}$
  3. Setting partial derivatives of $E=|A \boldsymbol{x}-\boldsymbol{b}|^{2}$ to zero $\left(\frac{\partial E}{\partial x_{i}}=0\right)$ also produces $A^{\mathrm{T}} A \widehat{\boldsymbol{x}}=A^{\mathrm{T}} \boldsymbol{b} .$
  4. To fit points $\left(t_{1}, b_{1}\right), \ldots,\left(t_{m}, b_{m}\right)$ by a straight line, $A$ has columns $(1, \ldots, 1)$ and $\left(t_{1}, \ldots, t_{m}\right)$
  5. In that case $A^{\mathrm{T}} A$ is the 2 by 2 matrix $\left[\begin{array}{cc}m & \Sigma t_{i} \\ \Sigma t_{i} & \Sigma t_{i}^{2}\end{array}\right]$ and $A^{\mathrm{T}} \boldsymbol{b}$ is the vector $\left[\begin{array}{l}\boldsymbol{\Sigma} b_{i} \\ \boldsymbol{\Sigma} t_{i} b_{i}\end{array}\right]$

Where the problem comes from:
Often $A \boldsymbol{x}= \boldsymbol{b}$ has no solution because $A$ has $m > n$: there are more equations than unknowns, so the system is overdetermined. How do we find an $\hat{\boldsymbol{x}}$ that comes as close as possible to satisfying all the constraints? In short: solve the normal equation:
When $A \boldsymbol{x}= \boldsymbol{b}$ has no solution, multiply by $A^T$ and solve $A^T A\hat{\boldsymbol{x}}=A^T \boldsymbol{b}$.

Minimizing the Error

The idea is to form the error $\boldsymbol{e} = \boldsymbol{b} - A \boldsymbol{x}$ and minimize it; the minimizer is $\hat{\boldsymbol{x}}$. Let $\boldsymbol{p} = A \hat{\boldsymbol{x}}$. There are three ways to attack this problem.

The running example:
Find the closest line to the points (0, 6), (1, 0), and (2, 0).

$$
\begin{array}{lll}
t=0 & \text { The first point is on the line } b=C+D t \text { if } & C+D \cdot 0=6 \\
t=1 & \text { The second point is on the line } b=C+D t \text { if } & C+D \cdot 1=0 \\
t=2 & \text { The third point is on the line } b=C+D t \text { if } & C+D \cdot 2=0
\end{array}
$$

$$
A \boldsymbol{x} = \boldsymbol{b}\\
\begin{bmatrix}
1 & 0 \\
1 & 1 \\
1 & 2
\end{bmatrix}
\boldsymbol{x} =
\begin{bmatrix}
6 \\
0 \\
0
\end{bmatrix}
$$

Deriving least squares geometrically

The vector $A \boldsymbol{x}$ always lies in the space spanned by the columns of $A$, $[1 :1 :1]$ and $[0 :1 :2]$ (a plane in this example). Inside that plane, the point with the smallest error is the projection $\boldsymbol{p}$ of $\boldsymbol{b}$. This example happens to be ideal in that $\boldsymbol{b}$ is exactly orthogonal to $\boldsymbol{t}$ (if $\boldsymbol{b}$ and $\boldsymbol{t}$ were not orthogonal, the error $\boldsymbol{e}$ would still be the component of $\boldsymbol{b}$ perpendicular to the plane). It looks like the figure below:

The three projection components $\boldsymbol{p}_1,\boldsymbol{p}_2,\boldsymbol{p}_3$ must lie on one straight line, because $\boldsymbol{p}$ lies in the column space of $A$. That line through the three heights is the best-fit line.

Deriving least squares algebraically

Split $\boldsymbol{b}$ into the solvable part $\boldsymbol{p}$ plus the part $\boldsymbol{e}$ perpendicular to $\boldsymbol{p}$.
$A \boldsymbol{x} = \boldsymbol{b} = \boldsymbol{p} + \boldsymbol{e}$ has no solution; $A\hat{\boldsymbol{x}} = \boldsymbol{p}$ is solvable, with $\hat{\boldsymbol{x}} = (A^{\mathrm{T}}A)^{-1}A^{\mathrm{T}} \boldsymbol{b}$.
By the Pythagorean theorem:
$||A \boldsymbol{x} - \boldsymbol{b}||^2 = ||A \boldsymbol{x} - \boldsymbol{p}||^2 + ||\boldsymbol{e}||^2$

When $A \boldsymbol{x} - \boldsymbol{p} = \mathbf{0}$, that is when $\boldsymbol{x} = \hat{\boldsymbol{x}}$, the error $\boldsymbol{e}$ is as small as possible. The relationship among $\boldsymbol{b}, \boldsymbol{e}, \boldsymbol{p}$ is shown below:

$\boldsymbol{b}$ is not in the column space of $A$, so $A \boldsymbol{x}=\boldsymbol{b}$ has no solution.

Moreover $\boldsymbol{e}_1 + \boldsymbol{e}_2 + \boldsymbol{e}_3 = 0$, because $\boldsymbol{e}$ is perpendicular to the first column of $A$.

Deriving least squares by calculus

$$
\text{Minimize} \quad E = \|A\boldsymbol{x} - \boldsymbol{b}\|^2 = (C + D \cdot 0 - 6)^2 + (C + D \cdot 1)^2 + (C + D \cdot 2)^2
$$
$$
\begin{aligned}
&\partial E / \partial C=2(C+D \cdot 0-6)+2(C+D \cdot 1)+2(C+D \cdot 2)=0 \\
&\partial E / \partial D=2(C+D \cdot 0-6)(\mathbf{0})+2(C+D \cdot 1)(\mathbf{1})+2(C+D \cdot 2)(\mathbf{2})=0
\end{aligned}
$$

The $C$ derivative is zero: $\quad 3 C+3 D=6 \quad$

The $D$ derivative is zero: $3 C+5 D=0$

This matrix $\left[\begin{array}{ll}3 & 3 \\ 3 & 5\end{array}\right]$ is $\boldsymbol{A}^{\mathrm{T}} \boldsymbol{A}$ .

These match the normal equations exactly: $A^T A \hat{ \boldsymbol{x} } = A^T \boldsymbol{b}$.
Solving gives $C = 5$, $D = -3$.

The Big Picture for Least Squares

Fitting a Straight Line

$$ \begin{align} & C+D t_{1}=\boldsymbol{b}_{1} \\\\ A \boldsymbol{x}=\boldsymbol{b} \quad \text { is } \quad &C+D t_{2}=\boldsymbol{b}_{2} \quad \text { with } \quad A=\left[\begin{array}{cc} 1 & t_{1} \\\\ 1 & t_{2} \\\\ \vdots & \vdots \\\\ 1 & t_{m} \end{array}\right] \\\\ &\qquad \vdots \\\\ & C+D t_{m}=\boldsymbol{b}_{m} \end{align} $$ The closest line $C+ D \boldsymbol{t}$ has heights $\boldsymbol{p}_{1}, \ldots, \boldsymbol{p}_{m}$ with errors $\boldsymbol{e}_{1}, \ldots, \boldsymbol{e}_{m}$ Solve $A^{\mathrm{T}} A \widehat{\boldsymbol{x}} = A^{\mathrm{T}} \boldsymbol{b}$ for $\widehat{\boldsymbol{x}}=(C, D) .$ The errors are $\boldsymbol{e}_{i}=\boldsymbol{b}_{i}-C-D \boldsymbol{t}_{i} .$ $$ \textbf { Dot-product matrix } A^{\mathbf{T}} A=\left[\begin{array}{ccc} 1 & \cdots & 1 \\\\ t_{1} & \cdots & t_{m} \end{array}\right]\left[\begin{array}{cc} 1 & t_{1} \\\\ \vdots & \vdots \\\\ 1 & t_{m} \end{array}\right]=\left[\begin{array}{cc} m & \sum t_{i} \\\\ \sum t_{i} & \sum t_{i}^{2} \end{array}\right] $$

$$
A^{\mathrm{T}} \boldsymbol{b}=\left[\begin{array}{ccc}
1 & \cdots & 1 \\
t_{1} & \cdots & t_{m}
\end{array}\right]\left[\begin{array}{c}
b_{1} \\
\vdots \\
b_{m}
\end{array}\right]=\left[\begin{array}{c}
\sum b_{i} \\
\sum t_{i} b_{i}
\end{array}\right]
$$

The line $C+ D \boldsymbol{t}$ minimizes $\boldsymbol{e}_{1}^{2}+\cdots+\boldsymbol{e}_{m}^{2}=\|A \boldsymbol{x}-\boldsymbol{b}\|^{2}$ when $A^{\mathrm{T}} A \widehat{\boldsymbol{x}}=A^{\mathrm{T}} \boldsymbol{b} $

$$
A^{\mathrm{T}} A \widehat{\boldsymbol{x}}=A^{\mathrm{T}} \boldsymbol{b} \quad\left[\begin{array}{cc}
m & \sum t_{i} \\
\sum t_{i} & \sum t_{i}^{2}
\end{array}\right]\left[\begin{array}{l}
C \\
D
\end{array}\right]=\left[\begin{array}{c}
\sum b_{i} \\
\sum t_{i} b_{i}
\end{array}\right]
$$

The straight line (two parameters) extends to any number of parameters, as long as we are fitting $m$ data points by $n$ parameters $x_1, \ldots, x_n$. The matrix $A$ has $n$ columns and $n < m$.

Dependent Columns in $A$: What is $\hat{ \boldsymbol{x} }$?

What happens if $A$ has dependent columns?

No line $C+Dt$ can pass through both points at once. Projecting $\boldsymbol{b}_1,\boldsymbol{b}_2$ onto $\boldsymbol{p}=(1,1)$ takes us from a system with no solution to one with infinitely many solutions (all the dashed lines keep the error $\boldsymbol{e}$ equally small). This situation is treated later, in the section on the "pseudoinverse".

Fitting by a Parabola

Fitting by a parabola $b = C + Dt + Et^2$ is still a linear least-squares problem: $A$ gains a third column $(t_1^2, \ldots, t_m^2)$ and the normal equations still apply, even though the projection now happens in more than three dimensions and can no longer be drawn.

4.4 Orthonormal Bases and Gram-Schmidt

Section summary

  1. The columns $\boldsymbol{q}_{1}, \ldots, \boldsymbol{q}_{n}$ are orthonormal if $ \boldsymbol{q}_{i}^{\mathrm{T}} \boldsymbol{q}_{j}=\left\{\begin{array}{l}0 \text { for } i \neq j \\\\ 1 \text { for } i=j\end{array}\right\}$ . Then $Q^{\mathrm{T}} Q=I$.
  2. If $Q$ is also square, then $Q Q^{\mathrm{T}}=I$ and $Q^{\mathrm{T}}=Q^{-1} $. $Q$ is an “orthogonal matrix”.
  3. The least squares solution to $Q \boldsymbol{x}=\boldsymbol{b}$ is $\widehat{\boldsymbol{x}}=Q^{\mathrm{T}} \boldsymbol{b}$. Projection of $\boldsymbol{b}: \boldsymbol{p}=Q Q^{\mathrm{T}} \boldsymbol{b}=P \boldsymbol{b}$.
  4. The Gram-Schmidt process takes independent $\boldsymbol{a}_{i}$ to orthonormal $\boldsymbol{q}_{i}$ . Start with $\boldsymbol{q}_{1}=\boldsymbol{a}_{1} /\left\|\boldsymbol{a}_{1}\right\|$.
  5. $\boldsymbol{q}_{i}$ is $\left(\boldsymbol{a}_{i}-\text{projection } \boldsymbol{p}_{i}\right) /\left\|\boldsymbol{a}_{i}-\boldsymbol{p}_{i}\right\|$; projection $\boldsymbol{p}_{i}=\left(\boldsymbol{a}_{i}^{\mathrm{T}} \boldsymbol{q}_{1}\right) \boldsymbol{q}_{1}+\cdots+\left(\boldsymbol{a}_{i}^{\mathrm{T}} \boldsymbol{q}_{i-1}\right) \boldsymbol{q}_{i-1}$.
  6. Each $\boldsymbol{a}_{i}$ will be a combination of $\boldsymbol{q}_{1}$ to $\boldsymbol{q}_{i}$ . Then $A= Q R$ : orthogonal $Q$ and triangular $R$ .

Orthogonal Bases and Orthonormal Bases

DEFINITION The vectors $\boldsymbol{q}_{1}, \ldots, \boldsymbol{q}_{n} $ are orthonormal if

$$ \boldsymbol{q}_{i}^{\mathrm{T}} \boldsymbol{q}_{j} = \left\{\begin{array}{lll}0 & \text { when } i \neq j & \text { (orthogonal vectors) } \\\\ 1 & \text { when } i = j & \left(\text { unit vectors: }\left\|\boldsymbol{q}_{i}\right\| = 1\right)\end{array}\right. $$

A matrix with orthonormal columns is assigned the special letter $Q$.

$Q$ is not required to be square.

A matrix $Q$ with orthonormal columns satisfies $Q^{\mathrm{T}} Q=I$:

$$ \boldsymbol{Q}^{\mathrm{T}} \boldsymbol{Q}=\left[\begin{array}{c} -\boldsymbol{q}_{1}^{\mathrm{T}}- \\\\ -\boldsymbol{q}_{2}^{\mathrm{T}}- \\\\ -\boldsymbol{q}_{n}^{\mathrm{T}}- \end{array}\right]\left[\begin{array}{ccc} \mid & \mid & \mid \\\\ \boldsymbol{q}_{1} & \boldsymbol{q}_{2} & \boldsymbol{q}_{n} \\\\ \mid & \mid & \mid \end{array}\right]=\left[\begin{array}{cccc} 1 & 0 & \cdots & 0 \\\\ 0 & 1 & \cdots & 0 \\\\ \vdots & \vdots & \ddots & \vdots \\\\ 0 & 0 & \cdots & 1 \end{array}\right]=I $$

$Q^T Q = I$ means that $Q^T = Q^{-1}$: transpose = inverse.

If the columns are only orthogonal (not unit vectors) , dot products still give a diagonal matrix (not the identity matrix). This diagonal matrix is almost as good as $I$.

In this square case we call $Q$ an orthogonal matrix. One example is a rotation matrix:

$$
Q=\left[\begin{array}{rr}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta
\end{array}\right] \quad \text { and } \quad Q^{\mathrm{T}}=Q^{-1}=\left[\begin{array}{rr}
\cos \theta & \sin \theta \\
-\sin \theta & \cos \theta
\end{array}\right]
$$

The standard basis vectors $i$ and $j$ are rotated through $\theta$.

Second example: a permutation (row-exchange) matrix

$$
\left[\begin{array}{lll}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0
\end{array}\right]\left[\begin{array}{l}
x \\
y \\
z
\end{array}\right]=\left[\begin{array}{l}
y \\
z \\
x
\end{array}\right] \quad \text { and } \quad\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right]\left[\begin{array}{l}
x \\
y
\end{array}\right]=\left[\begin{array}{l}
y \\
x
\end{array}\right]
$$

Third example: a reflection matrix

If $\boldsymbol{u}$ is any unit vector, set $Q = I - 2 \boldsymbol{u} \boldsymbol{u}^T$. Here $\boldsymbol{u} \boldsymbol{u}^T$ is a matrix, while $\boldsymbol{u}^T \boldsymbol{u}$ is the scalar 1. It follows that $Q = Q^T = Q^{-1}$.

$$
Q^{\mathrm{T}}=I-2 \boldsymbol{u} \boldsymbol{u}^{\mathrm{T}}=Q \quad \text { and } \quad Q^{\mathrm{T}} Q=I-4 \boldsymbol{u} \boldsymbol{u}^{\mathrm{T}}+4 \boldsymbol{u} \boldsymbol{u}^{\mathrm{T}} \boldsymbol{u} \boldsymbol{u}^{\mathrm{T}}=I
$$


In the figure $\boldsymbol{u} = (-1/ \sqrt{2}, 1/ \sqrt{2})$, and the reflection exchanges $i$ and $j$.

The three transformations above share one property: they do not change the length of any vector:

If $Q$ has orthonormal columns $\left(Q^{\mathrm{T}} Q=I\right)$ , it leaves lengths unchanged:
Same length for $Q \boldsymbol{x}$: $|| Q \boldsymbol{x} || = ||\boldsymbol{x}||$ for every vector $\boldsymbol{x}$.

$Q$ also preserves dot products: $(Q \boldsymbol{x})^{\mathrm{T}}(Q \boldsymbol{y})=\boldsymbol{x}^{\mathrm{T}} Q^{\mathrm{T}} Q \boldsymbol{y}=\boldsymbol{x}^{\mathrm{T}} \boldsymbol{y}$ . Just use $Q^{\mathrm{T}} Q=I $!

$\textbf { Proof: }|Q \boldsymbol{x}|^{2} \text { equals }|\boldsymbol{x}|^{2} \text { because }(Q \boldsymbol{x})^{\mathrm{T}}(Q \boldsymbol{x})=\boldsymbol{x}^{\mathrm{T}} Q^{\mathrm{T}} Q \boldsymbol{x}=\boldsymbol{x}^{\mathrm{T}} I \boldsymbol{x}=\boldsymbol{x}^{\mathrm{T}} \boldsymbol{x}$

Projections Using Orthonormal Bases: $Q$ Replaces $A$

The benefit of an orthonormal basis: since lengths are preserved, orthogonal matrices are excellent for computations. Numbers can never grow too large when lengths of vectors are fixed. Stable computer codes use $Q$'s as much as possible.

A typical use case: projections and least squares always involve $A^T A$.

Now suppose the basis vectors are actually orthonormal, so $A^T A$ becomes $Q^T Q = I$. The least-squares formulas then simplify to (we print a blank for the identity matrix):

$ \underline{\qquad} \widehat{\boldsymbol{x}}=Q^{\mathrm{T}} \boldsymbol{b} \quad \text { and } \quad \boldsymbol{p}=Q \widehat{\boldsymbol{x}} \quad \text { and } \quad P=Q \underline{\qquad} Q^T$

The advantages of this form:

  1. No matrix inverse has to be computed.
  2. Solving only requires projections onto one-dimensional directions (much less computation): $\hat{ \boldsymbol{x} } = Q^T \boldsymbol{b}$ is just the dot products of $\boldsymbol{q}_1 , \ldots, \boldsymbol{q}_n$ with $\boldsymbol{b}$.
    The projection vector becomes:
    $\boldsymbol{p} = Q \hat{\boldsymbol{x}} = QQ^T \boldsymbol{b}$

Projection onto the $\boldsymbol{q}$'s:

$$ \boldsymbol{p}=\left[\begin{array}{ccc}\mid & & \mid \\\\ \boldsymbol{q}_{1} & \cdots & \boldsymbol{q}_{n} \\\\ \mid & & \mid\end{array}\right]\left[\begin{array}{c}\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{b} \\\\ \vdots \\\\ \boldsymbol{q}_{n}^{\mathrm{T}} \boldsymbol{b}\end{array}\right]=\boldsymbol{q}_{1}\left(\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{b}\right)+\cdots+\boldsymbol{q}_{n}\left(\boldsymbol{q}_{n}^{\mathrm{T}} \boldsymbol{b}\right) $$

In the special case when $Q$ is square, $Q^T = Q^{-1}$ and the subspace is the whole space:

$\boldsymbol{p} = \boldsymbol{b} = \boldsymbol{q}_{1}\left(\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{b}\right)+\boldsymbol{q}_{2}\left(\boldsymbol{q}_{2}^{\mathrm{T}} \boldsymbol{b}\right)+\cdots+\boldsymbol{q}_{n}\left(\boldsymbol{q}_{n}^{\mathrm{T}} \boldsymbol{b}\right)$

What this means for transforms:

$QQ^T = I$ is the foundation of Fourier series and other transforms in applied mathematics: a vector $\boldsymbol{b}$ or a function $f(x)$ can be split into mutually orthogonal pieces and then reassembled by the formula above.

The Gram-Schmidt Process

Since the matrix $Q$ is so useful, how do we construct it?

Example setup:
The columns $\boldsymbol{a},\boldsymbol{b},\boldsymbol{c}$ of $A$ are linearly independent. We construct the intermediate vectors $A,B,C$ and then divide each by its length to obtain the final matrix $Q$ with columns $\boldsymbol{q}_1, \boldsymbol{q}_2, \boldsymbol{q}_3$: $\boldsymbol{q}_1 = A/||A||, \boldsymbol{q}_2 = B/||B||, \boldsymbol{q}_3 = C /||C||$.

The main idea of the Gram-Schmidt process is to keep, for each vector, only its error component with respect to all the previous vectors (the error is always perpendicular to the vectors being projected onto, which is exactly the mutual orthogonality we want):

  1. First Gram-Schmidt step ($\boldsymbol{e} = \boldsymbol{b} - \boldsymbol{p}$):

$$
B= \boldsymbol{b}-\frac{A^{\mathrm{T}} \boldsymbol{b}}{A^{\mathrm{T}} A} A
$$

$A$ and $B$ are perpendicular: $A^T B = A^T \boldsymbol{b} - A^T \boldsymbol{b} = 0$.

  2. Next Gram-Schmidt step:

$$
C=\boldsymbol{c}-\frac{A^{\mathrm{T}} \boldsymbol{c}}{A^{\mathrm{T}} A} A-\frac{B^{\mathrm{T}} \boldsymbol{c}}{B^{\mathrm{T}} B} B
$$

The process is illustrated below:

The Factorization $A = QR$

Can the Gram-Schmidt process above be written as a matrix factorization? Yes: the QR factorization.

$$ \left[\begin{array}{lll} & & \\\\ \boldsymbol{a} & \boldsymbol{b} & \boldsymbol{c} \\\\ & & \end{array}\right]=\left[\begin{array}{lll} & & \\\\ \boldsymbol{q}_{1} & \boldsymbol{q}_{2} & \boldsymbol{q}_{3} \\\\ & & \end{array}\right]\left[\begin{array}{lll} \boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{a} & \boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{b} & \boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{c} \\\\ & \boldsymbol{q}_{2}^{\mathrm{T}} \boldsymbol{b} & \boldsymbol{q}_{2}^{\mathrm{T}} \boldsymbol{c} \\\\ & & \boldsymbol{q}_{3}^{\mathrm{T}} \boldsymbol{c} \end{array}\right] \quad \text { or } \quad \boldsymbol{A}=\boldsymbol{Q} \boldsymbol{R} $$

$R$ is upper triangular, and its $i$-th diagonal entry is the length of the orthogonalized $i$-th vector (that is, $||A||, ||B||, ||C||$). For example:

$$
A=\left[\begin{array}{rrr}
1 & 2 & 3 \\
-1 & 0 & -3 \\
0 & -2 & 3
\end{array}\right]=\left[\begin{array}{rrr}
1 / \sqrt{2} & 1 / \sqrt{6} & 1 / \sqrt{3} \\
-1 / \sqrt{2} & 1 / \sqrt{6} & 1 / \sqrt{3} \\
0 & -2 / \sqrt{6} & 1 / \sqrt{3}
\end{array}\right]\left[\begin{array}{ccr}
\sqrt{2} & \sqrt{2} & \sqrt{18} \\
0 & \sqrt{6} & -\sqrt{6} \\
0 & 0 & \sqrt{3}
\end{array}\right]=Q R
$$

Least squares then becomes:
$$
R^{\mathrm{T}} R \widehat{\boldsymbol{x}}=R^{\mathrm{T}} Q^{\mathrm{T}} \boldsymbol{b} \text{ or } R \widehat{\boldsymbol{x}}=Q^{\mathrm{T}} \boldsymbol{b} \text{ or } \widehat{\boldsymbol{x}}=R^{-1} Q^{\mathrm{T}} \boldsymbol{b}
$$

An incremental refinement gives modified Gram-Schmidt, which is more stable than the equation above that subtracts all the projections at once:

$$ \begin{array}{lll} \boldsymbol{q}_{1}=\boldsymbol{a}_{1} /\left\|\boldsymbol{a}_{1}\right\| & \boldsymbol{B}=\boldsymbol{a}_{2}-\left(\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{a}_{2}\right) \boldsymbol{q}_{1} & \boldsymbol{q}_{2}=\boldsymbol{B} /\|\boldsymbol{B}\| \\\\ \boldsymbol{C}^{*}=\boldsymbol{a}_{3}-\left(\boldsymbol{q}_{1}^{\mathrm{T}} \boldsymbol{a}_{3}\right) \boldsymbol{q}_{1} & \boldsymbol{C}=\boldsymbol{C}^{*}-\left(\boldsymbol{q}_{2}^{\mathrm{T}} \boldsymbol{C}^{*}\right) \boldsymbol{q}_{2} & \boldsymbol{q}_{3}=\boldsymbol{C} /\|\boldsymbol{C}\| \end{array} $$

However, the LAPACK library does not use the modified Gram-Schmidt algorithm; it uses Householder reflections to reach the upper triangular $R$. Details are given in Chapter 11 on numerical linear algebra.

The MATLAB command to orthogonalize $A$ is $[Q, R] = \text{qr}(A)$.
