Highlights of Linear Algebra
$Ax=b$, $Ax=\lambda x$, $Av=\sigma u$, $\min \frac{||Ax||^2}{||x||^2}$
-
Multiplication Ax using columns of A
- What are the column space, row space, independent vectors, basis, rank, and the CR decomposition
- Matrix multiplication via outer products (column of $A$ times row of $B$)
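A minimal numpy sketch (matrices chosen arbitrarily) of both views: $Ax$ as a combination of the columns of $A$, and $AB$ as a sum of rank-one outer products, column $k$ of $A$ times row $k$ of $B$.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3x2
x = np.array([7., 8.])
B = np.array([[7., 8., 9.], [10., 11., 12.]])  # 2x3

# Ax = x1 * (column 1 of A) + x2 * (column 2 of A)
Ax_by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(Ax_by_columns, A @ x)

# AB = sum_k (column k of A)(row k of B): a sum of rank-one matrices
AB_by_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(AB_by_outer, A @ B)
```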
-
Factorization
-
$A=LU$
- Elimination step by step: $Ax=b \to LUx=b \to$ solve $Lc = b$, then $Ux = c$
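A small sketch, assuming an arbitrary invertible system, of solving $Ax=b$ through $A=LU$: factor once, then forward-substitute $Lc=b$ and back-substitute $Ux=c$ (scipy returns a permuted factorization $A=PLU$).

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Hypothetical small invertible system
A = np.array([[2., 1., 1.],
              [4., -6., 0.],
              [-2., 7., 2.]])
b = np.array([5., -2., 9.])

P, L, U = lu(A)                               # scipy gives A = P L U with a permutation P
c = solve_triangular(L, P.T @ b, lower=True)  # forward substitution: L c = P^T b
x = solve_triangular(U, c)                    # back substitution:    U x = c
assert np.allclose(A @ x, b)
```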
-
$A=QR$
-
Eigenvalue decomposition
- Symmetric matrices, positive definite and positive semidefinite matrices, the energy function $x^TSx$
- Some properties of rank
- $A+sI$ shifts every eigenvalue: $\lambda_1+s, \lambda_2+s, \dots$ (same eigenvectors)
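A quick numpy check (the symmetric matrix is an arbitrary example) of two of these facts: positive definiteness via the energy $x^TSx>0$, and the shift $S+sI$ adding $s$ to every eigenvalue while keeping the eigenvectors.

```python
import numpy as np

S = np.array([[4., 1.], [1., 3.]])   # symmetric, positive definite
lam, Q = np.linalg.eigh(S)           # real eigenvalues, orthonormal eigenvectors

# Energy test: x^T S x > 0 for a random nonzero x  <=>  all eigenvalues > 0
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
assert x @ S @ x > 0 and np.all(lam > 0)

# Shift: S + sI has eigenvalues lam + s (eigenvectors unchanged)
s = 2.5
lam_shift = np.linalg.eigvalsh(S + s * np.eye(2))
assert np.allclose(lam_shift, lam + s)
```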
-
Singular value decomposition: $Av_1 = \sigma_1 u_1$
-
Derivation from $A^TA$ and $AA^T$: the $v_i$ are eigenvectors of $A^TA$ and the $u_i$ are eigenvectors of $AA^T$, both with eigenvalues $\sigma_i^2$
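A sketch with a random matrix, verifying that the right singular vectors are eigenvectors of $A^TA$ with eigenvalues $\sigma_i^2$, and that $Av_i=\sigma_i u_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# A^T A has eigenvalues sigma_i^2 (eigenvectors are the v_i, up to sign)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
assert np.allclose(eigvals, sigma**2)

# A v_i = sigma_i u_i for every i
assert np.allclose(A @ Vt.T, U * sigma)
```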
-
How to compute the SVD
-
The relation between the left and right singular vectors; the extra vectors correspond to zero singular values (the null spaces of $A$ and $A^T$)
-
If $A = xy^T$ has rank 1, then $\sigma_1 = ||x||\,||y|| \ge |y^Tx| = |\lambda|$; in general $\sigma_1 \ge |\lambda|$ for every eigenvalue
-
Rotate/reflect, scale, rotate/reflect; the reduced form of the SVD
-
The functional view of the SVD
-
Functional form, polar decomposition (like $r\cos\theta+ir\sin\theta = re^{i\theta}$): $A = QS$ with $Q$ orthogonal and $S$ positive semidefinite, separating rotation from stretch
- \[A = U\Sigma V^T=(UV^T)(V\Sigma V^T) = QS\]
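A sketch (random square matrix) computing $Q=UV^T$ and $S=V\Sigma V^T$ from the SVD and confirming $A=QS$ with $Q$ orthogonal and $S$ positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, sigma, Vt = np.linalg.svd(A)
Q = U @ Vt                          # orthogonal: the rotation/reflection part
S = Vt.T @ np.diag(sigma) @ Vt      # symmetric PSD: the stretch part

assert np.allclose(Q @ Q.T, np.eye(3))           # Q is orthogonal
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)   # S is positive semidefinite
assert np.allclose(A, Q @ S)                     # polar decomposition A = QS
```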
-
Orthogonal matrices and orthogonal bases
- Orthogonal basis (Gram–Schmidt step): from $a$ and $b$, $c = b - \frac{a^Tb}{a^Ta}a$ is orthogonal to $a$ (sketch after this list)
- Orthogonal projection: $P = QQ^T$; a projection matrix satisfies $P^2=P$
- Coefficients in an orthonormal basis: $c_1=q_1^Tv$
- Multiplying by an orthogonal matrix does not change norms. Since $A=U\Sigma V^T$ with $U$ and $V$ orthogonal, the matrix norms of $A$ depend only on $\Sigma$
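A sketch of the bullets above, with arbitrary vectors $a$, $b$, $v$: one Gram–Schmidt-style step, the projection $P=QQ^T$ with $P^2=P$, coefficients $q_i^Tv$, and length preservation by orthonormal columns.

```python
import numpy as np

a = np.array([1., 1., 0.])
b = np.array([1., 0., 1.])

# Gram-Schmidt step: c is orthogonal to a
c = b - (a @ b) / (a @ a) * a
assert abs(a @ c) < 1e-12

# Orthonormal basis Q from a and c; projection P = Q Q^T satisfies P^2 = P
Q = np.column_stack([a / np.linalg.norm(a), c / np.linalg.norm(c)])
P = Q @ Q.T
assert np.allclose(P @ P, P)

# Coefficient of v along q_i is q_i^T v; Q^T Q = I, so multiplying by Q keeps lengths
v = np.array([2., 3., 4.])
coeffs = Q.T @ v
assert np.allclose(np.linalg.norm(Q @ coeffs), np.linalg.norm(coeffs))
```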
-
Norms
- Eckart–Young: if $B$ has rank $k$, then $||A-B|| \ge ||A - A_k||$ (see the sketch at the end of this list)
- Inner product
- Matrix norms: the three norms below are determined only by the singular values $\sigma$; what about other norms?
- l2 norm: $\max \frac{||Ax||}{||x||} = \sigma_1$
- Frobenius: $\sqrt{\sigma_1^2+\sigma_2^2+\cdots + \sigma_r^2}$
- Nuclear norm: $\sigma_1+\sigma_2+\cdots + \sigma_r$; it is the minimum of $||U||_F||V||_F$ over factorizations $A=UV^T$; also $||A^TA||_N=||A||_F^2$
- Vector Norm $l_0, l_1, l_2, l_\infty$
- The intuition of the matrix norm: the minimum of $||V||_p$
- Properties of norms: rescaling (homogeneity) and the triangle inequality
- Function norms: the vector space needs to be complete, $||v_n - v_\infty|| \to 0$
- The space of sequences ending in all zeros is not complete (p. 105)
- The norm must be finite: $||v|| < \infty$
- Spectral radius: governs the convergence of a Markov chain to its stationary distribution
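A sketch (random matrix) relating the three matrix norms to the singular values and spot-checking Eckart–Young against an arbitrary rank-$k$ competitor $B$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# The three norms that depend only on the singular values
assert np.isclose(np.linalg.norm(A, 2), sigma[0])                       # spectral (l2) norm
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2)))  # Frobenius norm
assert np.isclose(np.linalg.norm(A, 'nuc'), np.sum(sigma))              # nuclear norm

# Eckart-Young: the truncated SVD A_k is the best rank-k approximation
k = 2
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))   # arbitrary rank-k matrix
assert np.linalg.norm(A - B, 2) >= np.linalg.norm(A - A_k, 2)
```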
-
Applications
-
PCA
- The statistics behind PCA
- variances are the diagonal entries of the matrix $AA^T$
- covariances are the off-diagonal entries of matrix $AA^T$
- The geometry behind PCA: the sum of squared distances from the data points to the line through $u_1$ is minimized
- The linear algebra behind PCA: the total variance $T$ is the sum of the $\sigma_i^2$ (the trace of $AA^T$)
- The rapid decay of the singular values of the Hilbert matrix
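A PCA sketch on synthetic data (rows are features, columns are samples, matching the $AA^T$ convention above): center, take the SVD, and read the variances off the singular values.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 200
# Synthetic 2-feature data with one dominant direction (illustration only)
A0 = np.vstack([rng.standard_normal(n_samples) * 3.0,
                rng.standard_normal(n_samples) * 0.5])
A = A0 - A0.mean(axis=1, keepdims=True)        # center each feature (row)

# Sample covariance S = A A^T / (n-1); its eigenvalues are sigma_i^2 / (n-1)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
S = A @ A.T / (n_samples - 1)
assert np.allclose(np.sort(np.linalg.eigvalsh(S)),
                   np.sort(sigma**2 / (n_samples - 1)))

u1 = U[:, 0]                                   # first principal direction
explained = sigma[0]**2 / np.sum(sigma**2)     # fraction of total variance along u1
print(u1, explained)
```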
-
Rayleigh quotients and the generalized eigenvalue problem $Sx_1 = \lambda M x_1$ (p. 99): \(R(x) = \frac{x^TSx}{x^Tx}\)
- Generalized Rayleigh quotient: $R(x) = \frac{x^TSx}{x^TMx}$
- $M$ is a covariance matrix and positive definite; the maximum of $R(x)$ is the largest eigenvalue of $M^{-1}S$ (equivalently of $M^{-\frac{1}{2}}SM^{-\frac{1}{2}}$)
- Generalized eigenvectors are $M$-orthogonal: $x_1^TMx_2 = 0$, via the substitution $x = M^{-\frac{1}{2}}y$
- Semidefinite case: $\alpha Sx = \beta Mx$, where $\alpha$ may equal $0$; this happens when the number of samples is smaller than the number of features
- Generalized SVD: $A=U_A\Sigma_AZ$, $B=U_B\Sigma_BZ$ with the same $Z$
- Any two positive definite matrices can be diagonalized by the same invertible matrix
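A sketch with a random symmetric $S$ and positive definite $M$, checking that the maximum of the generalized Rayleigh quotient is the largest eigenvalue of $M^{-1}S$ and that the generalized eigenvectors are $M$-orthogonal.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 4))
S = X + X.T                       # symmetric
Y = rng.standard_normal((4, 4))
M = Y @ Y.T + 4 * np.eye(4)       # positive definite

# Generalized problem S x = lambda M x; eigh returns ascending eigenvalues
lam, V = eigh(S, M)
x1 = V[:, -1]                     # eigenvector of the largest generalized eigenvalue

R = (x1 @ S @ x1) / (x1 @ M @ x1)             # generalized Rayleigh quotient at x1
assert np.isclose(R, lam[-1])
assert np.isclose(np.linalg.eigvals(np.linalg.inv(M) @ S).real.max(), lam[-1])

# Generalized eigenvectors are M-orthogonal (here M-orthonormal): x_i^T M x_j = 0 for i != j
assert np.allclose(V.T @ M @ V, np.eye(4), atol=1e-8)
```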
-
LDA
Separation ratio \(R = \frac{(x^Tm_1-x^Tm_2)^2}{x^T\Sigma_1x+x^T\Sigma_2x}\), with between-class matrix $S = (m_1-m_2)(m_1-m_2)^T$
I didn't fully understand the physical meaning behind the solution.
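A sketch of the LDA direction with made-up class means and covariances: the top generalized eigenvector of $Sx=\lambda(\Sigma_1+\Sigma_2)x$, which for the rank-one $S$ above reduces to $x \propto (\Sigma_1+\Sigma_2)^{-1}(m_1-m_2)$.

```python
import numpy as np
from scipy.linalg import eigh

# Synthetic class means and covariances (illustrative values only)
m1, m2 = np.array([2., 1.]), np.array([0., 0.])
Sigma1 = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma2 = np.array([[0.8, -0.1], [-0.1, 0.6]])

S = np.outer(m1 - m2, m1 - m2)     # between-class matrix
M = Sigma1 + Sigma2                # within-class matrix

lam, V = eigh(S, M)                # generalized eigenproblem S x = lambda M x
x_lda = V[:, -1]                   # direction maximizing the separation ratio R

# Closed form for the rank-one S: x proportional to M^{-1}(m1 - m2)
x_closed = np.linalg.solve(M, m1 - m2)
cosine = abs(x_lda @ x_closed) / (np.linalg.norm(x_lda) * np.linalg.norm(x_closed))
assert np.isclose(cosine, 1.0)
```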
-
$|\lambda_1| \le \sigma_1$
AB and BA
If $ABx=\lambda x$, then $BABx=\lambda Bx$, so $Bx$ is an eigenvector of $BA$; the eigenvectors of $AB$ and $BA$ correspond in this way
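A numerical check with random matrices that $AB$ and $BA$ share their eigenvalues and that $Bx$ is an eigenvector of $BA$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lam, X = np.linalg.eig(A @ B)

# AB and BA have the same eigenvalues (the same nonzero ones in the rectangular case)
lam_BA = np.linalg.eig(B @ A)[0]
assert np.allclose(np.sort_complex(lam), np.sort_complex(lam_BA))

# If ABx = lambda x, then (BA)(Bx) = lambda (Bx): Bx is an eigenvector of BA
x = X[:, 0]
assert np.allclose(B @ A @ (B @ x), lam[0] * (B @ x))
```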
Low rank and Compressed Sensing
- Key insights
- Matrices are composed of low-rank pieces: $uv^T$ is the extreme case with rank 1
- Singular values reveal a low effective rank
- Most matrices can be completed / well approximated by a low-rank matrix
-
How a matrix changes when a low-rank matrix is added
-
Normal perspective (work with a small matrix in exchange for the large one)
-
$A^{-1}$: Sherman–Morrison–Woodbury \((A-UV^T)^{-1} = A^{-1}+A^{-1}U(I-V^TA^{-1}U)^{-1}V^TA^{-1}\)
\[(I-UV^T)^{-1} = I + U(I-V^TU)^{-1}V^T\]
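A numerical check of the Woodbury identity above with a random well-conditioned $A$ and a rank-$k$ update $UV^T$: only a small $k\times k$ matrix is inverted on the right-hand side.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned, invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

Ainv = np.linalg.inv(A)
small = np.linalg.inv(np.eye(k) - V.T @ Ainv @ U)  # only a k x k inverse
woodbury = Ainv + Ainv @ U @ small @ V.T @ Ainv

assert np.allclose(woodbury, np.linalg.inv(A - U @ V.T))
```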
Eigenvalues and singular values (interlacing)
A graphical explanation: the new eigenvalues are roots of a function with poles at the old eigenvalues, so each root stays trapped between consecutive old ones: \(z_1 \ge \lambda_1 \ge z_2 \ge \lambda_2 \ge \cdots\)
-
Derivative perspective
-
$A^{-1}$ \(\frac{dA^{-1}}{dt} = - A^{-1}\frac{dA}{dt}A^{-1}\)
\(\frac{d\lambda}{dt}=y^T\frac{dA}{dt}x\), where $x$ and $y$ are the right and left eigenvectors normalized so that $y^Tx = 1$
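A finite-difference sketch of $\frac{dA^{-1}}{dt}=-A^{-1}\frac{dA}{dt}A^{-1}$ along a hypothetical path $A(t)=A_0+tB$; the eigenvalue formula can be checked the same way.

```python
import numpy as np

rng = np.random.default_rng(8)
A0 = rng.standard_normal((4, 4)) + 4 * np.eye(4)
B = rng.standard_normal((4, 4))        # dA/dt along the path A(t) = A0 + tB

eps = 1e-6
finite_diff = (np.linalg.inv(A0 + eps * B) - np.linalg.inv(A0 - eps * B)) / (2 * eps)
formula = -np.linalg.inv(A0) @ B @ np.linalg.inv(A0)
assert np.allclose(finite_diff, formula, atol=1e-4)
```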
- \[\lambda_{max}(S+T)\le \lambda_{max}(S) + \lambda_{max}(T)\]
However, it is hard to bound the intermediate eigenvalues of $S+T$.
- Weyl inequality
\(\sigma_{i+j-1}(A+B)\le \sigma_i(A)+\sigma_j(B)\)
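A quick random check of the Weyl inequality for singular values (1-based indices in the formula, 0-based in the code).

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

# sigma_{i+j-1}(A+B) <= sigma_i(A) + sigma_j(B), written with 0-based indices
for i in range(5):
    for j in range(5):
        if i + j < 5:
            assert sAB[i + j] <= sA[i] + sB[j] + 1e-10
```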
-
-
Saddle points from Lagrange multipliers
- Lagrangian: $L(x, \lambda) = \frac{1}{2}x^TSx + \lambda^T(Ax-b)$, whose stationary point is a saddle point (a minimum over $x$, a maximum over $\lambda$)
- Applications
- Updating least squares when a new row arrives (see the sketch below)
- Kalman filter (TODO: read p. 112)
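A sketch of the least-squares update when one new row $a^T$ and measurement $\beta$ arrive (all data random here): the Sherman–Morrison rank-one update of $(A^TA)^{-1}$ gives the recursive update that also appears in the Kalman filter.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

P = np.linalg.inv(A.T @ A)            # (A^T A)^{-1} for the old problem
x = P @ A.T @ b                       # old least-squares solution

# A new row a^T and measurement beta arrive
a = rng.standard_normal(3)
beta = rng.standard_normal()

# Sherman-Morrison update of P, then update x (recursive least-squares step)
P_new = P - (P @ np.outer(a, a) @ P) / (1 + a @ P @ a)
x_new = x + P_new @ a * (beta - a @ x)

# Compare with solving the enlarged problem from scratch
A_new = np.vstack([A, a])
b_new = np.append(b, beta)
assert np.allclose(x_new, np.linalg.lstsq(A_new, b_new, rcond=None)[0])
```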