This is a lecture by Professor Shin Jinwoo (신진우) of KAIST, provided through LG Aimers.
Part 1: Matrix Decomposition
(1) Determinant and Trace
Determinant
The determinant formulas for $2\times2$ and $3\times3$ matrices generalize to the following form (Laplace expansion).
[DEF] For a matrix $\boldsymbol{A}\in \mathbb{R}^{n\times n}$, for all $j=1,...,n$, $det(\boldsymbol{A}) = \sum_{k=1}^{n}(-1)^{k+j}a_{kj}det(\boldsymbol{A}_{k,j})$
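As an illustration of this definition (my own sketch, not from the lecture), here is a deliberately naive recursive Laplace expansion along the first column, checked against `numpy.linalg.det`; the function name `laplace_det` is just a placeholder.

```python
import numpy as np

def laplace_det(A: np.ndarray) -> float:
    """Determinant via cofactor (Laplace) expansion along the first column (j = 1)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    det = 0.0
    for k in range(n):
        # A_{k,j}: submatrix with row k and column 0 removed
        minor = np.delete(np.delete(A, k, axis=0), 0, axis=1)
        det += (-1) ** k * A[k, 0] * laplace_det(minor)
    return det

A = np.array([[4.0, 2.0, 1.0],
              [0.0, 3.0, -1.0],
              [2.0, 5.0, 6.0]])
print(laplace_det(A), np.linalg.det(A))  # both ≈ 82.0
```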
Trace
[DEF] The trace of a square matrix $\boldsymbol{A}\in \mathbb{R}^{n\times n}$ is defined as $tr(\boldsymbol{A})=\sum_{i=1}^{n}a_{ii}$
(2) Eigenvalues and Eigenvectors
[DEF] $\lambda \in \mathbb{R}$ is an eigenvalue of $\boldsymbol{A}$ and $\boldsymbol{x} \in \mathbb{R}^n \setminus \left\{ 0\right\}$ is the corresponding eigenvector of $\boldsymbol{A}$ if $\boldsymbol{Ax} = \lambda \boldsymbol{x}$
The determinant of $\boldsymbol{A}$ equals the product of all its eigenvalues, and the trace of $\boldsymbol{A}$ equals the sum of all its eigenvalues.
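A quick numerical sanity check of these two identities (a sketch assuming an arbitrary random matrix; the names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))

eigvals = np.linalg.eigvals(A)               # eigenvalues (possibly complex)
print(np.prod(eigvals), np.linalg.det(A))    # product of eigenvalues ≈ det(A)
print(np.sum(eigvals), np.trace(A))          # sum of eigenvalues ≈ tr(A)
```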
(3) Cholesky Decomposition
It is pronounced "Cholesky" (촐스키)...
[THM] For a symmetric, positive definite matrix $\boldsymbol{A}$, $\boldsymbol{A}=\boldsymbol{L}\boldsymbol{L}^T$, where
(1) $\boldsymbol{L}$ is a lower-triangular matrix with positive diagonals
(2) Such an $\boldsymbol{L}$ is unique, called the Cholesky factor of $\boldsymbol{A}$.
*Positive Definite: Every eigenvalue is positive
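A minimal NumPy sketch, assuming a symmetric positive definite matrix built as $\boldsymbol{B}\boldsymbol{B}^T + 3\boldsymbol{I}$ just for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3 * np.eye(3)      # symmetric positive definite by construction

L = np.linalg.cholesky(A)        # lower-triangular with positive diagonal entries
print(np.allclose(L @ L.T, A))   # True: A = L L^T
```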
(4) Eigendecomposition and Diagonalization
Diagonalizable Matrix
[DEF] $\boldsymbol{A}$ is diagonalizable if it is similar to a diagonal matrix $\boldsymbol{D}$, i.e., there exists an invertible $\boldsymbol{P}\in \mathbb{R}^{n\times n}$ such that $\boldsymbol{D}=\boldsymbol{P}^{-1}\boldsymbol{A}\boldsymbol{P}$ → Eigendecomposition
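A short diagonalization sketch for a symmetric matrix; `np.linalg.eigh` returns the eigenvalues and an orthogonal $\boldsymbol{P}$ whose columns are eigenvectors:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric

eigvals, P = np.linalg.eigh(A)      # columns of P: orthonormal eigenvectors
D = np.diag(eigvals)

print(np.allclose(D, P.T @ A @ P))  # True: D = P^{-1} A P (here P^{-1} = P^T)
print(np.allclose(A, P @ D @ P.T))  # True: A = P D P^T (eigendecomposition)
```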
Eigenvalue decomposition is only guaranteed for symmetric matrices (spectral theorem)
→ What about non-symmetric matrices?? → SVD
(5) Singular Value Decomposition
Since the eigenvalue decomposition above applies only to symmetric matrices,
a non-square, non-symmetric matrix $\boldsymbol{A}\in \mathbb{R}^{m\times n}$ is handled by working with $\boldsymbol{S}=\boldsymbol{A}^{T}\boldsymbol{A}\in \mathbb{R}^{n\times n}$, which is symmetric, instead.
Singular Value Decomposition (SVD)
[THM] $\boldsymbol{A}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$
$\boldsymbol{U}, \boldsymbol{V}$ are orthogonal matrices
$\boldsymbol{\Sigma}$ is an $m\times n$ (rectangular) diagonal matrix, and its diagonal entries are called singular values
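A minimal sketch using `np.linalg.svd` on an arbitrary non-square matrix, also checking that the squared singular values are the eigenvalues of $\boldsymbol{A}^T\boldsymbol{A}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))                   # non-square, non-symmetric

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)                    # singular values on the diagonal

print(np.allclose(A, U @ Sigma @ Vt))         # True: A = U Σ V^T
print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))  # σ_i^2 = eigenvalues of A^T A
```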
Part 2: Convex Optimization
Optimization is very important when training models in ML and DL.
(1) Optimization Using Gradient Descent
Goal: $\min{f(x)},\, f(x):\mathbb{R}^n \mapsto \mathbb{R},\, f\in C^1$
$x_{k+1}=x_k+\gamma_kd_k,\, k=0,1,2,...$, where $d_k$ is a descent direction (for gradient descent, $d_k=-\nabla f(x_k)$) ⇔ in ML/DL, $\theta_j = \theta_j - \alpha\frac{\partial }{\partial \theta_j}J(\theta)$
- Batch gradient
- Mini-batch gradient
- Stochastic gradient: Noisy approximation of full batch gradient
e.g., SGD, Adam, ...
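A minimal gradient descent sketch on a toy quadratic objective; the function, step size, and iteration count below are placeholders chosen only to illustrate the update rule $x_{k+1}=x_k-\gamma\nabla f(x_k)$:

```python
import numpy as np

def f(x):
    return 0.5 * x @ x          # toy objective f(x) = ||x||^2 / 2

def grad_f(x):
    return x                    # gradient of f

x = np.array([3.0, -2.0])
gamma = 0.1                     # fixed step size (learning rate)
for k in range(100):
    x = x - gamma * grad_f(x)   # x_{k+1} = x_k - gamma * grad f(x_k)

print(x, f(x))                  # both approach 0, the minimizer of f
```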
(2) Constrained Optimization and Lagrange Multipliers
However, when there are inequality constraints $g_i(x)\leq 0$ and equality constraints $h_i(x)=0$, we cannot optimize as easily as above.
Therefore, we form the Lagrangian:
Lagrangian: $\mathit{L}(x,\lambda,\nu ) = f(x)+\sum_{i=1}^{m}\lambda_ig_i(x)+\sum_{i=1}^{p}\nu_ih_i(x)$
Lagrangian Dual Function: $d(\lambda,\nu)=\inf_{x}\mathit{L}(x,\lambda,\nu)$
Lagrange Multipliers: $\lambda= (\lambda_i: i=1,...,m)\geq0,\,\nu=(\nu_1,...,\nu_p)$
The dual optimization problem can always be formulated, and it is always a concave maximization regardless of the primal problem.
Weak Duality: $d^*\leq p^*$
Even when the primal optimum is hard to compute, weak duality guarantees that the dual optimum $d^*$ is a lower bound on the primal optimum $p^*$.
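A small worked example (mine, not from the lecture) for intuition: minimize $f(x)=x^2$ subject to $g(x)=1-x\leq 0$.

$$\mathit{L}(x,\lambda)=x^2+\lambda(1-x),\qquad d(\lambda)=\inf_x \mathit{L}(x,\lambda)=\lambda-\frac{\lambda^2}{4}\quad(\text{attained at } x=\lambda/2)$$

$$d^*=\max_{\lambda\geq 0}\Big(\lambda-\frac{\lambda^2}{4}\Big)=1\ \ (\text{at }\lambda=2)\ \leq\ p^*=\min_{x\geq 1}x^2=1$$

Here $d^*=p^*$ in fact, i.e., strong duality also holds, as expected for a convex problem.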
(3) Convex Optimization, Convex Sets and Functions
Convex: 볼록, Concave: 오목
In Korean, $y=x^2$ is usually described as '아래로 볼록' (convex downward), and in English it is likewise called 'convex'.
However, $y=-x^2$ is usually described as '위로 볼록' (convex upward) in Korean, whereas in English it is called 'concave'.
Convex Optimization Problem
Minimize $f(x)$ subject to $x\in\chi$, where $f(x):\mathbb{R}^n\mapsto \mathbb{R}$ is a convex function, and $\chi$ is a convex set.
KKT (Karush-Kuhn-Tucker) Optimality Condition
Strong Duality: $d^*= p^*$
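For reference, the standard statement of the KKT conditions (not written out above) for $\min f(x)$ subject to $g_i(x)\leq 0,\ h_i(x)=0$:

$$\nabla f(x^*)+\sum_{i=1}^{m}\lambda_i^*\nabla g_i(x^*)+\sum_{i=1}^{p}\nu_i^*\nabla h_i(x^*)=0\quad\text{(stationarity)}$$

$$g_i(x^*)\leq 0,\ h_i(x^*)=0\quad\text{(primal feasibility)},\qquad \lambda_i^*\geq 0\quad\text{(dual feasibility)}$$

$$\lambda_i^*\,g_i(x^*)=0\quad\text{(complementary slackness)}$$

For a convex problem satisfying a constraint qualification (e.g., Slater's condition), the KKT conditions characterize the optimum and strong duality $d^*=p^*$ holds.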
Convex Set
For any $x_1,x_2 \in \mathit{C}$ and any $\theta\in[0,1]$, we have $\theta x_1 + (1-\theta)x_2\in\mathit{C}$
→ For any two points chosen in $\mathit{C}$, every point on the line segment connecting them also lies in $\mathit{C}$.
Convex Functions
$f(\theta x+(1-\theta)y)\leq\theta f(x)+(1-\theta)f(y)$ for all $x, y$ in the domain and all $\theta\in[0,1]$
→ For any two points on the graph of the function, the chord connecting them lies on or above the graph; each interpolated point on the chord is greater than or equal to the function value at the corresponding interpolated input.
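A quick numerical check of this inequality for $f(x)=x^2$ (convex) and $f(x)=-x^2$ (not convex); the helper `is_convex_on_samples` is just a name for this sketch:

```python
import numpy as np

def is_convex_on_samples(f, points, thetas):
    """Check f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y) on sampled points."""
    return all(
        f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
        for x in points for y in points for t in thetas
    )

points = np.linspace(-5, 5, 21)
thetas = np.linspace(0, 1, 11)

print(is_convex_on_samples(lambda x: x**2, points, thetas))   # True  (convex)
print(is_convex_on_samples(lambda x: -x**2, points, thetas))  # False (concave)
```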
Part 3: PCA
(1) Problem Setting
High-dimensional data → Low-dimensional data
- Centering
- Standardization
- Eigenvalue/ vector
- Projection
- Undo Standardization and Centering
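A minimal PCA sketch following these steps, assuming a data matrix `X` of shape (N, D) with samples as rows; all names and sizes here are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # N = 100 samples, D = 5 dimensions
M = 2                                     # target (lower) dimension

# 1. Centering and 2. Standardization
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Xs = (X - mu) / sigma

# 3. Eigenvalues/eigenvectors of the data covariance matrix
S = np.cov(Xs, rowvar=False)              # D x D covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
B = eigvecs[:, ::-1][:, :M]               # top-M principal directions

# 4. Projection onto the principal subspace
Z = Xs @ B                                # N x M low-dimensional codes

# 5. Undo standardization and centering (reconstruction in the original space)
X_rec = (Z @ B.T) * sigma + mu
print(Z.shape, X_rec.shape)               # (100, 2) (100, 5)
```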
(2) Maximum Variance Perspective
The projection should maximize the variance so that the projected data retains a meaningful representation.
Optimization Problem: $\max_{b_1}{b_1^T}\mathbf{S}b_1$, subject to $\left\|b_1 \right\|^2=1$
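Connecting this to Part 2, a standard Lagrange-multiplier sketch of why the solution is an eigenvector of $\mathbf{S}$:

$$\mathit{L}(b_1,\lambda_1)=b_1^T\mathbf{S}b_1+\lambda_1(1-b_1^Tb_1),\qquad \frac{\partial \mathit{L}}{\partial b_1}=0\;\Rightarrow\;\mathbf{S}b_1=\lambda_1 b_1$$

The projected variance is then $b_1^T\mathbf{S}b_1=\lambda_1$, so it is maximized by the eigenvector with the largest eigenvalue.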
(3) Eigenvector Computation and Low-Rank Approximations
Use EVD or SVD (a small sketch below).
Needs to be studied separately on my own...
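A brief sketch of a rank-$k$ approximation via truncated SVD, my own illustration of the low-rank idea rather than the lecture's derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

k = 2                                      # target rank
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # rank-k approximation (Eckart-Young)

print(np.linalg.matrix_rank(A_k))          # 2
print(np.linalg.norm(A - A_k, 2), s[k])    # spectral-norm error equals the (k+1)-th singular value
```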
(4) PCA in High Dimensions
This also needs to be studied separately on my own...