[NeurIPS 2020] Denoising Diffusion Probabilistic Models

이성훈 Ethan 2024. 3. 4. 02:16

- Introduction

Diffusion: the term is borrowed from thermodynamics, where it describes atoms or molecules moving from a region of high concentration to a region of low concentration

A Diffusion (Probabilistic) Model is a parameterized Markov chain, trained so that after a finite number of steps it produces images that match the data

The diffusion process works by gradually adding noise to the data

Diffusion models are intuitive and easy to train, but no prior work had shown that they can produce high-quality samples (this seems to refer to the ICML 2015 paper)

- Method

Forward Process (Diffusion Process): $\mathbf{x}_0$ (data) + $\prod_{t=1}^T \mathcal{N} \rightarrow \mathbf{x}_T$ (Gaussian Noise)

Reverse Process: $\mathbf{x}_T$ (Gaussian Noise) - $\prod_{t=1}^T \mathcal{N} \rightarrow \mathbf{x}_0$ (data)

Modeling this true reverse process $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t}\right)$ directly is hard, because it depends on the entire data distribution 😬

So a neural network $p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$ is used to approximate $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t}\right)$

(This part felt similar to a VAE)

The learned $p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right) \approx q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t}\right)$ is what is called the Diffusion Model

Let's look at the equations in more detail

Forward Process (Diffusion Process)

Autoregressive form: $q\left(\mathbf{x}_{1: T} \mid \mathbf{x}_0\right)=\prod_{t=1}^T q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)$

$q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right):=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{1-\beta_t} \mathbf{x}_{t-1}, \beta_t \mathbf{I}\right)$

$\beta_t$ is the noise variance at step $t$, i.e. it controls how much noise is injected
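A minimal NumPy sketch of one forward step. The linear schedule from $10^{-4}$ to $0.02$ with $T=1000$ is the setting reported in the paper's experiments, not something stated in this post, and the names `linear_beta_schedule` and `forward_step` are made up for illustration:

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative helper)."""
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x0 = np.ones((4, 4))                    # toy "image"
x1 = forward_step(x0, betas[0], rng)    # betas[0] corresponds to t = 1
```

With $\beta_1 = 10^{-4}$ the first step barely changes the input; the noise only dominates after many steps.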

In addition, $\mathbf{x}_t$ at any timestep $t$ can be sampled directly from $\mathbf{x}_0$, without stepping through the chain

$q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{\bar{\alpha}_t} \mathbf{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)$

($\alpha_t=1-\beta_t$, $\bar{\alpha}_t=\prod_{s=1}^t \alpha_s$) See the paper for the proof
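As a rough sanity check of the closed form, the sketch below samples $\mathbf{x}_t$ once directly and once by iterating the one-step kernel, then compares their empirical mean and variance. The linear schedule, the scalar toy data point, and the helper name `q_sample` are assumptions for illustration:

```python
import numpy as np

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # alpha_bar[t-1] = prod_{s<=t} alpha_s

rng = np.random.default_rng(0)
x0 = np.full(100_000, 2.0)              # many copies of one scalar data point
t = 300

# Closed form: a single draw per copy
xt_direct, _ = q_sample(x0, t, alpha_bar, rng)

# Same distribution, reached by applying the one-step kernel t times
xt_iter = x0.copy()
for s in range(t):
    step_noise = rng.standard_normal(x0.shape)
    xt_iter = np.sqrt(alphas[s]) * xt_iter + np.sqrt(betas[s]) * step_noise

# Both should have mean ~ 2 * sqrt(alpha_bar_t) and variance ~ 1 - alpha_bar_t
print(xt_direct.mean(), xt_iter.mean(), 2.0 * np.sqrt(alpha_bar[t - 1]))
print(xt_direct.var(), xt_iter.var(), 1.0 - alpha_bar[t - 1])
```

This is an empirical check, not a proof; it only shows that the two sampling routes agree up to Monte Carlo error.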

Reverse Process

Autoregressive form:

$p_\theta\left(\mathbf{x}_{0: T}\right)=p\left(\mathbf{x}_T\right) \prod_{t=1}^T p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$

$p\left(\mathbf{x}_T\right)=\mathcal{N}\left(\mathbf{x}_T ; \mathbf{0}, \mathbf{I}\right), \quad p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right):=\mathcal{N}\left(\mathbf{x}_{t-1} ; \boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right), \boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)\right)$

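Here $\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)$ and $\boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)$ are parameterized by $\theta$ (in the paper only the mean is learned; the covariance is fixed to $\sigma_t^2 \mathbf{I}$ with untrained, time-dependent constants)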
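A minimal sketch of sampling from the reverse chain, assuming a fixed variance $\sigma_t^2 = \beta_t$ (one of the choices discussed in the paper) and skipping the noise at the last step as in the paper's sampling algorithm. `toy_mu` is a hypothetical placeholder standing in for a trained network, so the output is not a meaningful sample:

```python
import numpy as np

def reverse_step(x_t, t, mu_theta, sigma2, rng):
    """Draw x_{t-1} ~ N(mu_theta(x_t, t), sigma2[t-1] * I); t is 1-indexed."""
    mean = mu_theta(x_t, t)
    if t == 1:
        return mean                      # no noise added at the final step
    return mean + np.sqrt(sigma2[t - 1]) * rng.standard_normal(x_t.shape)

def sample(shape, T, mu_theta, sigma2, rng):
    """Run the chain x_T -> x_0 starting from p(x_T) = N(0, I)."""
    x = rng.standard_normal(shape)
    for t in range(T, 0, -1):
        x = reverse_step(x, t, mu_theta, sigma2, rng)
    return x

# Placeholder mean function standing in for a trained network (hypothetical).
def toy_mu(x_t, t):
    return 0.9 * x_t

T = 50
sigma2 = np.linspace(1e-4, 0.02, T)      # assumed: sigma_t^2 = beta_t
x0 = sample((2, 3), T, toy_mu, sigma2, np.random.default_rng(0))
```

Swapping `toy_mu` for a real model leaves the loop unchanged; only the mean function carries learned parameters here.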

In short, a Diffusion Model can be thought of as a VAE structure with a progressive denoising part added

Training

Training minimizes the negative log-likelihood (NLL), via the variational bound (ELBO) below

$\mathbb{E}\left[-\log p_\theta\left(\mathbf{x}_0\right)\right] \leq \mathbb{E}_q\left[-\log \frac{p_\theta\left(\mathbf{x}_{0: T}\right)}{q\left(\mathbf{x}_{1: T} \mid \mathbf{x}_0\right)}\right]=\mathbb{E}_q\left[-\log p\left(\mathbf{x}_T\right)-\sum_{t \geq 1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}\right]=: L$
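For context, the paper rewrites this bound as a sum of KL terms; the labels $L_T$, $L_{t-1}$, $L_0$ follow the paper's notation and do not appear elsewhere in this post:

$L=\mathbb{E}_q\left[\underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_T \mid \mathbf{x}_0\right) \| p\left(\mathbf{x}_T\right)\right)}_{L_T}+\sum_{t>1} \underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) \| p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)\right)}_{L_{t-1}} \underbrace{-\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}_{L_0}\right]$

Once conditioned on $\mathbf{x}_0$, the posterior $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)$ is Gaussian, so each $L_{t-1}$ compares two Gaussians and can be computed in closed form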

Simple Objective:

$\mathcal{L}_{\text {simple }}(\theta):=\mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_\theta\left(\sqrt{\bar{\alpha}_t} \mathbf{x}_0+\left(\sqrt{1-\bar{\alpha}_t}\right) \boldsymbol{\epsilon}, t\right)\right\|^2\right]$, where $t \sim \mathcal{U}(1, T)$
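A minimal NumPy sketch of a Monte Carlo estimate of this objective on one batch. The linear schedule and the stand-in `zero_model` (which always predicts zero noise) are assumptions for illustration; a real $\boldsymbol{\epsilon}_\theta$ would be a trained network:

```python
import numpy as np

def simple_loss(eps_model, x0, alpha_bar, rng):
    """Monte Carlo estimate of L_simple on a batch x0 of shape (B, D)."""
    B = x0.shape[0]
    T = len(alpha_bar)
    t = rng.integers(1, T + 1, size=B)            # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0.shape)           # target noise
    a = alpha_bar[t - 1][:, None]                 # broadcast over features
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    eps_hat = eps_model(x_t, t)
    # squared L2 norm per example, averaged over the batch
    return np.mean(np.sum((eps - eps_hat) ** 2, axis=1))

# Hypothetical stand-in for the network: always predicts zero noise.
def zero_model(x_t, t):
    return np.zeros_like(x_t)

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = np.random.default_rng(1).standard_normal((256, 8))
loss = simple_loss(zero_model, x0, alpha_bar, np.random.default_rng(0))
```

Because `zero_model` ignores its input, the loss here is just the average squared norm of the sampled noise, roughly the feature dimension (8); a trained model should drive it lower.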

- Experiment

The results also show that interpolation works well

- Reference

[1] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising Diffusion Probabilistic Models." NeurIPS 2020.
