TRANSCRIPT
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. Proceedings of the 32nd International Conference on Machine Learning, 2015
Tran Quoc Hoan, Paper alert @ 2015-11-16
Introduction
• Abstract: "…The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data…"
Outline
• Motivation
– The promise of deep unsupervised learning
• Physical intuition
– Diffusion processes and time reversal
• Diffusion probabilistic model
– Derivation and experimental results
Deep Unsupervised Learning
• Unknown features/labels
– Novel modalities
– Exploratory data analysis
• Expensive labels
• Unpredictable tasks / one-shot learning
Physical Intuition
• Diffusion processes and time reversal
– Destroy structure in data
– Carefully characterize the destruction
– Learn how to reverse time
Observation 1: Diffusion Destroys Structure
[Figure: forward diffusion carries the data distribution to a uniform distribution; reversing the dynamics recovers structure]
Recover the data distribution by starting from the uniform distribution and running the dynamics backwards.
Observation 2: Microscopic Diffusion
• Time reversible
• Brownian motion
• Position updates are small Gaussians (both forwards and backwards in time)
https://www.youtube.com/watch?v=cDcprgWiQEY
Diffusion-based Probabilistic Models
• Destroy all structure in the data distribution using a diffusion process
• Learn the reversal of the diffusion process
– Estimate a function for the mean and covariance of each step in the reverse diffusion process (e.g. the binomial rate for binary data)
• The reverse diffusion process is the model of the data (see the sampling sketch below)
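A minimal sketch of what sampling from such a model could look like, assuming a Gaussian diffusion; `mu_model` (the trained reverse-mean function) and `sigmas` (per-step standard deviations) are hypothetical placeholders, not the paper's code:

```python
# Minimal sketch, not the paper's implementation: generate data by running
# the learned reverse diffusion chain from noise back towards data.
import numpy as np

def sample(mu_model, sigmas, shape, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start at the noise distribution: x_T ~ N(0, I)
    for t in range(T, 0, -1):
        mean = mu_model(x, t)           # learned drift (reverse mean) for step t
        x = mean + sigmas[t] * rng.standard_normal(shape)
    return x                            # approximate draw from the data distribution
```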
Diffusion-based Probabilistic Models
• Algorithm
• Multiplying distributions: imputation, denoising, computing posteriors
• Deep convolutional network: universal function approximator
Destroy by Diffusion Process
[Figure: forward diffusion carries the data distribution to the noise distribution; the temporal diffusion rate is βt]
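Reconstructed from the paper's definitions: starting from the data distribution q(x^(0)), the forward trajectory applies a Markov diffusion kernel repeatedly,

\[
q\left(x^{(0 \cdots T)}\right) = q\left(x^{(0)}\right) \prod_{t=1}^{T} q\left(x^{(t)} \mid x^{(t-1)}\right)
\]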
Destroy by Gaussian Diffusion Process
[Figure: forward Gaussian diffusion from the data distribution to the noise distribution]
Each step decays the signal towards the origin and adds a small amount of noise.
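For the Gaussian case, the paper's forward kernel shrinks the signal towards the origin by a factor of √(1 − βt) and adds noise of variance βt:

\[
q\left(x^{(t)} \mid x^{(t-1)}\right) = \mathcal{N}\left(x^{(t)};\; x^{(t-1)}\sqrt{1-\beta_t},\; \beta_t \mathbf{I}\right)
\]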
Reverse Gaussian Diffusion Process
[Figure: reverse diffusion from the noise distribution back to the data distribution]
The drift and covariance functions of the reverse process are learned.
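The reverse kernel has the same functional form as the forward one; its mean and covariance are given by the learned functions:

\[
p\left(x^{(t-1)} \mid x^{(t)}\right) = \mathcal{N}\left(x^{(t-1)};\; f_\mu\left(x^{(t)}, t\right),\; f_\Sigma\left(x^{(t)}, t\right)\right)
\]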
Training the Reverse Diffusion
• Model probability: evaluated by averaging over forward trajectories, via annealed importance sampling / the Jarzynski equality (reconstructed below).
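Reconstructed from the paper: the model probability integrates over all trajectories, and rewriting it as an average over forward trajectories makes it tractable, the same trick used in annealed importance sampling and the Jarzynski equality:

\[
p\left(x^{(0)}\right) = \int dx^{(1 \cdots T)}\, q\left(x^{(1 \cdots T)} \mid x^{(0)}\right)\, p\left(x^{(T)}\right) \prod_{t=1}^{T} \frac{p\left(x^{(t-1)} \mid x^{(t)}\right)}{q\left(x^{(t)} \mid x^{(t-1)}\right)}
\]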
Training the Reverse Diffusion
…for the Gaussian diffusion process…
• Training turns unsupervised learning into regression: each reverse kernel is Gaussian, so fitting the model reduces to regressing the learned mean and covariance functions against the tractable forward posterior at every step (a sketch follows below).
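A minimal NumPy sketch of the regression view, assuming a linear β schedule (an assumption, not the paper's learned schedule); `mu_model` is a hypothetical learned mean function, and the closed forms follow from the Gaussian forward kernel:

```python
# Minimal sketch: the per-step regression target for the reverse mean.
import numpy as np

T = 1000
betas = np.concatenate(([0.0], np.linspace(1e-4, 0.02, T)))  # betas[0] is a dummy
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # alpha_bar[t] = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0), the closed form of the forward Gaussian chain."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def q_posterior_mean(x0, xt, t):
    """Mean of q(x_{t-1} | x_t, x_0): the regression target for the learned mean."""
    coef0 = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1.0 - alpha_bar[t])
    coeft = np.sqrt(alphas[t]) * (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t])
    return coef0 * x0 + coeft * xt

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 32))      # a toy data batch
t = int(rng.integers(1, T + 1))         # random step in [1, T]
xt = q_sample(x0, t, rng)
target = q_posterior_mean(x0, xt, t)
# A trained model would minimize np.mean((mu_model(xt, t) - target) ** 2):
# plain regression, one diffusion step at a time.
```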
Training the Reverse Diffusion
Setting the diffusion rate βt
• For Gaussian diffusion: β1 is fixed to a small constant (to prevent over-fitting), and the remaining βt are trained by gradient ascent on the log-likelihood bound.
• For binomial diffusion: erase a constant fraction of the stimulus variance each step; in the paper, βt = 1/(T - t + 1) (a quick numerical check follows below).
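A small numerical check (a sketch, not from the slides) that this schedule removes a constant fraction of the remaining structure and destroys everything by step T:

```python
# beta_t = 1/(T - t + 1): the product of (1 - beta_t) telescopes to zero.
T = 10
betas = [1.0 / (T - t + 1) for t in range(1, T + 1)]
remaining = 1.0
for b in betas:
    remaining *= 1.0 - b    # (T-1)/T * (T-2)/(T-1) * ... * 0/1
print(betas[0], betas[-1], remaining)   # 0.1 1.0 0.0
```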
Multiplying Distributions
Interested in the product distribution p̃(x^(0)) ∝ p(x^(0)) r(x^(0)).
• Required to compute posterior distributions
– Missing data (inpainting)
– Corrupted data (denoising)
• Difficult and expensive with competing techniques (e.g. VAEs, GSNs, NADEs, most graphical models)
The second factor r acts as a small perturbation to the diffusion process.
Multiplying Distributions (cont.)
• Modified marginal distributions: each intermediate distribution in the chain is multiplied by a corresponding factor, p̃(x^(t)) ∝ p(x^(t)) r(x^(t)), so that r acts as a small perturbation to the diffusion process.
Multiplying Distributions (cont.)
• Modified diffusion steps: the perturbed reverse kernel must satisfy the equilibrium condition p̃(x^(t)) = ∫ dx^(t+1) p̃(x^(t) | x^(t+1)) p̃(x^(t+1)); the corresponding normalized distribution is reconstructed below.
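Reconstructed from the paper's treatment, with Z̃ denoting the normalizing constants: the modified marginals and the corresponding normalized reverse kernel are

\[
\tilde{p}\left(x^{(t)}\right) = \frac{1}{\tilde{Z}_t}\, p\left(x^{(t)}\right) r\left(x^{(t)}\right), \qquad
\tilde{p}\left(x^{(t)} \mid x^{(t+1)}\right) = \frac{1}{\tilde{Z}_t\left(x^{(t+1)}\right)}\, p\left(x^{(t)} \mid x^{(t+1)}\right) r\left(x^{(t)}\right)
\]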
Multiplying Distributions: Reverse Gaussian Diffusion Process
For the Gaussian reverse diffusion, a sufficiently smooth perturbation r affects only the mean of each reverse step, not the covariance (reconstructed below).
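Reconstructed from the paper: writing μ and Σ for the mean and covariance of an unperturbed reverse step, a sufficiently smooth r shifts only the mean,

\[
\tilde{\mu} = \mu + \Sigma \left.\frac{\partial \log r\left(x^{(t)}\right)}{\partial x^{(t)}}\right|_{x^{(t)} = \mu}
\]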
Applied to CIFAR-10
[Figure: training data; samples from a generative adversarial network [Goodfellow et al, 2014]; samples from the diffusion model]
Applied to CIFAR-10 (cont.)
[Figure: samples from DRAW [Gregor et al, 2015]; samples from a generative adversarial network [Goodfellow et al, 2014]; samples from the diffusion model]
Applied to Dead Leaves
[Figure: training data; samples from [Theis et al, 2012], log likelihood 1.24 bits/pixel; samples from the diffusion model, log likelihood 1.49 bits/pixel]