
Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. Proceedings of the 32nd International Conference on Machine Learning, 2015

Tran Quoc Hoan, Paper alert @ 2015-11-16

Introduction

• Abstract: "…The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data…"


Outline  

• Motivation
  - The promise of deep unsupervised learning
• Physical intuition
  - Diffusion processes and time reversal
• Diffusion probabilistic model
  - Derivation and experimental results


Deep Unsupervised Learning
• Unknown features/labels
  - Novel modalities
  - Exploratory data analysis
• Expensive labels
• Unpredictable tasks / one-shot learning

Physical Intuition
• Diffusion processes and time reversal
  - Destroy structure in data
  - Carefully characterize the destruction
  - Learn how to reverse time


Observation 1: Diffusion Destroys Structure


(Observation) Diffusion destroys structure
[Figure: forward diffusion carries the data distribution to a uniform distribution; reversing the dynamics recovers the structure]
Recover the data distribution by starting from the uniform distribution and running the dynamics backwards.

Observation 2: Microscopic Diffusion
• Time reversible
• Brownian motion
• Position updates are small Gaussians (both forwards and backwards in time)


https://www.youtube.com/watch?v=cDcprgWiQEY

Diffusion-based Probabilistic Models
• Destroy all structure in the data distribution using a diffusion process


• Learn reversal of the diffusion process
  - Estimate a function for the mean and covariance of each step in the reverse diffusion process (e.g., the binomial rate for binary data)
• The reverse diffusion process is the model of the data

Diffusion-based Probabilistic Models
• Algorithm


• Multiplying distributions: imputation, denoising, computing posteriors
• Deep convolutional network: universal function approximator
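In the paper's notation, both trajectories factor over many small steps: the forward (inference) process repeatedly applies a fixed Markov diffusion kernel, and the generative model applies a learned reverse kernel starting from the tractable noise distribution:

q(x^{(0 \cdots T)}) = q(x^{(0)}) \prod_{t=1}^{T} q(x^{(t)} \mid x^{(t-1)}), \qquad p(x^{(0 \cdots T)}) = p(x^{(T)}) \prod_{t=1}^{T} p(x^{(t-1)} \mid x^{(t)})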

Destroy by Diffusion Process
[Figure: forward diffusion carries the data distribution to the noise distribution; the temporal diffusion rate is β_t]

Destroy by Gaussian Diffusion Process
[Figure: forward Gaussian diffusion from the data distribution to the noise distribution; each step decays the signal towards the origin and adds small noise]
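From the paper, each forward step of the Gaussian diffusion is

q(x^{(t)} \mid x^{(t-1)}) = \mathcal{N}\!\left( x^{(t)};\; x^{(t-1)} \sqrt{1 - \beta_t},\; \beta_t \mathbf{I} \right)

where the factor \sqrt{1 - \beta_t} is the decay towards the origin and \beta_t \mathbf{I} is the small added noise.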

Reversal of the Gaussian Diffusion Process
[Figure: learned reverse diffusion from the noise distribution back to the data distribution, with learned drift and covariance functions]
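In the paper, each reverse step is also Gaussian, with mean and covariance given by learned functions (neural networks) of the current state and time step:

p(x^{(t-1)} \mid x^{(t)}) = \mathcal{N}\!\left( x^{(t-1)};\; f_\mu(x^{(t)}, t),\; f_\Sigma(x^{(t)}, t) \right)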

Case Study: Swiss Roll
[Figure: the true (generative) model and the inference (forward diffusion) model on the 2-D swiss roll dataset]
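A minimal runnable sketch of the two processes on 2-D data like the swiss roll, in plain NumPy. The names forward_diffusion, reverse_sample, f_mu and f_sigma are this sketch's inventions; in the paper the reverse mean and covariance are neural networks trained on the log-likelihood bound derived on the next slides, whereas here they are stubbed with placeholders:

import numpy as np

def forward_diffusion(x0, betas, rng):
    """Forward trajectory: shrink towards the origin, add small Gaussian noise."""
    xs = [x0]
    for beta in betas:
        xs.append(xs[-1] * np.sqrt(1.0 - beta)
                  + rng.normal(scale=np.sqrt(beta), size=x0.shape))
    return xs

def reverse_sample(n, betas, f_mu, f_sigma, rng):
    """Generative sampling: start at the noise distribution and apply the
    (learned) reverse Gaussian kernel one step at a time."""
    x = rng.normal(size=(n, 2))  # equilibrium distribution of the forward process
    for t in reversed(range(len(betas))):
        x = f_mu(x, t) + rng.normal(size=x.shape) * np.sqrt(f_sigma(x, t))
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-3, 5e-2, 40)               # illustrative schedule, not the paper's
f_mu = lambda x, t: x * np.sqrt(1.0 - betas[t])   # placeholder drift, NOT a trained model
f_sigma = lambda x, t: betas[t]                   # placeholder covariance
samples = reverse_sample(500, betas, f_mu, f_sigma, rng)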

Training the Reverse Diffusion
Model probability (evaluated via annealed importance sampling / the Jarzynski equality)
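From the paper, the intractable integral over reverse trajectories is rewritten as an average over forward trajectories, the same construction that underlies annealed importance sampling and the Jarzynski equality:

p(x^{(0)}) = \int dx^{(1 \cdots T)} \; q(x^{(1 \cdots T)} \mid x^{(0)}) \; p(x^{(T)}) \prod_{t=1}^{T} \frac{p(x^{(t-1)} \mid x^{(t)})}{q(x^{(t)} \mid x^{(t-1)})}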

Training the Reverse Diffusion
Log likelihood, lower-bounded via Jensen's inequality
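Concretely, Jensen's inequality moves the log inside the expectation over forward trajectories, giving the paper's tractable lower bound K on the average log likelihood L:

L = \int dx^{(0)} \, q(x^{(0)}) \log p(x^{(0)}) \;\ge\; K = \int dx^{(0 \cdots T)} \, q(x^{(0 \cdots T)}) \, \log\!\left[ p(x^{(T)}) \prod_{t=1}^{T} \frac{p(x^{(t-1)} \mid x^{(t)})}{q(x^{(t)} \mid x^{(t-1)})} \right]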

Training the Reverse Diffusion
…do some algebra…
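The outcome of that algebra, as given in the paper: the bound becomes a sum of per-step KL divergences plus entropy terms, each of which has a closed form for Gaussian (or binomial) kernels:

K = -\sum_{t=2}^{T} \int dx^{(0)} dx^{(t)} \, q(x^{(0)}, x^{(t)}) \, D_{KL}\!\left( q(x^{(t-1)} \mid x^{(t)}, x^{(0)}) \,\|\, p(x^{(t-1)} \mid x^{(t)}) \right) + H_q(X^{(T)} \mid X^{(0)}) - H_q(X^{(1)} \mid X^{(0)}) - H_p(X^{(T)})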

Training the Reverse Diffusion
…for the Gaussian diffusion process…
Training turns unsupervised learning into regression: fit the mean and covariance functions of each reverse step.
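Why this is regression (a standard Gaussian identity, not a display from the slides): when the two covariances match, the per-step KL reduces to a weighted squared error between the forward posterior mean and the learned mean, so maximizing K fits f_\mu by regression:

D_{KL}\!\left( \mathcal{N}(\mu_q, \Sigma) \,\|\, \mathcal{N}(f_\mu, \Sigma) \right) = \tfrac{1}{2} (\mu_q - f_\mu)^{\top} \Sigma^{-1} (\mu_q - f_\mu)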

Training the Reverse Diffusion
Setting the diffusion rate β_t
• For Gaussian diffusion: β_1 fixed to a small constant (prevents over-fitting); the remaining β_t are trained
• For binomial diffusion: erase a constant fraction of the stimulus variance each step
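From the paper: for binomial diffusion, removing a constant fraction of the remaining signal at each step gives the closed-form schedule

\beta_t = \frac{1}{T - t + 1}

while for Gaussian diffusion β_1 is held fixed and the schedule β_{2...T} is learned by gradient ascent on the bound K.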

Multiplying Distributions
Interested in the product distribution p̃(x^(0)) ∝ p(x^(0)) r(x^(0))
• Required to compute posterior distributions
  - Missing data (inpainting)
  - Corrupted data (denoising)
• Difficult and expensive using competing techniques
  - e.g., VAEs, GSNs, NADEs, most graphical models
Acts as a small perturbation to the diffusion process

Multiplying Distributions
Interested in p̃(x^(0)) ∝ p(x^(0)) r(x^(0)), which acts as a small perturbation to the diffusion process
• Modified marginal distributions

Multiplying Distributions
• Modified diffusion steps
[Equations: the equilibrium condition and the corresponding normalized distribution]

Multiplying Distributions
Interested in p̃(x^(0)) ∝ p(x^(0)) r(x^(0)), a small perturbation to the diffusion process
Reversal of the Gaussian diffusion process: the small perturbation affects only the mean.
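An informal statement of the leading-order result (assuming log r(x) varies slowly over the scale of a single step): multiplying the Gaussian reverse kernel by r shifts its mean and leaves its covariance unchanged,

\tilde{\mu} = \mu + \Sigma \left. \nabla_x \log r(x) \right|_{x = \mu}, \qquad \tilde{\Sigma} = \Sigma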

Deep Network as Approximator for Images


Multi-scale convolution
[Figure: each branch downsamples the input, convolves, upsamples back to full resolution, and the branches are summed]
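A hypothetical sketch of this multi-scale block, using PyTorch for concreteness (the paper's implementation predates PyTorch; channel counts, number of scales, and kernel size here are illustrative):

import torch
import torch.nn.functional as F
from torch import nn

class MultiScaleConv(nn.Module):
    """Downsample -> convolve -> upsample -> sum, one branch per scale."""
    def __init__(self, channels: int, num_scales: int = 3, kernel_size: int = 3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)
            for _ in range(num_scales)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        out = torch.zeros_like(x)
        for scale, conv in enumerate(self.convs):
            y = F.avg_pool2d(x, 2 ** scale) if scale > 0 else x  # downsample
            y = conv(y)                                          # convolve at this scale
            if scale > 0:                                        # upsample back to input size
                y = F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
            out = out + y                                        # sum across scales
        return out

# e.g. MultiScaleConv(channels=16)(torch.randn(1, 16, 32, 32)) -> shape (1, 16, 32, 32)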

Applied to CIFAR-10
[Figure: training data; samples from a generative adversarial network (Goodfellow et al., 2014); samples from the diffusion model]

Applied to CIFAR-10
[Figure: samples from DRAW (Gregor et al., 2015); samples from a generative adversarial network (Goodfellow et al., 2014); samples from the diffusion model]

Applied to Dead Leaves
[Figure: training data; samples from Theis et al., 2012, log likelihood 1.24 bits/pixel; samples from the diffusion model, log likelihood 1.49 bits/pixel]

Applied to Inpainting


References
http://jmlr.org/proceedings/papers/v37/sohl-dickstein15.html
http://www.inference.vc/icml-paper-unsupervised-learning-by-inverting-diffusion-processes/
http://videolectures.net/icml2015_sohl_dickstein_deep_unsupervised_learning/