Multi-Prediction Deep Boltzmann Machines Goodfellow, Mirza, Courville, Bengio Vipul Venkataraman Nov 29, 2016

Page 1: Multi-Prediction Deep Boltzmann Machinesswoh.web.engr.illinois.edu/courses/IE598/handout/fall2016_slide3.pdf · Multi-Prediction Deep Boltzmann Machines Goodfellow, Mirza, Courville,

Multi-Prediction Deep Boltzmann Machines

Goodfellow, Mirza, Courville, Bengio

Vipul Venkataraman Nov 29, 2016

Outline

• Goal of the paper [1]

• A primer on RBMs and DBMs

• Training DBMs

• Proposed method: motivations and intuitions

• Results

• Conclusions

Goal

Goal of the paper

Make training unsupervised models great again!

Deep Boltzmann Machines

Preliminaries

Deep Boltzmann Machines

Image: [2]

Training

Deep Boltzmann Machines

• Unsupervised

• Generative model

• Feature learning algorithm

Training Methods

Deep Boltzmann Machines

Classification

• Exact inference is intractable

• Use mean field expectations of the hidden units
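The mean-field expectations mentioned above can be sketched as a fixed-point iteration. This is a minimal illustration, assuming a DBM with binary units and two hidden layers; the function name `mean_field`, the parameter names, the 0.5 initialization, and the toy sizes are all hypothetical and are not taken from the slides or from [1].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(v, W1, W2, b1, b2, n_iters=10):
    """Approximate E[h1 | v] and E[h2 | v] by fixed-point iteration.

    W1[i][j] couples visible unit i to first-layer unit j.
    W2[j][k] couples first-layer unit j to second-layer unit k.
    """
    n_v, n1, n2 = len(v), len(b1), len(b2)
    mu1 = [0.5] * n1  # assumed initialization
    mu2 = [0.5] * n2
    for _ in range(n_iters):
        # h1 receives input from the visible layer below and from h2 above.
        mu1 = [
            sigmoid(
                b1[j]
                + sum(v[i] * W1[i][j] for i in range(n_v))
                + sum(W2[j][k] * mu2[k] for k in range(n2))
            )
            for j in range(n1)
        ]
        # h2 only receives input from h1.
        mu2 = [
            sigmoid(b2[k] + sum(mu1[j] * W2[j][k] for j in range(n1)))
            for k in range(n2)
        ]
    return mu1, mu2


# Toy usage with small random weights (sizes are arbitrary).
rng = random.Random(0)
n_v, n_h1, n_h2 = 6, 4, 3
W1 = [[rng.gauss(0.0, 0.1) for _ in range(n_h1)] for _ in range(n_v)]
W2 = [[rng.gauss(0.0, 0.1) for _ in range(n_h2)] for _ in range(n_h1)]
b1 = [0.0] * n_h1
b2 = [0.0] * n_h2
v = [1, 0, 1, 1, 0, 0]
mu1, mu2 = mean_field(v, W1, W2, b1, b2)
```

The returned `mu1` and `mu2` are the approximate posterior means that would be fed to a classifier in place of the intractable exact posterior.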

Training DBMs

Steps [2]:

1. Layer-wise pre-training

• Unsupervised

• RBMs as building blocks

2. Discriminative fine-tuning

• Supervised

• Back-propagation

Good reference: https://www.youtube.com/watch?v=Oq38pINmddk

Pre-training

RBM

Fine-tuning

MLP

Pre-training

• Deep: good features

• Can use any unsupervised algorithm

• RBM (w/ CD)

• Auto-encoder

Fine-tuning

• Won’t make drastic changes

• Need less labelled data

• Can use a lot of unlabelled data

Drawbacks

• Greedy training, not considering global interactions

• Many models, criteria

• Extra classifier as well

• CD-k: we don’t know k

• Gradient approximation may be bad

An Aside: CD Intuition

• Far away ‘holes’

• May want our particles to move many steps [3]

• The mixing may get slower

• CD-1 -> CD-3 -> CD-10
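To make the role of k concrete, here is a minimal sketch of a CD-k weight-gradient estimate for a binary RBM. It is only an illustration under stated assumptions: the helper names (`sample_hidden`, `sample_visible`, `cd_k_gradient`), the toy sizes, and the choice to use hidden probabilities rather than samples in the final statistics are hypothetical, not the slides' or [3]'s exact procedure.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Return (probabilities, binary sample) for the hidden units given v."""
    probs = [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    ]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b, rng):
    """Return (probabilities, binary sample) for the visible units given h."""
    probs = [
        sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b))
    ]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd_k_gradient(v0, W, b, c, k, rng):
    """CD-k estimate of the log-likelihood gradient with respect to W."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0, h = sample_hidden(v0, W, c, rng)
    v = v0
    for _ in range(k):
        # One Gibbs step: the particle moves further from the data as k grows.
        _, v = sample_visible(h, W, b, rng)
        phk, h = sample_hidden(v, W, c, rng)
    # Positive phase (data) minus negative phase (particle after k steps).
    return [
        [v0[i] * ph0[j] - v[i] * phk[j] for j in range(len(c))]
        for i in range(len(b))
    ]


# Toy usage: the same data point with a short and a longer chain.
rng = random.Random(0)
n_v, n_h = 5, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_h)] for _ in range(n_v)]
b = [0.0] * n_v
c = [0.0] * n_h
v0 = [1, 0, 1, 0, 1]
grad_cd1 = cd_k_gradient(v0, W, b, c, k=1, rng=rng)
grad_cd10 = cd_k_gradient(v0, W, b, c, k=10, rng=rng)
```

A larger k lets the negative-phase particle wander further from the data before its statistics are collected, at the cost of k Gibbs steps per update.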

Solutions

Proposed method

• Mantra: Simplify

• Many models -> one model

• Many criteria -> one criterion

• Extra classification layer at the top -> unified model

Quick recap

Multi-Prediction Training

Random bit-mask

Example 1

Example 1, update 1

Example 1, update 2

Two mean-field fixed point updates

Example 2, all updates

Example 3, all updates
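The random bit-mask and the mean-field updates in the examples above can be sketched as a single forward pass. This is a rough illustration, assuming binary units, two hidden layers, a cross-entropy loss on the masked-out variables, and a 0.5 initialization; the function name `mp_loss` and all parameter names are hypothetical. The backprop step through the unrolled updates is omitted here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mp_loss(v, mask, W1, W2, a, b1, b2, n_iters=2):
    """Cross-entropy of the masked-out variables after n_iters updates.

    mask[i] == 1 means v[i] is withheld from the model and must be predicted;
    mask[i] == 0 means v[i] is observed and stays clamped.
    """
    n_v, n1, n2 = len(a), len(b1), len(b2)
    q = [0.5 if mask[i] else float(v[i]) for i in range(n_v)]
    mu2 = [0.5] * n2
    for _ in range(n_iters):
        mu1 = [
            sigmoid(
                b1[j]
                + sum(q[i] * W1[i][j] for i in range(n_v))
                + sum(W2[j][k] * mu2[k] for k in range(n2))
            )
            for j in range(n1)
        ]
        mu2 = [
            sigmoid(b2[k] + sum(mu1[j] * W2[j][k] for j in range(n1)))
            for k in range(n2)
        ]
        # Re-estimate only the withheld variables; observed ones stay clamped.
        q = [
            sigmoid(a[i] + sum(W1[i][j] * mu1[j] for j in range(n1)))
            if mask[i]
            else float(v[i])
            for i in range(n_v)
        ]
    eps = 1e-12  # guards against log(0)
    return -sum(
        v[i] * math.log(q[i] + eps) + (1 - v[i]) * math.log(1.0 - q[i] + eps)
        for i in range(n_v)
        if mask[i]
    )


# Toy usage: a fresh random bit-mask is drawn for each example.
rng = random.Random(0)
n_v, n_h1, n_h2 = 6, 4, 3
W1 = [[rng.gauss(0.0, 0.1) for _ in range(n_h1)] for _ in range(n_v)]
W2 = [[rng.gauss(0.0, 0.1) for _ in range(n_h2)] for _ in range(n_h1)]
a, b1, b2 = [0.0] * n_v, [0.0] * n_h1, [0.0] * n_h2
v = [1, 0, 1, 1, 0, 0]
mask = [1 if rng.random() < 0.5 else 0 for _ in range(n_v)]
loss = mp_loss(v, mask, W1, W2, a, b1, b2, n_iters=2)
```

Because each example gets a different mask, the same set of weights is trained to solve many different "predict these variables from those" problems.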

One iteration

Minibatch Backprop

Performance

• Works well (results in a bit)

• Expensive, though: several iterations are needed for convergence

Multi-Inference Trick

Mean field

Multi-inference

average with the mean-field estimate

Nesterov’s accelerated gradient descent

Results

Can someone find me a suitable picture?

Multi-Inference Trick

Image: Goodfellow's defense

Setting

• Dataset: MNIST

• First layer: 500 hidden units

• Second layer: 1000 hidden units

• Minibatch size: 100 examples

• Test set: 10000 examples

• For more related results: [1]

Classification

MP-DBM with 2X hidden units: 0.91

Robustness

Missing inputs

Conclusions

• Simpler, more intuitive methodology for training Deep Boltzmann Machines

• Improved accuracy for approximate inference problems

References

[1] Goodfellow, Mirza, Courville, and Bengio. Multi-prediction deep Boltzmann machines. NIPS 2013.

[2] Salakhutdinov and Hinton. Deep Boltzmann machines. AISTATS 2009.

[3] Hinton. Neural Networks for Machine Learning. Coursera.

Questions?
