variational inference and generative...
TRANSCRIPT
![Page 1: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/1.jpg)
Variational Inference and Generative Models
CS 294-112: Deep Reinforcement Learning
Sergey Levine
![Page 2: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/2.jpg)
Class Notes
1. Homework 3 due next Wednesday
2. Accept CMT peer review invitations• These are required (part of your final project grade)
• If you have not received/cannot find invitation, email Kate Rakelly!
![Page 3: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/3.jpg)
Where we are in the course
RL algorithms advanced topics
![Page 4: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/4.jpg)
1. Probabilistic latent variable models
2. Variational inference
3. Amortized variational inference
4. Generative models: variational autoencoders
• Goals• Understand latent variable models in deep learning
• Understand how to use (amortized) variational inference
Today’s Lecture
![Page 5: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/5.jpg)
Probabilistic models
![Page 6: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/6.jpg)
Latent variable models
mixtureelement
![Page 7: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/7.jpg)
Latent variable models in general
“easy” distribution(e.g., Gaussian)
“easy” distribution(e.g., Gaussian)
“easy” distribution(e.g., conditional Gaussian)
![Page 8: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/8.jpg)
Latent variable models in RLconditional latent variable
models for multi-modal policieslatent variable models for
model-based RL
![Page 9: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/9.jpg)
Other places we’ll see latent variable models
Mombaur et al. ‘09Muybridge (c. 1870) Ziebart ‘08Li & Todorov ‘06
Using RL/control + variational inference to model human behavior
Using generative models and variational inference for exploration
![Page 10: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/10.jpg)
How do we train latent variable models?
![Page 11: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/11.jpg)
Estimating the log-likelihood
![Page 12: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/12.jpg)
The variational approximation
![Page 13: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/13.jpg)
The variational approximation
Jensen’s inequality
![Page 14: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/14.jpg)
A brief aside…
Entropy:
Intuition 1: how random is the random variable?
Intuition 2: how large is the log probability in expectation under itself
high
low
this maximizes the first part
this also maximizes the second part(makes it as wide as possible)
![Page 15: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/15.jpg)
A brief aside…
KL-Divergence:
Intuition 1: how different are two distributions?
Intuition 2: how small is the expected log probability of one distribution under another, minus entropy?
why entropy?
this maximizes the first part
this also maximizes the second part(makes it as wide as possible)
![Page 16: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/16.jpg)
The variational approximation
![Page 17: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/17.jpg)
The variational approximation
![Page 18: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/18.jpg)
How do we use this?
how?
![Page 19: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/19.jpg)
What’s the problem?
![Page 20: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/20.jpg)
Break
![Page 21: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/21.jpg)
What’s the problem?
![Page 22: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/22.jpg)
Amortized variational inference
how do we calculate this?
![Page 23: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/23.jpg)
Amortized variational inference
look up formula for entropy of a Gaussian
can just use policy gradient!
What’s wrong with this gradient?
![Page 24: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/24.jpg)
The reparameterization trick
Is there a better way?
most autodiff software (e.g., TensorFlow) will compute this for you!
![Page 25: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/25.jpg)
Another way to look at it…
this often has a convenient analytical form (e.g., KL-divergence for Gaussians)
![Page 26: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/26.jpg)
Reparameterization trick vs. policy gradient
• Policy gradient• Can handle both discrete and
continuous latent variables
• High variance, requires multiplesamples & small learning rates
• Reparameterization trick• Only continuous latent variables
• Very simple to implement
• Low variance
![Page 27: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/27.jpg)
The variational autoencoder
![Page 28: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/28.jpg)
Using the variational autoencoder
![Page 29: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/29.jpg)
Conditional models
![Page 30: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/30.jpg)
Examples
![Page 31: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/31.jpg)
1. collect data
2. learn embedding of image & dynamics model (jointly)
3. run iLQG to learn to reach image of goal
a type of variational autoencoder with temporally decomposed latent state!
![Page 32: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/32.jpg)
Local models with images
![Page 33: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/33.jpg)
Local models with images
variational autoencoder with stochastic dynamics
![Page 34: Variational Inference and Generative Modelsrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-14.pdf · Variational Inference and Generative Models CS 294-112: Deep Reinforcement](https://reader033.vdocuments.mx/reader033/viewer/2022043009/5f9e1dff9abea9333d1d993a/html5/thumbnails/34.jpg)
We’ll see more of this for…
Mombaur et al. ‘09Muybridge (c. 1870) Ziebart ‘08Li & Todorov ‘06
Using RL/control + variational inference to model human behavior
Using generative models and variational inference for exploration