
Lecture 11 Recap

I2DL: Prof. Niessner, Dr. Dai 1

Transfer Learning

3I2DL: Prof. Niessner, Dr. Dai

[Figure: a large dataset with distribution P1 and a small dataset with distribution P2]

• Use what has been learned in one setting for another setting

Transfer Learning

5I2DL: Prof. Niessner, Dr. Dai

[Figure: a network trained on ImageNet, applied to a new dataset with C classes; the early layers are FROZEN, the last layer is re-TRAINed]

[Donahue et al., ICML’14] DeCAF, [Razavian et al., CVPRW’14] CNN Features off-the-shelf

Source: http://cs231n.stanford.edu/slides/2016/winter1516_lecture11.pdf
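A minimal PyTorch sketch of this recipe, with torchvision's ResNet-18 standing in for the ImageNet-trained network (the model choice, class count, and hyperparameters are illustrative assumptions, not part of the slides):

```python
import torch.nn as nn
import torchvision.models as models

C = 10  # number of classes in the new, small dataset (illustrative)

model = models.resnet18(pretrained=True)  # trained on ImageNet

# FROZEN: keep all pretrained weights fixed
for param in model.parameters():
    param.requires_grad = False

# TRAIN: swap in a fresh classification head for C classes
# (new parameters have requires_grad=True by default)
model.fc = nn.Linear(model.fc.in_features, C)
```

Only the new head receives gradient updates; the frozen layers act as the fixed feature extractor described in DeCAF and CNN Features off-the-shelf.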

Basic Structure of RNNs
• We want to have a notion of "time" or "sequence"

I2DL: Prof. Niessner, Dr. Dai 7

[Figure: the hidden state is computed from the input and the previous hidden state]

A_t = θ_c · A_{t−1} + θ_x · x_t

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
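As a sketch, one step of this recurrence in PyTorch; the tanh squashing on top of the linear update is an assumption (standard for vanilla RNNs) and the dimensions are arbitrary:

```python
import torch

hidden_dim, input_dim = 128, 64
theta_c = torch.randn(hidden_dim, hidden_dim) * 0.01  # recurrent weights θ_c
theta_x = torch.randn(hidden_dim, input_dim) * 0.01   # input weights θ_x

def rnn_step(A_prev, x_t):
    # A_t = θ_c A_{t-1} + θ_x x_t, passed through a nonlinearity
    return torch.tanh(theta_c @ A_prev + theta_x @ x_t)

A = torch.zeros(hidden_dim)              # initial hidden state
for x_t in torch.randn(5, input_dim):    # a toy sequence of 5 inputs
    A = rnn_step(A, x_t)                 # the hidden state carries the "time" notion
```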

Long-Term Dependencies

I2DL: Prof. Niessner, Dr. Dai 8

I moved to Germany … so I speak German fluently.
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long Short-Term Memory Units (LSTM)

I2DL: Prof. Niessner, Dr. Dai 11

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long Short-Term Memory Units
• Key ingredients
  – Cell state: transports the information through the unit

I2DL: Prof. Niessner, Dr. Dai 12

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTM
• Highway for the gradient to flow

I2DL: Prof. Niessner, Dr. Dai 13

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
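For completeness, a minimal usage sketch of PyTorch's built-in nn.LSTM, which implements this gated cell; the sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
x = torch.randn(8, 20, 64)     # batch of 8 sequences, 20 time steps each
out, (h_n, c_n) = lstm(x)      # c_n: the cell state that carries information
print(out.shape)               # torch.Size([8, 20, 128])
```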

RNNs in Computer Vision
• Caption generation

I2DL: Prof. Niessner, Dr. Dai 14

[Xu et al., PMLR’15] Neural Image Caption Generation

Autoencoders

I2DL: Prof. Niessner, Dr. Dai 15

Machine Learning

16

Supervised learning
• Labels or target classes
• Goal: Learn a mapping from input to label
• Classification, regression

I2DL: Prof. Niessner, Dr. Dai

[Figure: training images labeled DOG and CAT]

Machine Learning

17

Unsupervised learning vs. supervised learning

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

• No label or target class
• Find out properties of the structure of the data
• Examples: clustering (k-means), dimensionality reduction (PCA)

18

[Figure: unlabeled images of dogs and cats]

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

19

Unsupervised learning vs. supervised learning

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

20

Unsupervised learning vs. supervised learning

[Figure: unlabeled images on one side, images labeled DOG and CAT on the other]

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data

I2DL: Prof. Niessner, Dr. Dai 21

Source: https://hackernoon.com

Autoencoders
• From an input image to a feature representation (bottleneck layer)
• Encoder: a CNN in our case

I2DL: Prof. Niessner, Dr. Dai 22

[Figure: input image 𝑥 → Conv encoder → bottleneck 𝑧]
Source: https://bit.ly/37dpsbQ

Autoencoders
• Why do we need this dimensionality reduction?
• To capture the patterns, i.e., the most meaningful factors of variation in our data
• Other dimensionality reduction methods?

I2DL: Prof. Niessner, Dr. Dai 23

Autoencoder Training

24

[Figure: Input Image → Conv encoder → Transposed Conv decoder → Output Image]

Reconstruction loss (e.g., L1, L2) between input and output

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ
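A compact sketch of this training setup; the architecture, the 28×28 grayscale input size, and the optimizer settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                                   # Conv
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
)
decoder = nn.Sequential(                                   # Transposed Conv
    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
    nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 14x14 -> 28x28
)

criterion = nn.MSELoss()   # L2 reconstruction loss; nn.L1Loss() also works
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(8, 1, 28, 28)               # a batch of unlabeled images
loss = criterion(decoder(encoder(x)), x)   # compare output image to input
optimizer.zero_grad(); loss.backward(); optimizer.step()
```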

Autoencoder Training

25

Latent space 𝑧, dim(𝑧) < dim(𝑥)

[Figure: input 𝑥 → latent 𝑧 → reconstruction 𝑥′; example input images and their reconstructed images]

I2DL: Prof. Niessner, Dr. Dai

26

Autoencoder Training
• No labels required
• We can use unlabeled data to first get its structure

I2DL: Prof. Niessner, Dr. Dai

Latent space 𝑧, dim(𝑧) < dim(𝑥)

[Figure: input 𝑥 → latent 𝑧 → reconstruction 𝑥′]

27

Autoencoder Use Cases
• Embedding of MNIST numbers

I2DL: Prof. Niessner, Dr. Dai

Source: https://lts2.epfl.ch/blog/perekres/2015/02/21/layer-by-layer-visualizations-of-mnist-dataset-feature-representations/

28

Autoencoder for Pre-Training
• Test case: medical applications based on CT images
  – Large set of unlabeled data
  – Small set of labeled data
• We cannot take a network pre-trained on ImageNet. Why?
• The image features are different for CT scans vs. natural images

I2DL: Prof. Niessner, Dr. Dai

29

Autoencoder for Pre-Training
• Test case: medical applications based on CT images
  – Large set of unlabeled data
  – Small set of labeled data
• We can pre-train our network using an autoencoder to "learn" the type of features present in CT images

I2DL: Prof. Niessner, Dr. Dai

30

Autoencoder for Pre-Training
• Step 1: Unsupervised training with autoencoders

[Figure: Input → encoder → decoder → Reconstruction]

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ

31

Autoencoder for Pre-Training
• Step 2: Supervised training with the labeled data
• Throw away the decoder

[Figure: Input → encoder (kept) → decoder (thrown away) → Reconstruction]

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ

Autoencoder for Pre-Training
• Step 2: Supervised training with the labeled data

32

[Figure: input 𝑥 → encoder → 𝑧 → prediction 𝑦; loss against ground-truth label 𝑦∗ for supervised learning; backprop as always]

I2DL: Prof. Niessner, Dr. Dai
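A sketch of step 2, reusing the conv encoder from the earlier autoencoder sketch and attaching a small classification head; the head, class count, and learning rate are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(   # in practice: weights pre-trained in step 1
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)
num_classes = 5            # illustrative
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, num_classes),  # matches the encoder's 32x7x7 output
)

criterion = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

x = torch.rand(8, 1, 28, 28)                   # labeled images (dummy data)
y_star = torch.randint(0, num_classes, (8,))   # ground-truth labels
loss = criterion(classifier(encoder(x)), y_star)
optimizer.zero_grad(); loss.backward(); optimizer.step()  # backprop as always
```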

Why use Autoencoders?
• Pre-training, as mentioned before
  – Image → same image reconstructed
  – Use the encoder as "feature extractor"
• Use them to get pixel-wise predictions
  – Image → semantic segmentation
  – Low-resolution image → high-resolution image
  – Image → depth map

I2DL: Prof. Niessner, Dr. Dai 33

Autoencoders for Pixel-wise Predictions

I2DL: Prof. Niessner, Dr. Dai 34

Semantic Segmentation (FCN)
• Recall the Fully Convolutional Networks

35
[Long et al., CVPR'15] Fully Convolutional Networks for Semantic Segmentation

Can we do better?

I2DL: Prof. Niessner, Dr. Dai

SegNet

I2DL: Prof. Niessner, Dr. Dai 36
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

SegNet

37

[Figure: Input vs. Ground Truth vs. SegNet prediction]

I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

SegNet
• Encoder: normal convolutional filters + pooling
• Decoder: upsampling + convolutional filters

38
I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation


SegNet
• Encoder: normal convolutional filters + pooling
• Decoder: upsampling + convolutional filters
• The convolutional filters in the decoder are learned via backprop; their goal is to refine the upsampling

40
I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
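A rough sketch of this encoder-decoder pattern. Note this is not the exact SegNet architecture (SegNet reuses max-pooling indices for its upsampling); here a plain upsample-plus-learned-conv decoder stands in, with illustrative sizes:

```python
import torch
import torch.nn as nn

num_classes = 21  # illustrative

encoder = nn.Sequential(   # normal convolutional filters + pooling
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
decoder = nn.Sequential(   # upsampling + convolutional filters (learned)
    nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2), nn.Conv2d(64, num_classes, 3, padding=1),
)

x = torch.rand(1, 3, 224, 224)
logits = decoder(encoder(x))   # per-pixel class scores: (1, 21, 224, 224)
```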

Generative Models
• Given training data, how can we generate new samples from the same distribution?

42
I2DL: Prof. Niessner, Dr. Dai
Source: https://openai.com/blog/generative-models/

[Figure: real images vs. generated images]

Generative Models

43
I2DL: Prof. Niessner, Dr. Dai

Taxonomy of generative models:
• Explicit density
  – Tractable density: Fully Visible Belief Nets
  – Approximate density
    · Variational: Variational Autoencoder
    · Markov Chain: Boltzmann Machine
• Implicit density
  – Markov Chain: GSN
  – Direct: GAN

Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

44

Variational Autoencoders

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Encode the input into a representation (bottleneck) and reconstruct it with the decoder

45

[Figure: 𝑥 → Encoder (Conv) → 𝑧 → Decoder (Transposed Conv) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Encode the input into a representation (bottleneck) and reconstruct it with the decoder

46
I2DL: Prof. Niessner, Dr. Dai

[Figure: latent space 𝑧 learned by an autoencoder on MNIST]
Source: https://bit.ly/37ctFMS

Variational Autoencoder

47

[Figure: 𝑥 → Encoder 𝑞_𝜙(𝑧|𝑥) (Conv) → 𝑧 → Decoder 𝑝_𝜃(𝑥|𝑧) (Transposed Conv) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

48

Variational Autoencoder
• Goal: Sample from the latent distribution to generate new outputs!

[Figure: 𝑥 → Encoder (𝜙) → 𝑧 → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

49

• Latent space is now a distribution
• Specifically, it is a Gaussian: 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})

[Figure: 𝑥 → Encoder (𝜙) → mean 𝜇_{𝑧|𝑥} and covariance Σ_{𝑧|𝑥} → Sample 𝑧 → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

50

• Latent space is now a distribution
• Specifically, it is a Gaussian: 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})

[Figure: 𝑥 → Encoder (𝜙) → mean 𝜇_{𝑧|𝑥} and diagonal covariance Σ_{𝑧|𝑥}]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

51

• Training

[Figure: 𝑥 → Encoder (𝜙) → 𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥} → Sample 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥}) → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai
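The training objective itself is not spelled out on the slide; for reference, a VAE is trained by maximizing the evidence lower bound (ELBO), a reconstruction term plus a KL regularizer that keeps the latent distribution close to the prior 𝑝(𝑧) = 𝒩(0, 𝐼):

𝓛(𝜃, 𝜙; 𝑥) = 𝔼_{𝑞_𝜙(𝑧|𝑥)}[log 𝑝_𝜃(𝑥|𝑧)] − D_KL(𝑞_𝜙(𝑧|𝑥) ‖ 𝑝(𝑧))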

Variational Autoencoder
• Sampling operation is not differentiable
→ We can't backpropagate through the latent space

52
I2DL: Prof. Niessner, Dr. Dai

[Figure: 🚫 gradient flow is blocked at the sampling step 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})]

Reparametrization Trick
• Now we only need to backpropagate through an addition and a multiplication:
  𝑧 = 𝜇_{𝑧|𝑥} + Σ_{𝑧|𝑥}^{1/2} · 𝜀, with 𝜀 ∼ 𝒩(0, 1)

53
I2DL: Prof. Niessner, Dr. Dai

[Figure: the noise 𝜀 ∼ 𝒩(0, 1) enters through a multiplication (*) with the covariance and an addition (+) of the mean, so gradients flow to the encoder]
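A minimal sketch of the trick for a diagonal Gaussian; the log-variance parameterization of the encoder output is a common convention assumed here:

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness enters only
    # through eps, so gradients reach mu and logvar via one multiplication
    # and one addition
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

mu = torch.zeros(4, 20, requires_grad=True)      # dummy encoder outputs
logvar = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, logvar)                   # differentiable sample
z.sum().backward()                               # gradients flow to mu, logvar
```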

Variational Autoencoder

54

• Test time: Sample from the latent space

[Figure: Sample 𝑧 ∼ 𝒩(𝜇, Σ) → Decoder (𝜃) → 𝑥; only the decoder is used]

Autoencoder vs VAE

[Figure: reconstructions from an Autoencoder vs. a Variational Autoencoder vs. the Ground Truth]
Source: https://github.com/kvfrans/variational-autoencoder

55
I2DL: Prof. Niessner, Dr. Dai

Autoencoder Overview
• Autoencoders (AE)
  – Reconstruct input
  – Unsupervised learning
  – Latent space features are useful
• Variational Autoencoders (VAE)
  – Probability distribution in latent space (e.g., Gaussian)
  – Interpretable latent space (head pose, smile)
  – Sample from the model to generate output

I2DL: Prof. Niessner, Dr. Dai 56

Generative Adversarial Networks (GANs)

I2DL: Prof. Niessner, Dr. Dai 57

Generative Adversarial Networks (GANs)

58

Source: https://github.com/hindupuravinash/the-gan-zoo

I2DL: Prof. Niessner, Dr. Dai

Convolution and Deconvolution

[Figure: convolution (no padding, no stride) maps a larger input to a smaller output; transposed convolution (no padding, no stride) maps a smaller input to a larger output]
Source: https://github.com/vdumoulin/conv_arithmetic

I2DL: Prof. Niessner, Dr. Dai 59
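A quick numeric sketch of the two operations with kernel size 3, no padding, and unit stride (matching the linked animations); the sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 4, 4)

conv = nn.Conv2d(1, 1, kernel_size=3)              # no padding, stride 1
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3)   # no padding, stride 1

y = conv(x)
print(y.shape)           # torch.Size([1, 1, 2, 2]) -- the output shrinks
print(deconv(y).shape)   # torch.Size([1, 1, 4, 4]) -- the output grows back
```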

Autoencoder

[Figure: Conv encoder → Deconv decoder]
I2DL: Prof. Niessner, Dr. Dai 60

Decoder as Generative Model

Latent space 𝑧, dim(𝑧) < dim(𝑥)

• Test time: reconstruction from a 'random' vector

[Figure: random latent vector → decoder → Output Image; reconstruction loss (often L2) used during training]

I2DL: Prof. Niessner, Dr. Dai 63

Decoder as Generative Model

Interpolation between two chair models

I2DL: Prof. Niessner, Dr. Dai 64

[Dosovitskiy et al., '14] Learning to Generate Chairs

Decoder as Generative Model

Morphing betweenchair models

I2DL: Prof. Niessner, Dr. Dai 65

[Dosovitskiy et al., '14] Learning to Generate Chairs

Decoder as Generative Model

Latent space 𝑧, dim(𝑧) < dim(𝑥)

• "Test time": reconstruction from a 'random' vector
• Reconstruction loss: often L2, i.e., sum of squared distances
  → L2 distributes the error equally
  → the mean is optimal
  → the result is blurry
• Instead of L2, can we "learn" a loss function?

I2DL: Prof. Niessner, Dr. Dai 66

Generative Adversarial Networks (GANs)

[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)

67

[Figure: 𝑧 → Generator 𝐺 → 𝐺(𝑧) → Discriminator 𝐷 → 𝐷(𝐺(𝑧))]

I2DL: Prof. Niessner, Dr. Dai

Generative Adversarial Networks (GANs)

68

[Figure: 𝑧 → 𝐺 → 𝐺(𝑧) → 𝐷 → 𝐷(𝐺(𝑧)); real sample 𝑥 → 𝐷 → 𝐷(𝑥)]

[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)

Generative Adversarial Networks (GANs)

[Figure: discriminator separating real data from fake data]

I2DL: Prof. Niessner, Dr. Dai 69
[Goodfellow, NIPS'16] Tutorial: Generative Adversarial Networks

GANs: Loss Functions
• Minimax game:
  – G minimizes the probability that D is correct
  – Equilibrium is a saddle point of the discriminator loss
• Discriminator loss (binary cross-entropy):
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Generator loss:
  J^(G) = −J^(D)
• D provides supervision (i.e., gradients) for G

I2DL: Prof. Niessner, Dr. Dai
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

70

GANs: Loss Functions
• Heuristic method (often used in practice):
  – G maximizes the log-probability of D being mistaken
  – G can still learn even when D rejects all generator samples
• Discriminator loss:
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Generator loss:
  J^(G) = −½ 𝔼_𝒛[log D(G(𝒛))]

I2DL: Prof. Niessner, Dr. Dai

71
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

Alternating Gradient Updates
• Step 1: Fix G, and perform a gradient step to minimize
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Step 2: Fix D, and perform a gradient step to minimize
  J^(G) = −½ 𝔼_𝒛[log D(G(𝒛))]

72
I2DL: Prof. Niessner, Dr. Dai
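A minimal sketch of these two alternating steps with the heuristic generator loss; the toy G and D, batch size, and Adam settings are illustrative assumptions (the ½ factors only scale the gradients and are dropped):

```python
import torch
import torch.nn as nn

latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 784), nn.Tanh())   # toy generator
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())         # toy discriminator
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.rand(64, 784)                    # stand-in for a real batch
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Step 1: fix G (detach its output), gradient step on J^(D)
z = torch.randn(64, latent_dim)
loss_D = bce(D(x_real), ones) + bce(D(G(z).detach()), zeros)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Step 2: fix D (only opt_G steps), gradient step on J^(G) = -E_z[log D(G(z))]
z = torch.randn(64, latent_dim)
loss_G = bce(D(G(z)), ones)    # equivalent to maximizing log D(G(z))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```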

Training a GAN

74

Source: https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

I2DL: Prof. Niessner, Dr. Dai

GANs: Loss Functions

[Figure: comparison of the Minimax and Heuristic generator losses]

I2DL: Prof. Niessner, Dr. Dai 75
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

DCGAN: Generator

Generator of Deep Convolutional GANs

I2DL: Prof. Niessner, Dr. Dai 76
[Radford et al., ICLR'16] DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
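A condensed sketch in the spirit of the DCGAN generator: a latent code is progressively upsampled by strided transposed convolutions with batch norm and ReLU. The channel counts and the 32×32 output size are illustrative, not the paper's exact figures:

```python
import torch
import torch.nn as nn

# Project a latent vector up to an image via strided transposed convolutions
G = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)   # latent codes as 1x1 feature maps
fake_images = G(z)               # (16, 3, 32, 32)
```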

DCGAN: Results

Results on MNIST

I2DL: Prof. Niessner, Dr. Dai 77
[Radford et al., ICLR'16] DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

DCGAN: Results

Results on CelebA (200k relatively well aligned portrait photos)

I2DL: Prof. Niessner, Dr. Dai 78

[Radford et al., ICLR‘16] DCGAN : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Conditional Generative Adversarial Networks (cGANs)

79
I2DL: Prof. Niessner, Dr. Dai

Pix2Pix: Image-to-Image Translation

I2DL: Prof. Niessner, Dr. Dai

[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks

80

[Figure: 𝑧 → Generator G → G(𝑧) → Discriminator D → real or fake?]

min_G max_D 𝔼_{𝑧,𝑥}[log D(G(𝑧)) + log(1 − D(𝑥))]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

81

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

[Figure: input 𝑥 → Generator G → G(𝑥) → Discriminator D → real or fake?]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

82

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

[Figure: 𝑥 → Generator G → G(𝑥) → Discriminator D, which judges "Real!"]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

83

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

[Figure: a different 𝑥 → Generator G → G(𝑥) → Discriminator D, which again judges "Real too!"]

I2DL: Prof. Niessner, Dr. Dai

84

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

min_G max_D 𝔼_{𝑥,𝑦}[log D(𝑥, G(𝑥)) + log(1 − D(𝑥, 𝑦))]

• Real or fake pair? The discriminator now sees pairs: (𝑥, G(𝑥)) is a fake pair, (𝑥, 𝑦) is a real pair
• Match the joint distribution: (𝑥, G(𝑥)) ∼ p(𝑥, 𝑦)

I2DL: Prof. Niessner, Dr. Dai

85
[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks
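A sketch of the key change: the discriminator scores (input, output) pairs, commonly implemented by concatenating the two images along the channel dimension. The layer sizes and the 64×64 resolution are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Discriminator over pairs: concatenate input x and a candidate output
D = nn.Sequential(
    nn.Conv2d(3 + 3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),   # patch-wise real/fake scores
)

x = torch.rand(1, 3, 64, 64)      # input image
y = torch.rand(1, 3, 64, 64)      # real target
G_x = torch.rand(1, 3, 64, 64)    # stand-in for the generator output G(x)

real_pair_score = D(torch.cat([x, y], dim=1))    # should be judged real
fake_pair_score = D(torch.cat([x, G_x], dim=1))  # should be judged fake
```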

Pix2Pix

I2DL: Prof. Niessner, Dr. Dai 86

Edges → Images

[Figure: three input/output example pairs]

I2DL: Prof. Niessner, Dr. Dai

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks
Edges from [Xie et al., ICCV'15] Holistically-Nested Edge Detection

87

Sketches → Images

[Figure: three input/output example pairs]

Trained on Edges → Images

Data from [Eitz, Hays, Alexa, 2012]
I2DL: Prof. Niessner, Dr. Dai 88

[Figure: input, output, and ground-truth examples]

I2DL: Prof. Niessner, Dr. Dai

Data from maps.google.com

89

[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks

BW → Color

[Figure: three input/output example pairs]

I2DL: Prof. Niessner, Dr. Dai 90

Data from ImageNet
[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

GAN Applications

I2DL: Prof. Niessner, Dr. Dai 91

BigGAN: HD Image Generation

I2DL: Prof. Niessner, Dr. Dai 92

[Brock et al., ICLR'18] BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis

StyleGAN: Face Image Generation

I2DL: Prof. Niessner, Dr. Dai 93

[Karras et al., '18] StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks
[Karras et al., '19] StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN

CycleGAN: Unpaired Image-to-Image Translation

I2DL: Prof. Niessner, Dr. Dai
[Zhu et al., ICCV'17] CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

94

SPADE: GAN-Based Image Editing

I2DL: Prof. Niessner, Dr. Dai 95
[Park et al., CVPR'19] SPADE: Semantic Image Synthesis with Spatially-Adaptive Normalization

Few-Shot-Vid2Vid: Single-Shot Video Generation

I2DL: Prof. Niessner, Dr. Dai 96

• https://www.youtube.com/watch?v=APoB1u3kTOU
• https://www.youtube.com/watch?v=kkA6CHRovKA

[Wang et al., NeurIPS‘19] Few-Shot Video-to-Video Synthesis

References for Further Reading
• https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf

• https://phillipi.github.io/pix2pix/

• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf

I2DL: Prof. Niessner, Dr. Dai 97

Next Lecture

I2DL: Prof. Niessner, Dr. Dai 98

• Next lecture on 28th January:
  – Course outlook

• Reminder: Exercise 3 due tomorrow at 18:00

• Thursday exercise session on exercise 3 and exam tips

See you next time!

I2DL: Prof. Niessner, Dr. Dai 99
