
Lecture 11 Recap

I2DL: Prof. Niessner, Dr. Dai 1

Transfer Learning

3I2DL: Prof. Niessner, Dr. Dai

[Figure: a large dataset with distribution P1 and a small dataset with distribution P2]

• Use what has been learned in one setting for another setting

Transfer Learning

5I2DL: Prof. Niessner, Dr. Dai

[Figure: a network trained on ImageNet, applied to a new dataset with C classes; the early layers are FROZEN, the last layer is re-TRAINed]

[Donahue et al., ICML’14] DeCAF, [Razavian et al., CVPRW’14] CNN Features off-the-shelf

Source: http://cs231n.stanford.edu/slides/2016/winter1516_lecture11.pdf
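A minimal PyTorch sketch of this recipe, with torchvision's ResNet-18 standing in for the ImageNet-trained network (the model choice, class count, and hyperparameters are illustrative assumptions, not part of the slides):

```python
import torch.nn as nn
import torchvision.models as models

C = 10  # number of classes in the new, small dataset (illustrative)

model = models.resnet18(pretrained=True)  # trained on ImageNet

# FROZEN: keep all pretrained weights fixed
for param in model.parameters():
    param.requires_grad = False

# TRAIN: swap in a fresh classification head for C classes
# (new parameters have requires_grad=True by default)
model.fc = nn.Linear(model.fc.in_features, C)
```

Only the new head receives gradient updates; the frozen layers act as the fixed feature extractor described in DeCAF and CNN Features off-the-shelf.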

Basic Structure of RNNs
• We want to have a notion of "time" or "sequence"

I2DL: Prof. Niessner, Dr. Dai 7

[Figure: the hidden state is computed from the input and the previous hidden state]

A_t = θ_c · A_{t−1} + θ_x · x_t

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
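As a sketch, one step of this recurrence in PyTorch; the tanh squashing on top of the linear update is an assumption (standard for vanilla RNNs) and the dimensions are arbitrary:

```python
import torch

hidden_dim, input_dim = 128, 64
theta_c = torch.randn(hidden_dim, hidden_dim) * 0.01  # recurrent weights θ_c
theta_x = torch.randn(hidden_dim, input_dim) * 0.01   # input weights θ_x

def rnn_step(A_prev, x_t):
    # A_t = θ_c A_{t-1} + θ_x x_t, passed through a nonlinearity
    return torch.tanh(theta_c @ A_prev + theta_x @ x_t)

A = torch.zeros(hidden_dim)              # initial hidden state
for x_t in torch.randn(5, input_dim):    # a toy sequence of 5 inputs
    A = rnn_step(A, x_t)                 # the hidden state carries the "time" notion
```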

Long-Term Dependencies

I2DL: Prof. Niessner, Dr. Dai 8

I moved to Germany … so I speak German fluently.
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long Short-Term Memory Units (LSTM)

I2DL: Prof. Niessner, Dr. Dai 11

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long Short-Term Memory Units
• Key ingredients
  – Cell state: transports the information through the unit

I2DL: Prof. Niessner, Dr. Dai 12

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTM
• Highway for the gradient to flow

I2DL: Prof. Niessner, Dr. Dai 13

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
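For completeness, a minimal usage sketch of PyTorch's built-in nn.LSTM, which implements this gated cell; the sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
x = torch.randn(8, 20, 64)     # batch of 8 sequences, 20 time steps each
out, (h_n, c_n) = lstm(x)      # c_n: the cell state that carries information
print(out.shape)               # torch.Size([8, 20, 128])
```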

RNNs in Computer Vision
• Caption generation

I2DL: Prof. Niessner, Dr. Dai 14

[Xu et al., PMLR’15] Neural Image Caption Generation

Autoencoders

I2DL: Prof. Niessner, Dr. Dai 15

Machine Learning

16

Supervised learning
• Labels or target classes
• Goal: Learn a mapping from input to label
• Classification, regression

I2DL: Prof. Niessner, Dr. Dai

[Figure: training images labeled DOG and CAT]

Machine Learning

17

Unsupervised learning vs. supervised learning

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

• No label or target class
• Find out properties of the structure of the data
• Examples: clustering (k-means), dimensionality reduction (PCA)

18

[Figure: unlabeled images of dogs and cats]

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

19

Unsupervised learning vs. supervised learning

I2DL: Prof. Niessner, Dr. Dai

Machine Learning

20

Unsupervised learning vs. supervised learning

[Figure: unlabeled images on one side, images labeled DOG and CAT on the other]

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data

I2DL: Prof. Niessner, Dr. Dai 21

Source: https://hackernoon.com

Autoencoders
• From an input image to a feature representation (bottleneck layer)
• Encoder: a CNN in our case

I2DL: Prof. Niessner, Dr. Dai 22

[Figure: input image 𝑥 → Conv encoder → bottleneck 𝑧]
Source: https://bit.ly/37dpsbQ

Autoencoders
• Why do we need this dimensionality reduction?
• To capture the patterns, i.e., the most meaningful factors of variation in our data
• Other dimensionality reduction methods?

I2DL: Prof. Niessner, Dr. Dai 23

Autoencoder Training

24

[Figure: Input Image → Conv encoder → Transposed Conv decoder → Output Image]

Reconstruction loss (e.g., L1, L2) between input and output

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ
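A compact sketch of this training setup; the architecture, the 28×28 grayscale input size, and the optimizer settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                                   # Conv
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
)
decoder = nn.Sequential(                                   # Transposed Conv
    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
    nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 14x14 -> 28x28
)

criterion = nn.MSELoss()   # L2 reconstruction loss; nn.L1Loss() also works
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(8, 1, 28, 28)               # a batch of unlabeled images
loss = criterion(decoder(encoder(x)), x)   # compare output image to input
optimizer.zero_grad(); loss.backward(); optimizer.step()
```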

Autoencoder Training

25

Latent space 𝑧, dim(𝑧) < dim(𝑥)

[Figure: input 𝑥 → latent 𝑧 → reconstruction 𝑥′; example input images and their reconstructed images]

I2DL: Prof. Niessner, Dr. Dai

26

Autoencoder Training
• No labels required
• We can use unlabeled data to first get its structure

I2DL: Prof. Niessner, Dr. Dai

Latent space 𝑧, dim(𝑧) < dim(𝑥)

[Figure: input 𝑥 → latent 𝑧 → reconstruction 𝑥′]

27

Autoencoder Use Cases
• Embedding of MNIST numbers

I2DL: Prof. Niessner, Dr. Dai

Source: https://lts2.epfl.ch/blog/perekres/2015/02/21/layer-by-layer-visualizations-of-mnist-dataset-feature-representations/

28

Autoencoder for Pre-Training
• Test case: medical applications based on CT images
  – Large set of unlabeled data
  – Small set of labeled data
• We cannot take a network pre-trained on ImageNet. Why?
• The image features are different for CT scans vs. natural images

I2DL: Prof. Niessner, Dr. Dai

29

Autoencoder for Pre-Training
• Test case: medical applications based on CT images
  – Large set of unlabeled data
  – Small set of labeled data
• We can pre-train our network using an autoencoder to "learn" the type of features present in CT images

I2DL: Prof. Niessner, Dr. Dai

30

Autoencoder for Pre-Training
• Step 1: Unsupervised training with autoencoders

[Figure: Input → encoder → decoder → Reconstruction]

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ

31

Autoencoder for Pre-Training
• Step 2: Supervised training with the labeled data
• Throw away the decoder

[Figure: Input → encoder (kept) → decoder (thrown away) → Reconstruction]

I2DL: Prof. Niessner, Dr. Dai

Source: https://bit.ly/37dpsbQ

Autoencoder for Pre-Training
• Step 2: Supervised training with the labeled data

32

[Figure: input 𝑥 → encoder → 𝑧 → prediction 𝑦; loss against ground-truth label 𝑦∗ for supervised learning; backprop as always]

I2DL: Prof. Niessner, Dr. Dai
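A sketch of step 2, reusing the conv encoder from the earlier autoencoder sketch and attaching a small classification head; the head, class count, and learning rate are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(   # in practice: weights pre-trained in step 1
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)
num_classes = 5            # illustrative
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, num_classes),  # matches the encoder's 32x7x7 output
)

criterion = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

x = torch.rand(8, 1, 28, 28)                   # labeled images (dummy data)
y_star = torch.randint(0, num_classes, (8,))   # ground-truth labels
loss = criterion(classifier(encoder(x)), y_star)
optimizer.zero_grad(); loss.backward(); optimizer.step()  # backprop as always
```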

Why use Autoencoders?
• Pre-training, as mentioned before
  – Image → same image reconstructed
  – Use the encoder as "feature extractor"
• Use them to get pixel-wise predictions
  – Image → semantic segmentation
  – Low-resolution image → high-resolution image
  – Image → depth map

I2DL: Prof. Niessner, Dr. Dai 33

Autoencoders for Pixel-wise Predictions

I2DL: Prof. Niessner, Dr. Dai 34

Semantic Segmentation (FCN)
• Recall the Fully Convolutional Networks

35
[Long et al., CVPR'15] Fully Convolutional Networks for Semantic Segmentation

Can we do better?

I2DL: Prof. Niessner, Dr. Dai

SegNet

I2DL: Prof. Niessner, Dr. Dai 36
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

SegNet

37

[Figure: Input vs. Ground Truth vs. SegNet prediction]

I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

SegNet
• Encoder: normal convolutional filters + pooling
• Decoder: upsampling + convolutional filters

38
I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation


SegNet
• Encoder: normal convolutional filters + pooling
• Decoder: upsampling + convolutional filters
• The convolutional filters in the decoder are learned via backprop; their goal is to refine the upsampling

40
I2DL: Prof. Niessner, Dr. Dai
[Badrinarayanan et al., TPAMI'16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
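A rough sketch of this encoder-decoder pattern. Note this is not the exact SegNet architecture (SegNet reuses max-pooling indices for its upsampling); here a plain upsample-plus-learned-conv decoder stands in, with illustrative sizes:

```python
import torch
import torch.nn as nn

num_classes = 21  # illustrative

encoder = nn.Sequential(   # normal convolutional filters + pooling
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
decoder = nn.Sequential(   # upsampling + convolutional filters (learned)
    nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2), nn.Conv2d(64, num_classes, 3, padding=1),
)

x = torch.rand(1, 3, 224, 224)
logits = decoder(encoder(x))   # per-pixel class scores: (1, 21, 224, 224)
```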

Generative Models
• Given training data, how can we generate new samples from the same distribution?

42
I2DL: Prof. Niessner, Dr. Dai
Source: https://openai.com/blog/generative-models/

[Figure: real images vs. generated images]

Generative Models

43
I2DL: Prof. Niessner, Dr. Dai

Taxonomy of generative models:
• Explicit density
  – Tractable density: Fully Visible Belief Nets
  – Approximate density
    · Variational: Variational Autoencoder
    · Markov Chain: Boltzmann Machine
• Implicit density
  – Markov Chain: GSN
  – Direct: GAN

Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017

44

Variational Autoencoders

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Encode the input into a representation (bottleneck) and reconstruct it with the decoder

45

[Figure: 𝑥 → Encoder (Conv) → 𝑧 → Decoder (Transposed Conv) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Autoencoders
• Encode the input into a representation (bottleneck) and reconstruct it with the decoder

46
I2DL: Prof. Niessner, Dr. Dai

[Figure: latent space 𝑧 learned by an autoencoder on MNIST]
Source: https://bit.ly/37ctFMS

Variational Autoencoder

47

[Figure: 𝑥 → Encoder 𝑞_𝜙(𝑧|𝑥) (Conv) → 𝑧 → Decoder 𝑝_𝜃(𝑥|𝑧) (Transposed Conv) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

48

Variational Autoencoder
• Goal: Sample from the latent distribution to generate new outputs!

[Figure: 𝑥 → Encoder (𝜙) → 𝑧 → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

49

• Latent space is now a distribution
• Specifically, it is a Gaussian: 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})

[Figure: 𝑥 → Encoder (𝜙) → mean 𝜇_{𝑧|𝑥} and covariance Σ_{𝑧|𝑥} → Sample 𝑧 → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

50

• Latent space is now a distribution
• Specifically, it is a Gaussian: 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})

[Figure: 𝑥 → Encoder (𝜙) → mean 𝜇_{𝑧|𝑥} and diagonal covariance Σ_{𝑧|𝑥}]

I2DL: Prof. Niessner, Dr. Dai

Variational Autoencoder

51

• Training

[Figure: 𝑥 → Encoder (𝜙) → 𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥} → Sample 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥}) → Decoder (𝜃) → 𝑥]

I2DL: Prof. Niessner, Dr. Dai
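The training objective itself is not spelled out on the slide; for reference, a VAE is trained by maximizing the evidence lower bound (ELBO), a reconstruction term plus a KL regularizer that keeps the latent distribution close to the prior 𝑝(𝑧) = 𝒩(0, 𝐼):

𝓛(𝜃, 𝜙; 𝑥) = 𝔼_{𝑞_𝜙(𝑧|𝑥)}[log 𝑝_𝜃(𝑥|𝑧)] − D_KL(𝑞_𝜙(𝑧|𝑥) ‖ 𝑝(𝑧))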

Variational Autoencoder
• Sampling operation is not differentiable
→ We can't backpropagate through the latent space

52
I2DL: Prof. Niessner, Dr. Dai

[Figure: 🚫 gradient flow is blocked at the sampling step 𝑧|𝑥 ∼ 𝒩(𝜇_{𝑧|𝑥}, Σ_{𝑧|𝑥})]

Reparametrization Trick
• Now we only need to backpropagate through an addition and a multiplication:
  𝑧 = 𝜇_{𝑧|𝑥} + Σ_{𝑧|𝑥}^{1/2} · 𝜀, with 𝜀 ∼ 𝒩(0, 1)

53
I2DL: Prof. Niessner, Dr. Dai

[Figure: the noise 𝜀 ∼ 𝒩(0, 1) enters through a multiplication (*) with the covariance and an addition (+) of the mean, so gradients flow to the encoder]
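A minimal sketch of the trick for a diagonal Gaussian; the log-variance parameterization of the encoder output is a common convention assumed here:

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness enters only
    # through eps, so gradients reach mu and logvar via one multiplication
    # and one addition
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

mu = torch.zeros(4, 20, requires_grad=True)      # dummy encoder outputs
logvar = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, logvar)                   # differentiable sample
z.sum().backward()                               # gradients flow to mu, logvar
```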

Variational Autoencoder

54

• Test time: Sample from the latent space

[Figure: Sample 𝑧 ∼ 𝒩(𝜇, Σ) → Decoder (𝜃) → 𝑥; only the decoder is used]

Autoencoder vs VAE

[Figure: reconstructions from an Autoencoder vs. a Variational Autoencoder vs. the Ground Truth]
Source: https://github.com/kvfrans/variational-autoencoder

55
I2DL: Prof. Niessner, Dr. Dai

Autoencoder Overview
• Autoencoders (AE)
  – Reconstruct input
  – Unsupervised learning
  – Latent space features are useful
• Variational Autoencoders (VAE)
  – Probability distribution in latent space (e.g., Gaussian)
  – Interpretable latent space (head pose, smile)
  – Sample from the model to generate output

I2DL: Prof. Niessner, Dr. Dai 56

Generative Adversarial Networks (GANs)

I2DL: Prof. Niessner, Dr. Dai 57

Generative Adversarial Networks (GANs)

58

Source: https://github.com/hindupuravinash/the-gan-zoo

I2DL: Prof. Niessner, Dr. Dai

Convolution and Deconvolution

[Figure: convolution (no padding, no stride) maps a larger input to a smaller output; transposed convolution (no padding, no stride) maps a smaller input to a larger output]
Source: https://github.com/vdumoulin/conv_arithmetic

I2DL: Prof. Niessner, Dr. Dai 59
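A quick numeric sketch of the two operations with kernel size 3, no padding, and unit stride (matching the linked animations); the sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 4, 4)

conv = nn.Conv2d(1, 1, kernel_size=3)              # no padding, stride 1
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3)   # no padding, stride 1

y = conv(x)
print(y.shape)           # torch.Size([1, 1, 2, 2]) -- the output shrinks
print(deconv(y).shape)   # torch.Size([1, 1, 4, 4]) -- the output grows back
```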

Autoencoder

[Figure: Conv encoder → Deconv decoder]
I2DL: Prof. Niessner, Dr. Dai 60

Decoder as Generative Model

Latent space 𝑧, dim(𝑧) < dim(𝑥)

• Test time: reconstruction from a 'random' vector

[Figure: random latent vector → decoder → Output Image; reconstruction loss (often L2) used during training]

I2DL: Prof. Niessner, Dr. Dai 63

Decoder as Generative Model

Interpolation between two chair models

I2DL: Prof. Niessner, Dr. Dai 64

[Dosovitskiy et al., '14] Learning to Generate Chairs

Decoder as Generative Model

Morphing betweenchair models

I2DL: Prof. Niessner, Dr. Dai 65

[Dosovitskiy et al., '14] Learning to Generate Chairs

Decoder as Generative Model

Latent space 𝑧, dim(𝑧) < dim(𝑥)

• "Test time": reconstruction from a 'random' vector
• Reconstruction loss: often L2, i.e., sum of squared distances
  → L2 distributes the error equally
  → the mean is optimal
  → the result is blurry
• Instead of L2, can we "learn" a loss function?

I2DL: Prof. Niessner, Dr. Dai 66

Generative Adversarial Networks (GANs)

[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)

67

[Figure: 𝑧 → Generator 𝐺 → 𝐺(𝑧) → Discriminator 𝐷 → 𝐷(𝐺(𝑧))]

I2DL: Prof. Niessner, Dr. Dai

Generative Adversarial Networks (GANs)

68

[Figure: 𝑧 → 𝐺 → 𝐺(𝑧) → 𝐷 → 𝐷(𝐺(𝑧)); real sample 𝑥 → 𝐷 → 𝐷(𝑥)]

[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)

Generative Adversarial Networks (GANs)

[Figure: discriminator separating real data from fake data]

I2DL: Prof. Niessner, Dr. Dai 69
[Goodfellow, NIPS'16] Tutorial: Generative Adversarial Networks

GANs: Loss Functions
• Minimax game:
  – G minimizes the probability that D is correct
  – Equilibrium is a saddle point of the discriminator loss
• Discriminator loss (binary cross-entropy):
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Generator loss:
  J^(G) = −J^(D)
• D provides supervision (i.e., gradients) for G

I2DL: Prof. Niessner, Dr. Dai
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

70

GANs: Loss Functions
• Heuristic method (often used in practice):
  – G maximizes the log-probability of D being mistaken
  – G can still learn even when D rejects all generator samples
• Discriminator loss:
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Generator loss:
  J^(G) = −½ 𝔼_𝒛[log D(G(𝒛))]

I2DL: Prof. Niessner, Dr. Dai

71
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

Alternating Gradient Updates
• Step 1: Fix G, and perform a gradient step to minimize
  J^(D) = −½ 𝔼_{𝒙∼p_data}[log D(𝒙)] − ½ 𝔼_𝒛[log(1 − D(G(𝒛)))]
• Step 2: Fix D, and perform a gradient step to minimize
  J^(G) = −½ 𝔼_𝒛[log D(G(𝒛))]

72
I2DL: Prof. Niessner, Dr. Dai
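A minimal sketch of these two alternating steps with the heuristic generator loss; the toy G and D, batch size, and Adam settings are illustrative assumptions (the ½ factors only scale the gradients and are dropped):

```python
import torch
import torch.nn as nn

latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 784), nn.Tanh())   # toy generator
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())         # toy discriminator
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.rand(64, 784)                    # stand-in for a real batch
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Step 1: fix G (detach its output), gradient step on J^(D)
z = torch.randn(64, latent_dim)
loss_D = bce(D(x_real), ones) + bce(D(G(z).detach()), zeros)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Step 2: fix D (only opt_G steps), gradient step on J^(G) = -E_z[log D(G(z))]
z = torch.randn(64, latent_dim)
loss_G = bce(D(G(z)), ones)    # equivalent to maximizing log D(G(z))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```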

Training a GAN

74

Source: https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

I2DL: Prof. Niessner, Dr. Dai

GANs: Loss Functions

[Figure: comparison of the Minimax and Heuristic generator losses]

I2DL: Prof. Niessner, Dr. Dai 75
[Goodfellow et al., NIPS'14] Generative Adversarial Networks

DCGAN: Generator

Generator of Deep Convolutional GANs

I2DL: Prof. Niessner, Dr. Dai 76
[Radford et al., ICLR'16] DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
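A condensed sketch in the spirit of the DCGAN generator: a latent code is progressively upsampled by strided transposed convolutions with batch norm and ReLU. The channel counts and the 32×32 output size are illustrative, not the paper's exact figures:

```python
import torch
import torch.nn as nn

# Project a latent vector up to an image via strided transposed convolutions
G = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)   # latent codes as 1x1 feature maps
fake_images = G(z)               # (16, 3, 32, 32)
```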

DCGAN: Results

Results on MNIST

I2DL: Prof. Niessner, Dr. Dai 77
[Radford et al., ICLR'16] DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

DCGAN: Results

Results on CelebA (200k relatively well aligned portrait photos)

I2DL: Prof. Niessner, Dr. Dai 78

[Radford et al., ICLR‘16] DCGAN : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Conditional Generative Adversarial Networks (cGANs)

79
I2DL: Prof. Niessner, Dr. Dai

Pix2Pix: Image-to-Image Translation

I2DL: Prof. Niessner, Dr. Dai

[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks

80

[Figure: 𝑧 → Generator G → G(𝑧) → Discriminator D → real or fake?]

min_G max_D 𝔼_{𝑧,𝑥}[log D(G(𝑧)) + log(1 − D(𝑥))]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

81

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

[Figure: input 𝑥 → Generator G → G(𝑥) → Discriminator D → real or fake?]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

82

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

[Figure: 𝑥 → Generator G → G(𝑥) → Discriminator D, which judges "Real!"]

I2DL: Prof. Niessner, Dr. Dai

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

83

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

min_G max_D 𝔼_{𝑥,𝑦}[log D(G(𝑥)) + log(1 − D(𝑦))]

[Figure: a different 𝑥 → Generator G → G(𝑥) → Discriminator D, which again judges "Real too!"]

I2DL: Prof. Niessner, Dr. Dai

84

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

min_G max_D 𝔼_{𝑥,𝑦}[log D(𝑥, G(𝑥)) + log(1 − D(𝑥, 𝑦))]

• Real or fake pair? The discriminator now sees pairs: (𝑥, G(𝑥)) is a fake pair, (𝑥, 𝑦) is a real pair
• Match the joint distribution: (𝑥, G(𝑥)) ∼ p(𝑥, 𝑦)

I2DL: Prof. Niessner, Dr. Dai

85
[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks
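A sketch of the key change: the discriminator scores (input, output) pairs, commonly implemented by concatenating the two images along the channel dimension. The layer sizes and the 64×64 resolution are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Discriminator over pairs: concatenate input x and a candidate output
D = nn.Sequential(
    nn.Conv2d(3 + 3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),   # patch-wise real/fake scores
)

x = torch.rand(1, 3, 64, 64)      # input image
y = torch.rand(1, 3, 64, 64)      # real target
G_x = torch.rand(1, 3, 64, 64)    # stand-in for the generator output G(x)

real_pair_score = D(torch.cat([x, y], dim=1))    # should be judged real
fake_pair_score = D(torch.cat([x, G_x], dim=1))  # should be judged fake
```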

Pix2Pix

I2DL: Prof. Niessner, Dr. Dai 86

Edges → Images

[Figure: three input/output example pairs]

I2DL: Prof. Niessner, Dr. Dai

[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks
Edges from [Xie et al., ICCV'15] Holistically-Nested Edge Detection

87

Sketches → Images

[Figure: three input/output example pairs]

Trained on Edges → Images

Data from [Eitz, Hays, Alexa, 2012]
I2DL: Prof. Niessner, Dr. Dai 88

[Figure: input, output, and ground-truth examples]

I2DL: Prof. Niessner, Dr. Dai

Data from maps.google.com

89

[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks

BW → Color

[Figure: three input/output example pairs]

I2DL: Prof. Niessner, Dr. Dai 90

Data from ImageNet
[Isola et al., CVPR'17] Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

GAN Applications

I2DL: Prof. Niessner, Dr. Dai 91

BigGAN: HD Image Generation

I2DL: Prof. Niessner, Dr. Dai 92

[Brock et al., ICLR'18] BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis

StyleGAN: Face Image Generation

I2DL: Prof. Niessner, Dr. Dai 93

[Karras et al., '18] StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks
[Karras et al., '19] StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN

CycleGAN: Unpaired Image-to-Image Translation

I2DL: Prof. Niessner, Dr. Dai
[Zhu et al., ICCV'17] CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

94

SPADE: GAN-Based Image Editing

I2DL: Prof. Niessner, Dr. Dai 95
[Park et al., CVPR'19] SPADE: Semantic Image Synthesis with Spatially-Adaptive Normalization

Few-Shot-Vid2Vid: Single-Shot Video Generation

I2DL: Prof. Niessner, Dr. Dai 96

• https://www.youtube.com/watch?v=APoB1u3kTOU
• https://www.youtube.com/watch?v=kkA6CHRovKA

[Wang et al., NeurIPS‘19] Few-Shot Video-to-Video Synthesis

References for Further Reading
• https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf

• https://phillipi.github.io/pix2pix/

• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf

I2DL: Prof. Niessner, Dr. Dai 97

Next Lecture

I2DL: Prof. Niessner, Dr. Dai 98

• Next lecture on 28th January:
  – Course outlook

• Reminder: Exercise 3 due tomorrow at 18:00

• Thursday exercise session on exercise 3 and exam tips

See you next time!

I2DL: Prof. Niessner, Dr. Dai 99
