
Emanuele Sansone

Generative Adversarial Networks (GANs)

29 March 2017
Website: emsansone.github.io

Goals

Stimulate discussion

1. Why not use this model?
2. What don't you like about this model?
3. Why not improve it?
4. …

Sketch of GANs

Density estimation in high-dimensional data (manifold assumption)

Ian Goodfellow, NIPS 2016 tutorial on GANs

Yoshua Bengio, MLSS 2015 Austin TX

Sketch of GANs

[Diagram: latent noise $z \sim p_z$ is mapped by the generator $g_\theta(\cdot)$ to samples with density $p_g$; the discriminator $D_\phi(\cdot)$ receives both training data (density $p_x$) and generated samples and outputs a value in $[0, 1]$.]

"Generative Adversarial Networks" NIPS 2014, Ian Goodfellow

Sketch of GANs

GENERATIVE

1. Fast sample generation - discriminative function (no Markov chains)
2. Implicit definition of density families

Sketch of GANs

ADVERSARIAL

Game between two adversaries (no likelihood maximization)

SEE LATER

Sketch of GANs

NETWORKS

Exploitation of deep neural architectures (high capacity and efficient training…)

"Learning Deep Architectures for AI" Yoshua Bengio 2009

Application - Image synthesis

"Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" arXiv 2015, Alec Radford, Luke Metz, Soumith Chintala

Application - Video synthesis

“Generating Videos with Scene Dynamics” NIPS 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

[Hallucinated videos: Beach, Golf, Train, Baby]

[Conditional generation of videos: Input / Output pairs]

Application - Representation Learning 1

“InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets” NIPS 2016 Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

Application - Representation Learning 2

“InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets” NIPS 2016 Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

List of recent papers on Generative Adversarial Networks:

Papers about GANs 2014-2016

- Generative Adversarial Networks, NIPS 2014
- Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks (UNSUPERVISED), NIPS 2015
- DRAW: A Recurrent Neural Network for Image Generation (UNSUPERVISED), ICML 2015
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, arXiv 2015
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS 2016
- Towards Principled Unsupervised Learning, ICLR 2016 (workshop)
- Improved Techniques for Training GANs, NIPS 2016
- Unsupervised and Semi-Supervised Learning with Categorical Generative Adversarial Networks, ICLR 2016
- Generating Videos with Scene Dynamics, NIPS 2016
- NIPS 2016 Tutorial: Generative Adversarial Networks (PAPER + VIDEO), NIPS 2016

Papers about GANs accepted to ICLR 2017

- Towards Principled Methods for Training Generative Adversarial Networks (FIRST THEORETICAL PAPER), ICLR 2017
- Adversarially Learned Inference (SoA in SEMI-SUPERVISED LEARNING), ICLR 2017
- Learning to Generate Samples from Noise Through Infusion Training, ICLR 2017
- Improving Generative Adversarial Networks with Denoising Feature Matching, ICLR 2017
- LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation, ICLR 2017
- Mode Regularized Generative Adversarial Networks, ICLR 2017
- Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, ICLR 2017
- Calibrating Energy-Based Generative Adversarial Networks, ICLR 2017
- Unrolled Generative Adversarial Networks, ICLR 2017
- Generative Multi-Adversarial Networks, ICLR 2017
- Energy-Based Generative Adversarial Networks, ICLR 2017

Probably hot topic?


Problem formulation

$$\min_\theta \max_\phi \; \mathbb{E}_{x \sim p_x}\big\{\log D_\phi(x)\big\} + \mathbb{E}_{z \sim p_z}\big\{\log\big(1 - D_\phi(g_\theta(z))\big)\big\}$$

The first expectation is taken over the training data ($x \sim p_x$), the second over the generated data ($g_\theta(z)$ with $z \sim p_z$, i.e. samples from $p_g$).
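To make the objective concrete, here is a minimal sketch (assuming PyTorch; the slides do not prescribe any framework) of the minibatch estimate of the value above, with `d` and `g` standing for $D_\phi$ and $g_\theta$. The discriminator ascends this value in $\phi$ while the generator descends it in $\theta$.

```python
# Minibatch estimate of E_{x~p_x}[log D(x)] + E_{z~p_z}[log(1 - D(g(z)))].
# A sketch only: `d` maps samples to (0, 1), `g` maps latent codes z to samples.
import torch

def gan_value(d, g, x_real, z):
    p_real = d(x_real).clamp(1e-6, 1 - 1e-6)      # D_phi(x), clamped for numerical safety
    p_fake = d(g(z)).clamp(1e-6, 1 - 1e-6)        # D_phi(g_theta(z))
    return torch.log(p_real).mean() + torch.log(1 - p_fake).mean()
```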

Optimal Discriminator (Proof sketch)

Rewriting the expectation over $z$ as an expectation over generated samples $x \sim p_g$:

$$\min_\theta \max_\phi \; \mathbb{E}_{x \sim p_x}\big\{\log D_\phi(x)\big\} + \mathbb{E}_{x \sim p_g}\big\{\log\big(1 - D_\phi(x)\big)\big\}$$

$$= \int p_x(x) \log D_\phi(x)\,dx + \int p_g(x) \log\big(1 - D_\phi(x)\big)\,dx
= \int \Big\{ p_x(x) \log D_\phi(x) + p_g(x) \log\big(1 - D_\phi(x)\big) \Big\}\,dx$$

Maximizing the integrand pointwise amounts to maximizing $a \log(y) + b \log(1 - y)$ in $y$:

$$\forall (a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}: \quad \frac{\partial\,[a \log(y) + b \log(1 - y)]}{\partial y} = 0 \iff y^* = \frac{a}{a + b}$$

Therefore

$$D^*_\phi(x) = \frac{p_x(x)}{p_x(x) + p_g(x)}$$
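As a quick sanity check (an illustration I added, not part of the original slides), one can verify numerically that $y^* = a/(a+b)$ maximizes $a\log(y) + b\log(1-y)$:

```python
# Numerical check that y* = a / (a + b) maximizes a*log(y) + b*log(1 - y) for a, b > 0.
import numpy as np

a, b = 0.7, 0.3                                   # arbitrary positive "densities" at a point x
y = np.linspace(1e-4, 1 - 1e-4, 100000)
f = a * np.log(y) + b * np.log(1 - y)
print(y[np.argmax(f)], a / (a + b))               # both print approximately 0.7
```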

Optimal Generator (Proof sketch)

Plugging the optimal discriminator $D^*_\phi(x) = \frac{p_x(x)}{p_x(x) + p_g(x)}$ into

$$\int p_x(x) \log D_\phi(x)\,dx + \int p_g(x) \log\big(1 - D_\phi(x)\big)\,dx$$

gives

$$\int p_x(x) \log\!\left(\frac{p_x(x)}{p_x(x) + p_g(x)}\right)dx + \int p_g(x) \log\!\left(1 - \frac{p_x(x)}{p_x(x) + p_g(x)}\right)dx$$

$$= \int p_x(x) \log\!\left(\frac{p_x(x)}{p_x(x) + p_g(x)}\right)dx + \int p_g(x) \log\!\left(\frac{p_g(x)}{p_x(x) + p_g(x)}\right)dx$$

$$= \int p_x(x) \log\!\left(\frac{p_x(x)}{\tfrac{p_x(x) + p_g(x)}{2}}\right)dx + \int p_g(x) \log\!\left(\frac{p_g(x)}{\tfrac{p_x(x) + p_g(x)}{2}}\right)dx - 2\log(2)$$

$$= 2\,\mathrm{JSD}(p_x, p_g) - 2\log(2)$$

The minimum value is $-2\log(2)$, attained when $\mathrm{JSD}(p_x, p_g) = 0 \iff p_g(x) = p_x(x), \;\forall x$.
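A small numeric illustration (not in the original slides) of the last identity: for two discrete distributions, the objective evaluated at $D^*_\phi$ equals $2\,\mathrm{JSD}(p_x, p_g) - 2\log 2$ and reaches $-2\log 2$ exactly when $p_g = p_x$.

```python
# Check that the objective under D* equals 2*JSD(p_x, p_g) - 2*log(2), minimized at p_g = p_x.
import numpy as np

def value_under_optimal_d(px, pg):
    m = px + pg
    return np.sum(px * np.log(px / m)) + np.sum(pg * np.log(pg / m))

def jsd(px, pg):
    m = 0.5 * (px + pg)
    return 0.5 * np.sum(px * np.log(px / m)) + 0.5 * np.sum(pg * np.log(pg / m))

px = np.array([0.2, 0.5, 0.3])
for pg in (np.array([0.6, 0.1, 0.3]), px):
    print(value_under_optimal_d(px, pg), 2 * jsd(px, pg) - 2 * np.log(2))
# The two columns agree; with pg = px both equal -2*log(2), about -1.386.
```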

In practice

[GAN diagram as before: $z \sim p_z$, generator $g_\theta(\cdot)$ producing $p_g$, discriminator $D_\phi(\cdot)$ comparing against the training data $p_x$]

When training succeeds and $p_g = p_x$, the discriminator can no longer distinguish the two sources:

$$D_\phi(x) = \frac{p_x(x)}{p_x(x) + p_g(x)} = \frac{1}{2}$$

In practice

"Generative Adversarial Networks" NIPS 2014, Ian Goodfellow

DON'T WORRY, THERE IS A DEMO

Why is it better than traditional generative approaches?

Traditional generative approaches

$$\ln p(\mathrm{DATA}) = \ln\!\int p(\mathrm{DATA}, \theta)\,d\theta
= \ln\!\int q(\theta)\,\frac{p(\mathrm{DATA}, \theta)}{q(\theta)}\,d\theta
\ge \int q(\theta) \ln\frac{p(\mathrm{DATA}, \theta)}{q(\theta)}\,d\theta
\doteq -\mathrm{KL}\big(q(\theta), p(\theta, \mathrm{DATA})\big)$$

The maximum of this lower bound is achieved for $q(\theta) = p(\theta \,|\, \mathrm{DATA})$. Regarding $q(\theta)$ as $p_g$ and $p(\theta \,|\, \mathrm{DATA})$ as $p_x$, maximizing the bound amounts to

$$\min \mathrm{KL}(p_g, p_x)$$

Traditional generative approaches vs. GANs

- Optimization: traditional generative approaches solve $\min \mathrm{KL}(p_g, p_x)$; GANs solve $\min \mathrm{JSD}(p_g, p_x)$.
- Where $p_x > 0$ and $p_g \to 0$ (mode dropping): KL cost 0 (low cost for mode dropping); JSD cost $\log(2)$.
- Where $p_g > 0$ and $p_x \to 0$ (fake data): KL cost infinity (high cost for fake data); JSD cost $\log(2)$.
- Minimum: 0 for both.
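A toy numeric illustration (not from the slides) of this asymmetry, comparing the two divergences when the generator drops a mode versus when it places mass where the data has none:

```python
# Compare KL(p_g, p_x) and JSD(p_g, p_x) on discrete distributions over 3 bins.
import numpy as np

def kl(p, q):
    mask = p > 0                                   # terms with p = 0 contribute 0
    if np.any(q[mask] == 0):                       # p puts mass where q has none
        return np.inf
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

px = np.array([0.5, 0.5, 0.0])                     # data lives on bins 0 and 1
pg_mode_drop = np.array([1.0, 0.0, 0.0])           # generator drops the second mode
pg_fake = np.array([0.5, 0.0, 0.5])                # generator puts mass on an empty bin

print(kl(pg_mode_drop, px), jsd(pg_mode_drop, px)) # KL stays small, JSD bounded by log(2)
print(kl(pg_fake, px), jsd(pg_fake, px))           # KL is infinite, JSD still bounded by log(2)
```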

Is there any issue?

Problem of vanishing gradients

"Towards Principled Methods for Training Generative Adversarial Networks" ICLR 2017, Martin Arjovsky, Leon Bottou

Theorem 2.4: if the discriminator is close to optimality, namely

$$D_\phi(x) \simeq \frac{p_x(x)}{p_x(x) + p_g(x)}$$

in other words

$$D_\phi(x) \simeq 1, \;\forall x \in \mathrm{supp}(p_x) \qquad\text{and}\qquad D_\phi(x) \simeq 0, \;\forall x \in \mathrm{supp}(p_g)$$

(which is a very common case, cf. Theorems 2.1-2.2), and the Jacobian of the generator is bounded by some scalar $M$, then

$$\Big\|\nabla_\theta\, \mathbb{E}_{z \sim p_z}\big\{\log\big(1 - D_\phi(g_\theta(z))\big)\big\}\Big\|_2 \le \frac{M\epsilon}{1 - \epsilon}$$

This happens when the discriminator is "too strong", for example when it is updated more frequently than the generator.
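The following toy experiment is a sketch I added (assuming PyTorch; nothing like it appears in the slides) to illustrate the theorem empirically: the longer the discriminator is trained on its own, the smaller the norm of the generator gradient of $\mathbb{E}_z\{\log(1 - D_\phi(g_\theta(z)))\}$ tends to become. The trend, not the exact numbers, is the point.

```python
# Toy illustration of Theorem 2.4: as D approaches optimality on data far from the
# generator's samples, the generator gradient of E_z[log(1 - D(g(z)))] shrinks.
import torch

torch.manual_seed(0)

def generator_grad_norm(d_steps):
    g = torch.nn.Linear(1, 1)                              # toy generator g_theta(z)
    d = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1), torch.nn.Sigmoid())
    opt_d = torch.optim.Adam(d.parameters(), lr=1e-2)
    for _ in range(d_steps):                               # train only the discriminator
        x = 4.0 + 0.5 * torch.randn(256, 1)                # "real" data, far from g's samples
        fake = g(torch.randn(256, 1)).detach()
        p_real = d(x).clamp(1e-6, 1 - 1e-6)
        p_fake = d(fake).clamp(1e-6, 1 - 1e-6)
        loss_d = -(torch.log(p_real) + torch.log(1 - p_fake)).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    z = torch.randn(4096, 1)
    loss_g = torch.log((1 - d(g(z))).clamp(min=1e-6)).mean()
    grads = torch.autograd.grad(loss_g, list(g.parameters()))
    return torch.cat([p.reshape(-1) for p in grads]).norm().item()

for steps in (1, 10, 100, 1000):
    print(steps, generator_grad_norm(steps))               # norm decreases as D gets stronger
```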

Problem of vanishing gradients (Proof)

$$\big\|\nabla_\theta\, \mathbb{E}_{z \sim p_z}\{\log(1 - D_\phi(g_\theta(z)))\}\big\|_2^2
= \big\|\mathbb{E}_{z \sim p_z}\{\nabla_\theta \log(1 - D_\phi(g_\theta(z)))\}\big\|_2^2
= \Big\|\mathbb{E}_{z \sim p_z}\Big\{-\frac{\nabla_\theta D_\phi(g_\theta(z))}{1 - D_\phi(g_\theta(z))}\Big\}\Big\|_2^2$$

$$\le \mathbb{E}_{z \sim p_z}\Big\{\frac{\|\nabla_\theta D_\phi(g_\theta(z))\|_2^2}{|1 - D_\phi(g_\theta(z))|^2}\Big\}
= \mathbb{E}_{z \sim p_z}\Big\{\frac{\|\nabla_x D_\phi(g_\theta(z))\, J(g_\theta(z))\|_2^2}{|1 - D_\phi(g_\theta(z))|^2}\Big\}
\le \mathbb{E}_{z \sim p_z}\Big\{\frac{\|\nabla_x D_\phi(g_\theta(z))\|_2^2\, \|J(g_\theta(z))\|_2^2}{|1 - D_\phi(g_\theta(z))|^2}\Big\}$$

(the first inequality is Jensen's inequality, the last one Cauchy-Schwarz; $J$ is the Jacobian of the generator).

Since the discriminator is close to optimality, namely

$$\|D_\phi - D^*_\phi\| \doteq \sup_{x \in \mathcal{X}}\Big\{|D_\phi - D^*_\phi| + \|\nabla_x D_\phi - \nabla_x D^*_\phi\|_2\Big\} < \epsilon$$

(values and gradients are both similar), and since $\|\nabla_x D_\phi - \nabla_x D^*_\phi\|_2^2 \ge \|\nabla_x D_\phi\|_2^2 - \|\nabla_x D^*_\phi\|_2^2$, we get

$$\|\nabla_x D_\phi\|_2^2 < \|\nabla_x D^*_\phi\|_2^2 + \epsilon^2.$$

Similarly, $|D_\phi - D^*_\phi| < \epsilon$ gives $|1 - D^*_\phi - (1 - D_\phi)| < \epsilon$, and since $|1 - D^*_\phi| - |1 - D_\phi| \le |1 - D^*_\phi - (1 - D_\phi)|$, we get $|1 - D_\phi| > |1 - D^*_\phi| - \epsilon$.

Plugging both bounds into the last expectation:

$$\mathbb{E}_{z \sim p_z}\Big\{\frac{\|\nabla_x D_\phi(g_\theta(z))\|_2^2\, \|J(g_\theta(z))\|_2^2}{|1 - D_\phi(g_\theta(z))|^2}\Big\}
< \mathbb{E}_{z \sim p_z}\Big\{\frac{\big(\|\nabla_x D^*_\phi(g_\theta(z))\|_2^2 + \epsilon^2\big)\, \|J(g_\theta(z))\|_2^2}{\big(|1 - D^*_\phi(g_\theta(z))| - \epsilon\big)^2}\Big\}$$

At optimality $\nabla_x D^*_\phi(x) = 0$ and $D^*_\phi(x) = 0$ for all $x \in \mathrm{supp}(p_g)$, therefore

$$\big\|\nabla_\theta\, \mathbb{E}_{z \sim p_z}\{\log(1 - D_\phi(g_\theta(z)))\}\big\|_2^2
< \mathbb{E}_{z \sim p_z}\Big\{\frac{\epsilon^2\, \|J(g_\theta(z))\|_2^2}{(1 - \epsilon)^2}\Big\}
\le \frac{M^2 \epsilon^2}{(1 - \epsilon)^2} \qquad \text{QED}$$

How do you solve it?

Recent Solution

"Towards Principled Methods for Training Generative Adversarial Networks" ICLR 2017, Martin Arjovsky, Leon Bottou

Add zero-mean noise $\epsilon$ to both inputs of the discriminator: it now sees $g_\theta(z) + \epsilon$ instead of $g_\theta(z)$ and $x + \epsilon$ instead of $x$.
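A minimal sketch of this fix (assuming PyTorch; the parameter name `sigma` is my own choice): both real and generated samples are perturbed with noise drawn from the same zero-mean Gaussian before being shown to the discriminator.

```python
# Standard GAN discriminator loss, but evaluated on noised inputs x + eps and g(z) + eps.
import torch

def noisy_discriminator_loss(d, g, x_real, z, sigma=0.1):
    x_fake = g(z).detach()
    eps_real = sigma * torch.randn_like(x_real)    # eps ~ N(0, sigma^2 I)
    eps_fake = sigma * torch.randn_like(x_fake)
    p_real = d(x_real + eps_real).clamp(1e-6, 1 - 1e-6)
    p_fake = d(x_fake + eps_fake).clamp(1e-6, 1 - 1e-6)
    return -(torch.log(p_real) + torch.log(1 - p_fake)).mean()
```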

Recent Solution

Theorem 3.2: with noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, the optimal discriminator becomes

$$D^*_\phi(x) = \frac{p_{x+\epsilon}(x)}{p_{x+\epsilon}(x) + p_{g+\epsilon}(x)}$$

and the generator gradient becomes

$$\mathbb{E}_{z \sim p_z}\big\{\nabla_\theta \log\big(1 - D^*_\phi(g_\theta(z))\big)\big\}
= \mathbb{E}_{z \sim p_z}\Big\{ a(z)\!\int\! p_\epsilon(g_\theta(z) - y)\, \nabla_\theta \|g_\theta(z) - y\|^2\, p_x(y)\,dy
- b(z)\!\int\! p_\epsilon(g_\theta(z) - y)\, \nabla_\theta \|g_\theta(z) - y\|^2\, p_g(y)\,dy \Big\}$$

where $b(z) = a(z)\,\dfrac{p_{x+\epsilon}(g_\theta(z))}{p_{g+\epsilon}(g_\theta(z))}$.

"Towards Principled Methods for Training Generative Adversarial Networks" ICLR 2017, Martin Arjovsky, Leon Bottou

Recent Solution (Proof)

Define

$$a(z) \doteq \frac{1}{2\sigma^2}\,\frac{1}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}, \qquad
b(z) \doteq \frac{1}{2\sigma^2}\,\frac{1}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}\,\frac{p_{x+\epsilon}(g_\theta(z))}{p_{g+\epsilon}(g_\theta(z))}.$$

Starting from the optimal discriminator formula, which is assumed fixed and therefore not dependent on $\theta$ (strictly, this step should be derived):

$$\mathbb{E}_{z \sim p_z}\big\{\nabla_\theta \log(1 - D^*_\phi(g_\theta(z)))\big\}
= \mathbb{E}_{z \sim p_z}\Big\{\nabla_\theta \log \frac{p_{g+\epsilon}(g_\theta(z))}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{\nabla_\theta \log p_{g+\epsilon}(g_\theta(z)) - \nabla_\theta \log\big(p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))\big)\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{\frac{\nabla_\theta\, p_{g+\epsilon}(g_\theta(z))}{p_{g+\epsilon}(g_\theta(z))} - \frac{\nabla_\theta\, p_{x+\epsilon}(g_\theta(z)) + \nabla_\theta\, p_{g+\epsilon}(g_\theta(z))}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{\nabla_\theta\{-p_{x+\epsilon}(g_\theta(z))\}\,\frac{1}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}
- \frac{p_{x+\epsilon}(g_\theta(z))}{p_{g+\epsilon}(g_\theta(z))}\,\frac{1}{p_{x+\epsilon}(g_\theta(z)) + p_{g+\epsilon}(g_\theta(z))}\,\nabla_\theta\{-p_{g+\epsilon}(g_\theta(z))\}\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{2\sigma^2 a(z)\,\nabla_\theta\{-p_{x+\epsilon}(g_\theta(z))\} - 2\sigma^2 b(z)\,\nabla_\theta\{-p_{g+\epsilon}(g_\theta(z))\}\Big\}$$

Recall that adding two independent random variables produces a random variable whose density is the convolution of the two original densities, so $p_{x+\epsilon} = p_\epsilon * p_x$ and $p_{g+\epsilon} = p_\epsilon * p_g$ with $p_\epsilon(u) = \frac{1}{Z} e^{-\|u\|^2/(2\sigma^2)}$. Therefore the above equals

$$\mathbb{E}_{z \sim p_z}\Big\{-2\sigma^2 a(z)\,\nabla_\theta\!\int\! p_\epsilon(g_\theta(z) - y)\,p_x(y)\,dy + 2\sigma^2 b(z)\,\nabla_\theta\!\int\! p_\epsilon(g_\theta(z) - y)\,p_g(y)\,dy\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{-2\sigma^2 a(z)\!\int\!\nabla_\theta\,\frac{1}{Z} e^{-\frac{\|g_\theta(z) - y\|^2}{2\sigma^2}}\,p_x(y)\,dy + 2\sigma^2 b(z)\!\int\!\nabla_\theta\,\frac{1}{Z} e^{-\frac{\|g_\theta(z) - y\|^2}{2\sigma^2}}\,p_g(y)\,dy\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{a(z)\!\int\!\frac{1}{Z} e^{-\frac{\|g_\theta(z) - y\|^2}{2\sigma^2}}\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_x(y)\,dy - b(z)\!\int\!\frac{1}{Z} e^{-\frac{\|g_\theta(z) - y\|^2}{2\sigma^2}}\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_g(y)\,dy\Big\}$$

$$= \mathbb{E}_{z \sim p_z}\Big\{a(z)\!\int\! p_\epsilon(g_\theta(z) - y)\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_x(y)\,dy - b(z)\!\int\! p_\epsilon(g_\theta(z) - y)\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_g(y)\,dy\Big\} \qquad \text{QED}$$

Why does it work?

Interpretation

$$\mathbb{E}_{z \sim p_z}\big\{\nabla_\theta \log(1 - D^*_\phi(g_\theta(z)))\big\}
= \mathbb{E}_{z \sim p_z}\Big\{ a(z)\!\int\! p_\epsilon(g_\theta(z) - y)\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_x(y)\,dy
- b(z)\!\int\! p_\epsilon(g_\theta(z) - y)\,\nabla_\theta\|g_\theta(z) - y\|^2\,p_g(y)\,dy \Big\}$$

Assumption: $\sigma \gg 1$, so that $p_\epsilon \simeq k$ (roughly constant over the region of interest).

- First term (weighted by $p_x(y)$): a gradient step on the generator loss follows $-\nabla_\theta\|g_\theta(z) - y\|^2$ for data points $y \sim p_x$, i.e. it decreases the distance between the generated point $g_\theta(z)$ and the data. The overall effect is to move generated points close to the data manifold (ATTRACTION TO HIGH DENSITY).
- Without noise this weighting disappears: for $\sigma \to 0$, $p_\epsilon(g_\theta(z) - y) \simeq 0$, and we are back to VANISHING GRADIENTS.
- Second term (weighted by $p_g(y)$): the sign is flipped, so a gradient step follows $+\nabla_\theta\|g_\theta(z) - y\|^2$ for generated points $y \sim p_g$, i.e. it pushes generated points away from each other. The overall effect is to stretch the generated manifold (STRETCHING HIGH DENSITY REGIONS).

The balance between the two terms is set by $b(z) = a(z)\,\dfrac{p_{x+\epsilon}(g_\theta(z))}{p_{g+\epsilon}(g_\theta(z))}$:

- $a(z) \gg b(z)$ (generated samples lie far from the data): attraction dominates and $g_\theta(z)$ is pulled towards $p_x$.
- $a(z) \sim b(z)$ ($p_g$ already overlaps $p_x$): the two effects balance and the dominant behaviour is stretching of $p_g$ over the data manifold.

DEMO

[Toy demo setup: data and generated samples live on $[0, 1]$; latent noise $z \sim p_z$; generator with parameters $\theta_1, \theta_2$ producing $p_g$; discriminator $\sigma(\Sigma(\cdot))$ with parameters $\phi_1, \phi_2$ and output in $[0, 1]$, compared against the training data $p_x$]

Normal: GAMES = 1500, DISCRIMINATOR_STEPS = 1, GENERATOR_STEPS = 1

Strong discriminator: GAMES = 100, DISCRIMINATOR_STEPS = 600, GENERATOR_STEPS = 1

Strong generator: GAMES = 10, DISCRIMINATOR_STEPS = 1, GENERATOR_STEPS = 600
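The demo code itself is not included in the transcript; the following is a minimal sketch (assuming PyTorch) of the alternating loop that these three configurations control. `sample_real` stands in for whatever routine draws a minibatch of training data, and GAMES, DISCRIMINATOR_STEPS, GENERATOR_STEPS play the same role as in the settings above.

```python
# Alternating minimax training: DISCRIMINATOR_STEPS ascent steps on phi, then
# GENERATOR_STEPS descent steps on theta, repeated GAMES times.
import torch

def train_gan(g, d, sample_real, GAMES=1500, DISCRIMINATOR_STEPS=1, GENERATOR_STEPS=1,
              batch=128, z_dim=1, lr=1e-3):
    opt_g = torch.optim.Adam(g.parameters(), lr=lr)
    opt_d = torch.optim.Adam(d.parameters(), lr=lr)
    for _ in range(GAMES):
        for _ in range(DISCRIMINATOR_STEPS):        # maximize over phi
            x = sample_real(batch)
            z = torch.randn(batch, z_dim)
            p_real = d(x).clamp(1e-6, 1 - 1e-6)
            p_fake = d(g(z).detach()).clamp(1e-6, 1 - 1e-6)
            loss_d = -(torch.log(p_real) + torch.log(1 - p_fake)).mean()
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        for _ in range(GENERATOR_STEPS):            # minimize over theta
            z = torch.randn(batch, z_dim)
            p_fake = d(g(z)).clamp(1e-6, 1 - 1e-6)
            loss_g = torch.log(1 - p_fake).mean()   # minimax loss from the problem formulation
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return g, d
```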

Open issues

1. Other solutions to the vanishing gradient problem?
2. Problem when the dimension of the support of the input distribution is lower than the dimension of the support of the data distribution: the generator $g_\theta(\cdot)$ maps $p_z$ to $p_g$ with $\dim(\mathrm{supp}(p_z)) \ge \dim(\mathrm{supp}(p_g))$, so if $\dim(\mathrm{supp}(p_x)) > \dim(\mathrm{supp}(p_g))$, can $p_g$ (before training) ever reach $p_x$ (after training)?

[Figure: $p_g$ before training vs. $p_g$ after training, compared with $p_x$]
