
Page 1: Generative adversarial networks

Generative Adversarial Networks ( GAN )

"The coolest idea in ML in the last twenty years" - Yann LeCun

2017.02 Namju Kim ([email protected])

Page 2: Generative adversarial networks

Introduction

Page 3: Generative adversarial networks

Taxonomy of ML

Introduction

From David Silver, Reinforcement Learning (UCL course on RL, 2015).

Page 4: Generative adversarial networks

Supervised Learning

• Find deterministic function f : y = f(x), x : data, y : label

Introduction

[Figure: images of a cat and an F-18, each mapped by f to its label]

Page 5: Generative adversarial networks

Supervised Learning

• How to implement the following function prototype.

Introduction
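In Python, the prototype in question would look something like this minimal sketch ( the slide image itself is not in the transcript, so the name, shapes, and signature are illustrative ):

    import numpy as np

    def f(image: np.ndarray) -> str:
        """Map raw pixel data x (e.g. a 224 x 224 x 3 array) to a label y, e.g. 'cat'."""
        raise NotImplementedError  # the rest of the deck is about how to build this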

Page 6: Generative adversarial networks

Supervised Learning

• Challenges

– Image is high dimensional data

• 224 x 224 x 3 = 150,528

– Many variations

• viewpoint, illumination, deformation, occlusion, background clutter, intra-class variation

Introduction

Page 7: Generative adversarial networks

Supervised Learning

• Solution

– Feature Vector

• 150,528 → 2,048

– Synonyms

• Latent Vector, Hidden Vector, Unobservable Vector, Feature, Representation

Introduction

Page 8: Generative adversarial networks

Supervised Learning

Introduction

Page 9: Generative adversarial networks

Supervised Learning

• More flexible solution

– Get the probability of each label for the given data, instead of the label itself

Introduction

[Figure: f maps the image to probabilities - Cat : 0.98, Cake : 0.02, Dog : 0.00]

Page 10: Generative adversarial networks

Supervised Learning

Introduction

Page 11: Generative adversarial networks

Supervised Learning

• Mathematical notation of optimization

– y : label, x : data, z : latent, θ : learnable parameter

Introduction

$\theta^{*} = \underset{\theta}{\operatorname{argmax}}\; P(y \mid x; \theta)$

( the optimal parameter θ* is the θ at which P, the probability of the label y given the data x, parameterized by θ, is maximum )

Page 12: Generative adversarial networks

Supervised Learning

• Mathematical notation of classifying ( greedy policy )

– y : label, x : data, z : latent, θ* : fixed optimal parameter

Introduction

$\hat{y} = \underset{y}{\operatorname{argmax}}\; P(y \mid x; \theta^{*})$

( the label prediction ŷ is the y at which P, given the data x and parameterized by the fixed optimal θ*, is maximum )

Page 13: Generative adversarial networks

Unsupervised Learning

• Find deterministic function f : z = f(x), x : data, z : latent

Introduction

[Figure: f maps an image to a latent vector [0.1, 0.3, -0.8, 0.4, … ]]

Page 14: Generative adversarial networks

Good features

• Less redundancy

• Similar features for similar data

• High fidelity

From Yann LeCun, Predictive Learning (NIPS tutorial, 2016).

[Figure: LeCun's cake analogy - RL ( cherry ), SL ( icing ), UL ( cake ) - next to examples of good vs. bad features]

Introduction

Page 15: Generative adversarial networks

Unsupervised Learning

Introduction

From Ian Goodfellow, Deep Learning (MIT press, 2016).

E2E (end-to-end)

Page 16: Generative adversarial networks

Unsupervised Learning

• More challenging than supervised learning :

– No label or curriculum → self learning

• Some NN solutions :

– Boltzmann machine

– Auto-encoder or Variational Inference

– Generative Adversarial Network

Introduction

Page 17: Generative adversarial networks

Generative Model

• Find generation function g : x = g(z), x : data, z : latent

Introduction

[Figure: g maps a latent vector [0.1, 0.3, -0.8, 0.4, … ] to an image]

Page 18: Generative adversarial networks

Unsupervised learning vs. Generative model

• z = f(x) vs. x = g(z)

• P(z|x) vs. P(x|z)

• Encoder vs. Decoder ( Generator )

– P(x, z) needed ( cf : P(y|x) in supervised learning )

• P(z|x) = P(x, z) / P(x) → P(x) is intractable ( ELBO )

• P(x|z) = P(x, z) / P(z) → P(z) is given ( prior )

Introduction

Page 19: Generative adversarial networks

Autoencoders

Page 20: Generative adversarial networks

Stacked autoencoder - SAE

• Use data itself as label → Convert UL into reconstruction SL

• z = f(x), x = g(z) → x = g(f(x)), as sketched below

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_sae.py

Autoencoders

[Diagram: x → f (enc) → z → g (dec) → x′ ; supervised learning with L2 loss ( = MSE )]
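As a concrete illustration of the diagram above ( a minimal Keras sketch assuming MNIST-shaped inputs in [0, 1], not the linked sugartensor example ):

    import tensorflow as tf

    # f (enc) : x -> z, compressing 28 x 28 = 784 pixels to a 64-d code
    encoder = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
    ])

    # g (dec) : z -> x', reconstructing the image from the code
    decoder = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
        tf.keras.layers.Reshape((28, 28)),
    ])

    # x = g(f(x)) : the data is its own label, trained with L2 ( = MSE ) loss
    sae = tf.keras.Sequential([encoder, decoder])
    sae.compile(optimizer="adam", loss="mse")
    # sae.fit(x_train, x_train, epochs=10)   # x_train : images scaled to [0, 1]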

Page 21: Generative adversarial networks

Denoising autoencoder - DAE

• Almost same as SAE

• Add random noise to input data → Convert UL into denoising SL ( snippet below )

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_dae.py

Autoencoders

[Diagram: x + noise → f (enc) → z → g (dec) → x′ ; supervised learning with L2 loss ( = MSE )]
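Reusing sae and x_train from the SAE sketch above, the only change is corrupting the input while keeping the clean image as the target ( the noise level 0.3 is an arbitrary choice ):

    import numpy as np

    noisy = x_train + 0.3 * np.random.randn(*x_train.shape)  # x + noise
    sae.fit(noisy, x_train, epochs=10)  # the target is still the clean x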

Page 22: Generative adversarial networks

Variational autoencoder - VAE

• Kingma et al, "Auto-Encoding Variational Bayes", 2013.

• Generative Model + Stacked Autoencoder

– Based on Variational approximation

– other approaches :

• Point estimation ( MAP )

• Markov Chain Monte Carlo ( MCMC )

Autoencoders

Page 23: Generative adversarial networks

Variational autoencoder - VAE

• Training

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_vae.py

Autoencoders

[Diagram: x → f (enc) → ( μ, σ ) → z = μ + σ · ε → g (dec) → x′ ; supervised learning with L2 loss ( = MSE ) + KL regularizer]

Page 24: Generative adversarial networks

Variational autoencoder - VAE

• Generating

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_vae_eval.py

Autoencoders

[Diagram: z ~ N(0, I) → g (dec) → generated image]

Page 25: Generative adversarial networks

Variational autoencoder - VAE

• Question :

– how can we get the posterior p(z|x) needed for training ?

• Solution :

– p(z|x) is intractable, and MCMC is slow

– Introduce an auxiliary variational function q(z|x)

– Approximate q(z|x) to p(z|x) by maximizing the ELBO

– Make q(z|x) a Gaussian N( μ(x), σ(x)² ), sampled as z = μ + σ · ε with ε ~ N(0, I)

Autoencoders

Page 26: Generative adversarial networks

Variational autoencoder - VAE

• Simply : the regularizer pushes μ → 0 and σ → 1

• KLD Regularizer :

– Gaussian case :

• $D_{KL}\big(\mathcal{N}(\mu, \sigma^{2}) \,\|\, \mathcal{N}(0, 1)\big) = \tfrac{1}{2}\big(\mu^{2} + \sigma^{2} - \log \sigma^{2} - 1\big)$

• If we fix σ = 1 ( practical approach )

– the KLD term reduces to → MSE( μ, 0 )

Autoencoders
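In code the Gaussian KLD term is a one-liner; a hedged sketch using the symbols above, with the encoder emitting log σ² for numerical stability:

    import tensorflow as tf

    def kld(mu, log_var):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
        return 0.5 * tf.reduce_sum(
            tf.square(mu) + tf.exp(log_var) - log_var - 1.0, axis=-1)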

Page 27: Generative adversarial networks

Variational autoencoder - VAE

• KLD ( Kullback-Leibler divergence )

• A sort of distribution difference measure

– cf : KS ( Kolmogorov-Smirnov ) test, JSD ( Jensen-Shannon divergence )

Autoencoders
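For reference, the discrete form of the divergence is

$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$

Note that it is not symmetric, $D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$, so it measures distribution difference without being a true metric.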

Page 28: Generative adversarial networks

Variational autoencoder - VAE

• Find problems in the following code

Autoencoders
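The code slide itself is not in the transcript, but in context the bug it asks about is almost certainly the one fixed on the next slide : z is sampled as an opaque random draw, so back-propagation has no path from the reconstruction loss to μ and σ. A hedged reconstruction of the naive version:

    import numpy as np

    # Naive sampling : z is drawn outside the computation graph, so it is
    # a constant with respect to the encoder outputs mu and sigma.
    # Gradients of the reconstruction loss never reach the encoder.
    z = np.random.normal(loc=mu, scale=sigma)   # mu, sigma from f (enc)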

Page 29: Generative adversarial networks

Variational autoencoder - VAE

• Reparameterization trick

– Enable back propagation

– Reduce the variance of gradients

Autoencoders

[Diagram: before - x → f (enc) → stochastic node z ~ N( μ, σ² ) → g (dec) ; after - x → f (enc) → ( μ, σ ), z = μ + σ · ε with external noise ε ~ N(0, I) → g (dec)]
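A minimal sketch of the trick with the symbols above ( my illustration, not the linked example ):

    import tensorflow as tf

    def sample_z(mu, log_var):
        # z = mu + sigma * eps : the randomness is moved into eps, an
        # input to the graph, so gradients flow through mu and log_var.
        eps = tf.random.normal(tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * eps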

Page 30: Generative adversarial networks

Variational autoencoder - VAE

• This is called the 'Reparameterization trick'

Autoencoders

Page 31: Generative adversarial networks

Variational autoencoder - VAE

• Results

Autoencoders

Page 32: Generative adversarial networks

Generative Adversarial Networks

Page 33: Generative adversarial networks

Generative Adversarial Networks - GAN

• Ian Goodfellow et al, “Generative Adversarial Networks”, 2014.

– Learnable cost function

– Mini-Max game based on Nash Equilibrium

• Few assumptions

• High fidelity

– Hard to train - no guarantee of reaching equilibrium

GAN

Page 34: Generative adversarial networks

Generative Adversarial Networks - GAN

• Alternate training ( a sketch follows below )

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_gan.py

GAN

[Diagram: z ~ N(0, I) → G (generator) → fake image ; real image / fake image → D (discriminator). Discriminator training : D is pushed toward 1 ( real ) on real images and 0 ( fake ) on fake images. Generator training : G is pushed so that D outputs 1 ( real ) on fake images.]
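A hedged sketch of one alternating step in Keras-style TensorFlow ( the names and the assumption that D ends in a sigmoid are mine; the linked sugartensor example is the reference implementation ):

    import tensorflow as tf

    bce = tf.keras.losses.BinaryCrossentropy()

    def train_step(real_images, G, D, g_opt, d_opt, z_dim=100):
        z = tf.random.normal([tf.shape(real_images)[0], z_dim])

        # Discriminator step : push D(real) toward 1 and D(fake) toward 0
        with tf.GradientTape() as tape:
            real_pred, fake_pred = D(real_images), D(G(z))
            d_loss = (bce(tf.ones_like(real_pred), real_pred) +
                      bce(tf.zeros_like(fake_pred), fake_pred))
        d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                                  D.trainable_variables))

        # Generator step : push D(fake) toward 1 ( label 1 for fake )
        with tf.GradientTape() as tape:
            fake_pred = D(G(z))
            g_loss = bce(tf.ones_like(fake_pred), fake_pred)
        g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                                  G.trainable_variables))
        return d_loss, g_loss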

Page 35: Generative adversarial networks

Generative Adversarial Networks - GAN

• Generating

• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_gan_eval.py

GAN

[Diagram: z ~ N(0, I) → G (generator) → generated image]

Page 36: Generative adversarial networks

Generative Adversarial Networks - GAN

• Mathematical notation

GAN

$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim \mathcal{N}(0, I)}[\log(1 - D(G(z)))]$

( x is sampled from real data ; z is sampled from N(0, I) ; 𝔼 is expectation ; D(x) is the prob. of D on real, D(G(z)) the value of D on fake ; D maximizes V while G minimizes it )

Page 37: Generative adversarial networks

Generative Adversarial Networks - GAN

• Mathematical notation - discriminator

GAN

$\max_{D}\; \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$

Maximize the prob. of D(real) and minimize the prob. of D(fake) : BCE ( binary cross entropy ) with label 1 for real, 0 for fake. ( Practically, plain CE works as well, or may even be preferable. )

Page 38: Generative adversarial networks

Generative Adversarial Networks - GAN

• Mathematical notation - generator

GAN

$\max_{G}\; \mathbb{E}_{z}[\log D(G(z))]$

Maximize the prob. of D(fake) : BCE ( binary cross entropy ) with label 1 for fake. ( Practically, plain CE works as well, or may even be preferable. )

Page 39: Generative adversarial networks

Generative Adversarial Networks - GAN

• Mathematical notation - equilibrium

GAN

– With the optimal discriminator, the generator objective reduces to minimizing the Jensen-Shannon divergence between p_data and p_g

– At equilibrium, p_g = p_data and D(x) = 0.5 everywhere

Page 40: Generative adversarial networks

Generative Adversarial Networks - GAN

GAN

Page 41: Generative adversarial networks

Generative Adversarial Networks - GAN

• Results

GAN

Page 42: Generative adversarial networks

Generative Adversarial Networks - GAN

• Why blurry and why sharper ? ( An L2 / likelihood loss averages over all plausible outputs, which blurs ; an adversarial loss rewards any single sharp sample that fools D. )

GAN

From Ian Goodfellow, Deep Learning (MIT press, 2016), and Ledig et al, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2016.

Page 43: Generative adversarial networks

Training GANs

Page 44: Generative adversarial networks

Training GANs

Pitfall of GAN

• No guarantee of reaching equilibrium

– Mode collapsing

– Oscillation

– No indicator of when to finish

• For all generative models :

– Evaluation metrics ( human Turing test )

– Overfitting test ( nearest neighbor ) → GAN is robust !!!

– Diversity test

– Wu et al, On the quantitative analysis of decoder-based generative models, 2016

Page 45: Generative adversarial networks

DCGAN

• Radford et al, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015 ( building-block sketch below )

– Tricks for gradient flow

• Max pooling → strided convolution or average pooling

• Use LeakyReLU instead of ReLU

– Other tricks

• Use batch normalization in both generator and discriminator

• Use Adam optimizer ( lr = 0.0002, β1 = 0.5 )

Training GANs
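A hedged sketch of these building blocks ( layer sizes are illustrative, not the paper's exact architecture ):

    import tensorflow as tf

    # Discriminator block : strided convolution ( no max pooling ) + LeakyReLU
    disc_block = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same"),
        tf.keras.layers.LeakyReLU(0.2),
    ])

    # Generator block : strided transposed convolution + batch norm
    gen_block = tf.keras.Sequential([
        tf.keras.layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])

    # Adam settings from the slide
    opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)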

Page 46: Generative adversarial networks

DCGAN

• Results

Training GANs

Page 47: Generative adversarial networks

DCGAN

• Latent vector arithmetic

Training GANs

Page 48: Generative adversarial networks

Pitfall of GAN

Training GANs

[Figures: mode collapsing (left) ; oscillating (right)]

Page 49: Generative adversarial networks

Escaping mode collapsing

• Minibatch discrimination ( Salimans et al, 2016 ) ← slow

• Replay memory ( Shrivastava et al, 2016 ) ← controversial

• Pull-away term regularizer ( Zhao et al, 2016 ) ← not so good

• Unrolled GAN ( Metz et al, 2016 ) ← not so good

Training GANs

Page 50: Generative adversarial networks

Escaping mode collapsing

• Discriminator ensemble ( Durugkar, 2016) ← not tested

• Latent-Image pair discrimination ( Donahue et al, 2016 ; Dumoulin et al, 2016 ) ← Good ( additional computing time )

• InfoGAN ( Chen et al, 2016 ) ← Good

Training GANs

Page 51: Generative adversarial networks

Escaping mode collapsing

Training GANs

Page 52: Generative adversarial networks

Other tricks

• Salimans et al, Improved Techniques for Training GANs, 2016

• https://github.com/soumith/ganhacks

– One-sided label smoothing ← not sure

– Historical averaging ← Unrolled GAN is better

– Feature matching ← working

– Add noise or dropout to the real/fake images ← working

– Don’t balance D and G loss ← I agree

Training GANs

Page 53: Generative adversarial networks

Batch normalization in GAN

• Use in G, but be cautious in D

• Do not use at the last layer of G and the first layer of D

• IMHO, don't use BN in D

• Reference BN or Virtual BN ( not sure )

Training GANs

Page 54: Generative adversarial networks

Variants of GANs

Page 55: Generative adversarial networks

Taxonomy of Generative Models

Variants of GANs

From Ian Goodfellow, NIPS 2016 Tutorial : Generative Adversarial Networks, 2016

SOTA !!! AR-based models : PixelRNN, WaveNet

Page 56: Generative adversarial networks

Variants of GANs

From Odena et al, Conditional Image Synthesis With Auxiliary Classifier GANs, 2016

SOTA with label / SOTA without label

Taxonomy of GANs

Page 57: Generative adversarial networks

AFL, ALI

• Donahue et al, Adversarial Feature Learning, 2016

• Dumoulin et al, Adversarially Learned Inference, 2016

Variants of GANs

Page 58: Generative adversarial networks

AFL, ALI

• Results

Variants of GANs

Page 59: Generative adversarial networks

InfoGAN

• Chen et al, InfoGAN : Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016

Variants of GANs

Page 60: Generative adversarial networks

InfoGAN

• Results

• My implementation : https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_info_gan.py

Variants of GANs

Page 61: Generative adversarial networks

EBGAN

• Zhao et al, Energy-Based Generative Adversarial Networks, 2016

Variants of GANs

Page 62: Generative adversarial networks

EBGAN

• Results

Variants of GANs

Page 63: Generative adversarial networks

EBGAN

• My implementation : https://github.com/buriburisuri/ebgan

Variants of GANs

Page 64: Generative adversarial networks

ACGAN

• Odena et al, Conditional Image Synthesis with Auxiliary Classifier GANs, 2016

Variants of GANs

Page 65: Generative adversarial networks

ACGAN

• Results

Variants of GANs

Page 66: Generative adversarial networks

ACGAN

• My implementation : https://github.com/buriburisuri/ac-gan

Variants of GANs

Page 67: Generative adversarial networks

WGAN

• Arjovsky et al, Wasserstein GAN, 2017 ( critic-update sketch below )

Variants of GANs

[Diagram: z ~ N(0, I) → G (generator) → fake image ; real image / fake image → C (critic). Critic training : high value on real, low value on fake, with weight clamping ( no BCE ). Generator training : high value on fake.]
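A hedged sketch of the critic update in the diagram above ( my illustration, not the authors' code ):

    import tensorflow as tf

    def critic_step(real_images, G, C, c_opt, z_dim=100, clip=0.01):
        z = tf.random.normal([tf.shape(real_images)[0], z_dim])
        with tf.GradientTape() as tape:
            # No BCE : C outputs an unbounded score, and we maximize
            # E[C(real)] - E[C(fake)] by minimizing its negative.
            loss = tf.reduce_mean(C(G(z))) - tf.reduce_mean(C(real_images))
        c_opt.apply_gradients(zip(tape.gradient(loss, C.trainable_variables),
                                  C.trainable_variables))
        # Weight clamping keeps C ( roughly ) Lipschitz-bounded
        for w in C.trainable_variables:
            w.assign(tf.clip_by_value(w, -clip, clip))
        return loss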

Page 68: Generative adversarial networks

WGAN

• Arjovsky et al, Wasserstein GAN, 2017

– JSD → Earth Mover distance ( = Wasserstein-1 distance )

– Prevents vanishing gradients by using a weaker distance metric

– Provides a parsimonious training indicator ( the critic loss tracks sample quality )

Variants of GANs

Page 69: Generative adversarial networks

WGAN

• Results

Variants of GANs

Page 70: Generative adversarial networks

Application of GANs

Page 71: Generative adversarial networks

Generating image

• Wang et al, Generative Image Modeling using Style and Structure Adversarial Networks, 2016

Application of GANs

Page 72: Generative adversarial networks

Generating image

• Wang et al : Results

Application of GANs

Page 73: Generative adversarial networks

Generating image

• Jamonglab, 128x96 face image generation, 2016.11

Application of GANs

Page 74: Generative adversarial networks

Generating image

• Two-step generation : sketch → color

• Binomial random seed instead of Gaussian

Application of GANs

Page 75: Generative adversarial networks

Generating image

• Zhang et al, StackGAN : Text to Photo-realistic Image Synthesis with Stacked GANs, 2016

Application of GANs

Page 76: Generative adversarial networks

Generating image

• Zhang et al : Results

Application of GANs

Page 77: Generative adversarial networks

Generating image

• Reed et al, Learning What and Where to Draw, 2016

Application of GANs

Page 78: Generative adversarial networks

Generating image

• Reed et al : Results

Application of GANs

Page 79: Generative adversarial networks

Generating image

• Nguyen et al, Plug & Play Generative Networks, 2016

– Complex Langevin sampling via a DAE

Application of GANs

Page 80: Generative adversarial networks

Generating image

• Arici et al, Associative Adversarial Networks, 2016

Application of GANs


Page 81: Generative adversarial networks

Generating image

• Arici et al : Results

Application of GANs

Page 82: Generative adversarial networks

Translating image

• Perarnau et al, Invertible Conditional GANs for image editing, 2016

Application of GANs

Page 83: Generative adversarial networks

Translating image

• Perarnau et al : Results

Application of GANs

Page 84: Generative adversarial networks

Translating image

• Isola et al, Image-to-Image Translation with Conditional Adversarial Networks, 2016

Application of GANs

Page 85: Generative adversarial networks

Translating image

• Isola et al : Results

Application of GANs

Page 86: Generative adversarial networks

Translating image

• Ledig et al, Photo-Realistic Single Image Super-Resolution Using a GAN, 2016

Application of GANs

Page 87: Generative adversarial networks

Translating image

• Ledig et al : Results

Application of GANs

Page 88: Generative adversarial networks

Translating image

• Sajjadi et al, EnhanceNet : Single Image Super-Resolution through Automated Texture Synthesis, 2016

– SRGAN + Gram matrix loss

Application of GANs

Page 89: Generative adversarial networks

Domain adaptation

• Taigman et al, Unsupervised cross-domain image generation, 2016

Application of GANs

Page 90: Generative adversarial networks

Domain adaptation

• Taigman et al, Unsupervised cross-domain image generation, 2016

Application of GANs

Page 91: Generative adversarial networks

Domain adaptation

• Bousmalis et al, Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks, 2016

Application of GANs

Page 92: Generative adversarial networks

Domain adaptation

• Bousmalis et al : Results

Application of GANs

Page 93: Generative adversarial networks

Domain adaptation

• Shrivastava et al, Learning from simulated and unsupervised images through adversarial training, 2016

Application of GANs

Page 94: Generative adversarial networks

Domain adaptation

• Shrivastava et al : Results

Application of GANs

Page 95: Generative adversarial networks

Unsupervised clustering

• Jamonglab : https://github.com/buriburisuri/timeseries_gan

– Using InfoGAN

Application of GANs

[Figure: real vs. fake generated samples]

Page 96: Generative adversarial networks

Unsupervised clustering

Application of GANs

Page 97: Generative adversarial networks

Text generation

• Yu et al, SeqGAN : Sequence Generative Adversarial Nets with Policy Gradient, 2016

Application of GANs

Page 98: Generative adversarial networks

Text generation

• Maddison et al, The Concrete Distribution : A Continuous Relaxation of Discrete Random Variables, 2016

• Kusner et al, GANs for sequences of discrete elements with the Gumbel-softmax distribution, 2016

Application of GANs

Page 99: Generative adversarial networks

Text generation

• Zhang et al, Generating text via Adversarial Training, 2016

Application of GANs

Page 100: Generative adversarial networks

Imitation learning ( IRL or Optimal control )

• Ho et al, Model-free imitation learning with policy optimization, 2016

• Finn et al, A connection between generative adversarial networks, Inverse Reinforcement Learning, and Energy-based models, 2016

Application of GANs

Page 101: Generative adversarial networks

Future of GANs

Page 102: Generative adversarial networks

Pre-trained unified features

• Unified features by unsupervised learning

• Pre-trained and shipped on mobile chips by default

Future of GANs

[Diagram: x → E (pre-trained encoder) → z → C (classifier) → 1 ( good ) / 0 ( bad )]

Page 103: Generative adversarial networks

Personalization at scale

• Environment ( user ) simulator

Future of GANs

Page 104: Generative adversarial networks

Personalization at scale

• Intelligent agent using RL

Future of GANs

Page 105: Generative adversarial networks

Personalization at scale

• For example, a cardiac simulator using a GAN

Future of GANs

Page 106: Generative adversarial networks

Conclusion

• Unsupervised learning is the next frontier in AI

– Same algorithms, multiple domains

• Creativity and experience are no longer exclusively human traits

Towards Artificial General Intelligence

Future of GANs