Lecture 7: Generative Adversarial Networks
uvadlc.github.io/lectures/apr2019/lecture7-gan.pdf
UVA Deep Learning Course
TRANSCRIPT
Lecture 7: Generative Adversarial Networks
Efstratios Gavves
o Gentle intro to generative models
o Generative Adversarial Networks
o Variants of Generative Adversarial Networks
Lecture overview
Generative models
o Generative modelling
  - Learn the joint pdf: p(x, y)
  - Model the world; perform tasks, e.g., use Bayes rule to classify: p(y|x) (see the sketch below)
  - E.g., Naïve Bayes, Variational Autoencoders, GANs
o Discriminative modelling
  - Learn the conditional pdf: p(y|x)
  - Task-oriented
  - E.g., Logistic Regression, SVM
Types of Learning
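To make the Bayes-rule classification concrete, here is a minimal sketch of a generative classifier, assuming (purely for illustration) 1-D Gaussian class-conditionals p(x|y) with known parameters; all numbers below are made up:

import numpy as np

# Hypothetical class-conditional densities p(x|y) and priors p(y).
mu = {0: -1.0, 1: 2.0}
sigma = {0: 1.0, 1: 1.5}
prior = {0: 0.5, 1: 0.5}

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def posterior(x):
    # Bayes rule: p(y|x) = p(x|y) p(y) / sum_y' p(x|y') p(y')
    joint = {y: gaussian_pdf(x, mu[y], sigma[y]) * prior[y] for y in (0, 1)}
    z = sum(joint.values())
    return {y: v / z for y, v in joint.items()}

print(posterior(0.5))  # posterior class probabilities at x = 0.5

A discriminative model would instead parameterize p(y|x) directly (e.g., logistic regression), without ever modelling p(x, y).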
o What to pick?
  - V. Vapnik: "One should solve the [classification] problem directly and never solve a more general [and harder] problem as an intermediate step."
o Typically, discriminative models are selected to do the job
o Generative models give us more theoretical guarantees that the model is going to work as intended
  - Better generalization
  - Less overfitting
  - Better modelling of causal relationships
Types of Learning
Applications of generative modeling?
o Act as a regularizer in discriminative learning
  - Discriminative learning is often too goal-oriented
  - Overfitting to the observations
o Semi-supervised learning
  - Missing data
o Simulating "possible futures" for Reinforcement Learning
o Data-driven generation/sampling/simulation
Applications of generative modeling?
Applications: Image Generation
Applications: Super-resolution
Applications: Cross-modal translation
A map of generative models
Explicit density models
o Plug the model density function into the likelihood
o Then maximize the likelihood
Problems
o The model must be complex enough to match the data complexity
o Also, the model must be computationally tractable
o More details in the next lectures
o Density estimation (an MLE sketch follows below)
Generative modeling: Case I
[Figure: training samples and the fitted model density]
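As a concrete instance of "plug the density into the likelihood, then maximize": a minimal maximum-likelihood fit of a Gaussian density on synthetic data (for illustration only); the optimum has the familiar closed form:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=1000)  # synthetic "train set"

# Explicit density p(x; mu, sigma) = N(x; mu, sigma^2).
# Maximizing sum_i log p(x_i; mu, sigma) gives the closed-form MLE:
mu_mle = data.mean()
sigma_mle = data.std()  # MLE uses the 1/N variance estimator

log_likelihood = np.sum(-0.5 * np.log(2 * np.pi * sigma_mle**2)
                        - 0.5 * ((data - mu_mle) / sigma_mle) ** 2)
print(mu_mle, sigma_mle, log_likelihood)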
Implicit density models
o No explicit probability density function (pdf) needed
o Instead, a sampling mechanism to draw samples from the pdf without knowing the pdf
Implicit density models: GANs
o Sample data in parallel
o Few restrictions on the generator model
o No Markov Chains needed
o No variational bounds
o Better qualitative examples
  - A weak argument, but true
o Sample Generation
Generative modeling: Case II
[Figure: train examples and new samples (ideally)]
o Generative
  - You can sample novel input samples
  - E.g., you can literally "create" images that never existed
o Adversarial
  - Our generative model G learns adversarially, by fooling a discriminative oracle model D
o Network
  - Typically implemented as a (deep) neural network
  - Easy to incorporate new modules
  - Easy to learn via backpropagation
What is a GAN?
o Assume you have two parties
  - Police: want to recognize fake money as reliably as possible
  - Counterfeiter: wants to make fake money that is as realistic as possible
o The police push the counterfeiter to get better (and vice versa)
o The solution relates to a Nash equilibrium
GAN: Intuition
GAN: Pipeline
o Must be differentiable
o No invertibility requirement
o Trainable for any size of z
o Can be made conditionally Gaussian given z, but this is not a strict requirement
Generator network: x = G(z; θ^(G))
[Figure: noise z mapped by the generator to a sample x]
o The discriminator is just a standard neural network
o The generator looks like an inverse discriminator (a minimal sketch of both follows below)
Generator & Discriminator: Implementation
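A minimal sketch of both networks in PyTorch (the layer sizes and architecture here are illustrative, not the lecture's exact models):

import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # illustrative sizes, e.g., flattened 28x28 images

# Generator: maps noise z to a sample x = G(z); roughly an "inverse discriminator".
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
)

# Discriminator: a standard binary classifier; D(x) is the probability x is real.
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)  # z ~ N(0, I)
print(D(G(z)).shape)             # torch.Size([16, 1])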
o Minimax
o Maximin
o Heuristic, non-saturating game
o Max likelihood game
Training definitions
o J^(D) = -1/2 E_{x~p_data}[log D(x)] - 1/2 E_{z~p_z}[log(1 - D(G(z)))]
o D(x) = 1 → the discriminator believes that x is a true image
o D(G(z)) = 1 → the discriminator believes that G(z) is a true image
o The equilibrium is a saddle point of the discriminator loss
o Resembles the Jensen-Shannon divergence
o The generator minimizes the log-probability of the discriminator being correct
Minimax Game
NIPS 2016 Tutorial: Generative Adversarial Networks
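Up to the 1/2 factors, this discriminator cost is binary cross-entropy with real samples labeled 1 and generated samples labeled 0. A sketch (not from the slides), reusing the illustrative G and D above:

import torch
import torch.nn.functional as F

def discriminator_loss(D, G, x_real, z):
    # J(D) = -1/2 E_x[log D(x)] - 1/2 E_z[log(1 - D(G(z)))]
    d_real = D(x_real)
    d_fake = D(G(z).detach())  # detach: do not backpropagate into G here
    return 0.5 * (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))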
A reasonable loss for the generator?
o For the simple case of a zero-sum game: J^(G) = -J^(D)
o So we can summarize the game by a value function
  V(θ^(D), θ^(G)) = -J^(D)(θ^(D), θ^(G))
o Easier theoretical analysis
o In practice not used: when the discriminator starts to recognize fake samples, the generator gradients vanish
Minimax Game
o J^(D) = -1/2 E_{x~p_data}[log D(x)] - 1/2 E_{z~p_z}[log(1 - D(G(z)))]
o J^(G) = -1/2 E_{z~p_z}[log D(G(z))]
o The equilibrium is no longer describable by a single loss
o The generator maximizes the log-probability of the discriminator being mistaken
  - For a good G(z), D(G(z)) → 1, and E_z[log D(G(z))] is maximized
o Heuristically motivated; the generator can still learn even when the discriminator successfully rejects all generator samples (see the code sketch below)
Heuristic non-saturating game
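In code, the two generator costs differ in one line; a sketch under the same assumptions as the discriminator loss above:

import torch
import torch.nn.functional as F

def generator_loss_nonsaturating(D, G, z):
    # J(G) = -1/2 E_z[log D(G(z))]: gradients remain strong when D rejects fakes
    d_fake = D(G(z))
    return 0.5 * F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

def generator_loss_minimax(D, G, z):
    # J(G) = 1/2 E_z[log(1 - D(G(z)))]: saturates when D confidently rejects fakes
    d_fake = D(G(z))
    return -0.5 * F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))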
DCGAN Architecture
Examples
Even vector space arithmetic ...
[Figure: "man with glasses" - "man" + "woman" ≈ "woman with glasses" in the latent space]
o J^(D) = -1/2 E_{x~p_data}[log D(x)] - 1/2 E_{z~p_z}[log(1 - D(G(z)))]
o J^(G) = -1/2 E_{z~p_z}[exp(σ^{-1}(D(G(z))))], where σ^{-1} is the inverse sigmoid (logit)
o When the discriminator is optimal, the generator gradient matches that of maximum likelihood
Modifying GANs for Max-Likelihood
On distinguishability criteria for estimating generative models
Comparison of Generator Losses
o When a sample is likely fake, the minimax and ML costs are flat → no gradients in early steps (the non-saturating heuristic avoids this)
o The ML cost variant generates gradients mostly from the "good generations" → all gradients come from a few samples → high variance. Variance reduction? (see the numeric comparison below)
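A small numpy sketch of this comparison, evaluating the three per-sample generator costs (dropping the 1/2 factors) as a function of D(G(z)):

import numpy as np

d = np.linspace(1e-6, 1 - 1e-6, 7)  # D(G(z)): from "likely fake" to "likely real"

minimax = np.log(1 - d)              # log(1 - D(G(z)))
non_saturating = -np.log(d)          # -log D(G(z))
logits = np.log(d / (1 - d))         # sigma^{-1}(D(G(z)))
max_likelihood = -np.exp(logits)     # -exp(sigma^{-1}(D(G(z)))) = -d/(1-d)

for row in zip(d, minimax, non_saturating, max_likelihood):
    print("D=%.3f  minimax=%9.3f  non-sat=%9.3f  ML=%12.3f" % row)

# Near D=0 the minimax and ML costs are almost constant (little gradient),
# while the non-saturating cost blows up and keeps providing gradients.
# Near D=1 the ML cost dominates: most of its gradient comes from the few
# "good" samples, which is the high-variance behaviour noted above.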
o The optimal D(x) for any p_data(x) and p_model(x) is always
  D(x) = p_data(x) / (p_data(x) + p_model(x))
o Estimating this ratio with supervised learning (the discriminator) is the key
Optimal discriminator
[Figure: the discriminator curve between the data distribution and the model distribution]
o L(D, G) = ∫_x [ p_d(x) log D(x) + p_g(x) log(1 - D(x)) ] dx
  - Maximize L(D, G) w.r.t. D: set ∂L/∂D = 0 pointwise and ignore the integral (we optimize for every x)
  - The function y ↦ a log y + b log(1 - y) attains its maximum on [0, 1] at y = a/(a + b)
o The optimal discriminator:
  D*(x) = p_d(x) / (p_d(x) + p_g(x))
  - And at optimality p_g(x) → p_d(x), thus
  D*(x) = 1/2 and L(G*, D*) = -2 log 2
Why is this the optimal discriminator?
o By expanding the Jensen-Shannon divergence, we have
  D_JS(p_d || p_g) = 1/2 D_KL(p_d || (p_d + p_g)/2) + 1/2 D_KL(p_g || (p_d + p_g)/2)
                   = 1/2 ( log 2 + ∫_x p_d(x) log [ p_d(x) / (p_d(x) + p_g(x)) ] dx )
                     + 1/2 ( log 2 + ∫_x p_g(x) log [ p_g(x) / (p_d(x) + p_g(x)) ] dx )
                   = 1/2 ( 2 log 2 + L(G, D*) )
o That is, the minimax objective at the optimal discriminator equals the Jensen-Shannon divergence up to a constant
GANs and Jensen-Shannon divergence
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html
o Does the divergence make a difference?
o Is there a difference between the KL divergence, the Jensen-Shannon divergence, ...?
  D_KL(p_d || p_g) = ∫_x p_d log (p_d / p_g) dx
  D_JS(p_d || p_g) = 1/2 D_KL(p_d || (p_d + p_g)/2) + 1/2 D_KL(p_g || (p_d + p_g)/2)
o Let's check the KL divergence (a numeric sketch follows below)
Is the divergence important?
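A quick numeric sketch of the two divergences on discrete distributions (the probability vectors are illustrative):

import numpy as np

def kl(p, q):
    # D_KL(p || q) = sum_x p(x) log(p(x) / q(x)); assumes q > 0 wherever p > 0
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.4, 0.4, 0.2])
print(kl(p, q), kl(q, p))  # asymmetric: the two directions differ
print(js(p, q), js(q, p))  # symmetric, and bounded above by log 2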
o Forward KL divergence: D_KL(p(x) || q*(x))
  - High probability everywhere that the data occurs
o Backward KL divergence: D_KL(q*(x) || p(x))
  - Low probability wherever the data does not occur
o Which version makes the model "conservative"?
Is the divergence important?
  D_KL(p_d || p_g) = ∫_x p_d log(p_d / p_g) dx
  p_d is what we get and cannot change; p_g is what we make through our model and can (through training) change
o D_KL(p(x) || q*(x)): high probability everywhere that the data occurs
o D_KL(q*(x) || p(x)): low probability wherever the data does not occur
o Which version makes the model "conservative"?
o D_KL(q*(x) || p(x)) = ∫ q*(x) log( q*(x) / p(x) ) dx
  - Avoid areas where p(x) → 0
o Zero-forcing: q*(x) → 0 in areas where the approximation q*(x)/p(x) cannot be good
Is the divergence important?
o JS is symmetric, KL is not
KL vs JS
o GANs are a minimax optimization
  - A non-cooperative game with a tied objective
o Training is not always easy: when optimizing one player/network, we might hurt the other one → oscillations
o Assume two players and f(x, y) = x·y; we optimize one step at a time (simulated in the sketch below)
  - Player 1 minimizes f_1(x) = x·y: ∂f_1/∂x = y → x_{t+1} = x_t - η·y
  - Player 2 minimizes f_2(y) = -x·y: ∂f_2/∂y = -x → y_{t+1} = y_t + η·x
GAN Problems: Reaching Nash equilibrium causes instabilities
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html
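Simulating these two update rules shows the instability; a minimal sketch with simultaneous gradient steps:

eta = 0.1
x, y = 1.0, 1.0  # start away from the Nash equilibrium (0, 0)

for t in range(101):
    grad_x = y    # d(xy)/dx for player 1, who minimizes xy
    grad_y = -x   # d(-xy)/dy for player 2, who minimizes -xy
    x, y = x - eta * grad_x, y - eta * grad_y  # simultaneous updates
    if t % 25 == 0:
        print(f"t={t:3d}  x={x:+.3f}  y={y:+.3f}  xy={x * y:+.3f}")

# The iterates orbit (0, 0) with slowly growing radius: each player's step
# undoes the other's progress, so the game oscillates instead of converging.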
J^(D) = -1/2 E_{x~p_data}[log D(x)] - 1/2 E_{z~p_z}[log(1 - D(G(z)))]
J^(G) = -1/2 E_{z~p_z}[log D(G(z))]
o If the discriminator is quite bad → no accurate feedback for the generator → no reasonable generator gradients
o But if the discriminator is perfect, D(x) = D*(x), the gradients go to 0 → no learning anymore
o Bad when this happens early in training
  - It is easier to train the discriminator than the generator
GAN Problems: Vanishing Gradients
o Very low variability
o It is safer for the generator to produce samples from the mode it knows it approximates well
GAN Problems: Mode collapse
o Data lie on low-dimensional manifolds
o However, the manifold is not known
o During training, p_g is not perfect either, especially at the start
o So the supports of p_d and p_g are non-overlapping and disjoint → not good for the KL/JS divergences
o Easy to find a discriminating line
GAN Problems: Low dimensional supports
o Instead of KL/JS, use the Wasserstein (Earth Mover's) Distance:
  W(p_r, p_g) = inf_{γ ∈ Π(p_r, p_g)} E_{(x, y)~γ}[ ||x - y|| ]
o Even for non-overlapping supports, the distance is meaningful (a critic sketch follows below)
Wasserstein GAN
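In practice, the WGAN of Arjovsky et al. optimizes the dual (Kantorovich-Rubinstein) form of this distance with a Lipschitz "critic". A hedged sketch with the original weight-clipping trick (the clip value is illustrative; the critic C is any network with an unbounded scalar output):

import torch

def critic_loss(C, G, x_real, z):
    # The critic maximizes E_x[C(x)] - E_z[C(G(z))]; we minimize the negative.
    return -(C(x_real).mean() - C(G(z).detach()).mean())

def clip_weights(C, c=0.01):
    # Crude Lipschitz constraint from the original WGAN paper: clip to [-c, c]
    with torch.no_grad():
        for p in C.parameters():
            p.clamp_(-c, c)

def wgan_generator_loss(C, G, z):
    return -C(G(z)).mean()  # the generator maximizes the critic's score on fakes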
o Instead of matching image statistics, match feature statistics:
  J^(G) = || E_{x~p_d}[f(x)] - E_{z~p_z}[f(G(z))] ||_2^2
o f can be any statistic of the data, like the mean or the median (see the sketch below)
Feature matching
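A minimal sketch, assuming f is an intermediate feature layer of the discriminator (feature_extractor below is a hypothetical module, operating on PyTorch tensors as in the earlier sketches):

def feature_matching_loss(feature_extractor, G, x_real, z):
    # J(G) = || E_x[f(x)] - E_z[f(G(z))] ||_2^2, matching first-order statistics
    f_real = feature_extractor(x_real).mean(dim=0)
    f_fake = feature_extractor(G(z)).mean(dim=0)
    return ((f_real - f_fake) ** 2).sum()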
o Use an SGD-like algorithm of your choice
  - The Adam optimizer is a good choice
o Use two mini-batches simultaneously
  - The first mini-batch contains real examples from the training set
  - The second mini-batch contains fake examples from the generator
o Optional: run k steps of one player (e.g., the discriminator) for every step of the other player (e.g., the generator)
Training procedure (a full loop sketch follows below)
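Putting the pieces together, a hedged end-to-end sketch of this procedure, reusing the illustrative G, D, latent_dim, and loss functions from the earlier sketches (dataloader is a placeholder for a real-data iterator):

import torch

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
k = 1  # discriminator steps per generator step

for x_real in dataloader:                            # mini-batch of real examples
    for _ in range(k):                               # optional k discriminator steps
        z = torch.randn(x_real.size(0), latent_dim)  # mini-batch of fake examples
        opt_D.zero_grad()
        discriminator_loss(D, G, x_real, z).backward()
        opt_D.step()

    z = torch.randn(x_real.size(0), latent_dim)
    opt_G.zero_grad()
    generator_loss_nonsaturating(D, G, z).backward()
    opt_G.step()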
o Learning a conditional model p(y|x) often generates better samples
  - Denton et al., 2015
o Even learning p(x, y) makes samples look more realistic
  - Salimans et al., 2016
o Conditional GANs are a great addition for learning with labels
Use labels if possible
o Default discriminator cost:
  cross_entropy(1., discriminator(data)) + cross_entropy(0., discriminator(samples))
o One-sided label smoothing:
  cross_entropy(0.9, discriminator(data)) + cross_entropy(0., discriminator(samples))
o Do not smooth negative labels (i.e., keep beta = 0):
  cross_entropy(1. - alpha, discriminator(data)) + cross_entropy(beta, discriminator(samples))
One-sided label smoothing
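In PyTorch this is just a change of target for the real batch; a sketch consistent with the discriminator loss above:

import torch
import torch.nn.functional as F

def discriminator_loss_smoothed(D, G, x_real, z, smooth=0.9):
    # One-sided: real targets become 0.9, fake targets stay exactly 0.
    d_real = D(x_real)
    d_fake = D(G(z).detach())
    return 0.5 * (F.binary_cross_entropy(d_real, smooth * torch.ones_like(d_real))
                  + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))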
o Max likelihood is often overconfident
  - It might return accurate predictions, but with too-high probabilities
o A good regularizer
  - Szegedy et al., 2015
o Does not reduce classification accuracy, only confidence
o Specifically for GANs
  - Prevents the discriminator from giving very large gradient signals to the generator
  - Prevents extrapolation that would encourage extreme samples
Benefits of label smoothing
o Generally good practice for neural networks
o Given inputs X = {x^(1), x^(2), ..., x^(m)}
o Compute the mean and standard deviation of the features of X: μ_X, σ_X
o Normalize the features (see the sketch below)
  - Subtract the mean, divide by the standard deviation
Batch normalization
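A minimal numpy sketch of the normalization step (omitting the learned scale and shift parameters that full batch norm adds):

import numpy as np

def batch_norm(X, eps=1e-5):
    # X has shape (batch, features); normalize each feature with batch statistics.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)

X = np.random.randn(32, 4) * 5.0 + 3.0
Xn = batch_norm(X)
print(Xn.mean(axis=0).round(3), Xn.std(axis=0).round(3))  # ~0 and ~1 per feature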
Batch normalization: Graphically
[Figure: without batch norm, layer k feeds layer k+1 directly: z_k = h(x_{k-1}), x_{k+1} = z_k]
Batch normalization: Graphically
[Figure: with batch norm(μ_X^(t), σ_X^(t)) between layer k and layer k+1: z_k = h(x_{k-1}), x_{k+1} = (z_k - μ_X)/σ_X]
But it can cause strong intra-batch correlation
o Train with two mini-batches
o One fixed reference mini-batch for computing the mean and standard deviation
o The other for doing the training as usual
o Proceed as normal, but take the mean and standard deviation for the batch norm from the fixed reference mini-batch
o Problem: overfitting to the reference mini-batch
Reference batch normalization
[Figure: at iterations 1, 2, 3 the batch-norm statistics μ_X, σ_X always come from the fixed reference mini-batch, while the gradients ∂J^(t)/∂θ come from the standard mini-batch]
o Mini-batch = standard mini-batch + reference/fixed mini-batch
Solution: Virtual batch normalization
[Figure: at each iteration the batch-norm statistics μ_X^(s), σ_X^(s) are computed from the combination of the standard and the reference mini-batch]
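A sketch of the idea in numpy; following Salimans et al., each example is normalized with statistics from the reference batch plus that example itself (details simplified here):

import numpy as np

def virtual_batch_norm(x, ref_batch, eps=1e-5):
    # x: (batch, features); ref_batch: a mini-batch fixed once at the start.
    out = np.empty_like(x)
    for i, xi in enumerate(x):
        combined = np.vstack([ref_batch, xi[None, :]])  # reference + this example
        mu = combined.mean(axis=0)
        sigma = combined.std(axis=0)
        out[i] = (xi - mu) / (sigma + eps)  # avoids intra-batch correlation
    return out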
o Usually the discriminator wins
  - That's good, in that the theoretical justification assumes a perfect discriminator
o Usually the discriminator network is bigger than the generator
o Sometimes running the discriminator more often than the generator works better
  - However, there is no real consensus
o Do not limit the discriminator to avoid making it too smart
  - Better to use the non-saturating cost
  - Better to use label smoothing
Balancing Generator & Discriminator
o Optimization is tricky and unstable
  - Finding a saddle point does not imply a global minimum
o An equilibrium might not even be reached
o Mode collapse is the most severe form of non-convergence
Open Question: Non-convergence
o The discriminator converges to the correct distribution
o The generator, however, places all its mass on the most likely point
o Problem: low sample diversity
Open Question: Mode collapse
o Classify each sample by comparing it to other examples in the mini-batch
o If samples are too similar, the model is penalized
Minibatch features
[Figure: a sample too close to the rest of its mini-batch is penalized; a diverse one is not]
o Despite the nice images, who cares?
o It would be nice to quantitatively evaluate the model
o For GANs it is hard to even estimate the likelihood
Open Question: Evaluation of GANs
o The generator must be differentiable
o It cannot be differentiable if its outputs are discrete
o E.g., harder to make it work for text
o Possible workarounds (a Gumbel-softmax sketch follows below):
  - REINFORCE [Williams, 1992]
  - Concrete distribution [Maddison et al., 2016]
  - Gumbel softmax [Jang et al., 2016]
  - Train the GAN to generate continuous embeddings
Open Question: Discrete outputs
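For the Gumbel-softmax workaround, PyTorch ships a differentiable relaxation of categorical sampling; a small sketch (the logit shapes are illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 30, requires_grad=True)  # e.g., scores over 30 tokens

# Soft, differentiable samples; with hard=True the forward pass is one-hot,
# while gradients still flow through the soft relaxation (straight-through).
soft = F.gumbel_softmax(logits, tau=1.0, hard=False)
hard = F.gumbel_softmax(logits, tau=1.0, hard=True)

soft.sum().backward()                 # gradients reach the generator's logits
print(hard[0].sum(), logits.grad.shape)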
Open Question: Semi-supervised classification
o InfoGAN [Chen et al., 2016]
Interpretable latent codes
o Conditional GANs
  - Standard GANs have no encoder!
o Actor-Critic
  - Related to Reinforcement Learning
GAN spinoffs
[Figure: conditional GAN architecture]
o GANs interpreted as actor-critic [Pfau and Vinyals, 2016]
o GANs as inverse reinforcement learning [Finn et al., 2016]
o GANs for imitation learning [Ho and Ermon, 2016]
Connections to Reinforcement Learning
Application: Image to Image translation
Application: Style transfer
o https://www.youtube.com/watch?v=XOxxPcy5Gr4
Application: Face generation
Summary
o GANs are generative models that use supervised learning to approximate an intractable cost function
o GANs can simulate many cost functions, including maximum likelihood
o Finding Nash equilibria in high-dimensional, continuous, non-convex games is an important open research problem
o GAN research is in its infancy; most works were published only since 2016. Not mature enough yet, but with very compelling results