DEEP LEARNING Lecture 9: Generative Adversarial Networks Dr. Yang Lu Department of Computer Science [email protected]


  • DEEP LEARNING Lecture 9: Generative Adversarial Networks

    Dr. Yang Lu

    Department of Computer Science

    [email protected]

  • GAN Applications

    ¡ Image Translation

    Image source: Choi, Yunjey, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. "Stargan v2: Diverse image synthesis for multiple domains." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8188-8197. 2020.

  • GAN Applications

    ¡ Image Translation

    Image source: Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2414-2423. 2016.

  • GAN Applications

    ¡ Scene Generation

    Image source: Tang, Hao, Dan Xu, Yan Yan, Philip HS Torr, and Nicu Sebe. "Local class-specific and global image-level generative adversarial networks for semantic-guided scene generation." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7870-7879. 2020.

  • GAN Applications

    ¡ Facial Attribute Manipulation

    Image source: Geng, Zhenglin, Chen Cao, and Sergey Tulyakov. "3d guided fine-grained face manipulation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9821-9830. 2019.

  • GAN Applications

    ¡ Facial Attribute Manipulation

    Image source: Liu, Ming, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, Wangmeng Zuo, and Shilei Wen. "Stgan: A unified selective transfer network for arbitrary image attribute editing." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3673-3682. 2019.

  • GAN Applications

    ¡ Gaze Correction

    Image source: Zhang, Jichao, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, and Nicu Sebe. "Gazecorrection: Self-guided eye manipulation in the wild using self-supervised generative adversarial networks." arXiv preprint arXiv:1906.00805 (2019).

  • GAN Applications

    ¡ Image Animation

    Image source: Siarohin, Aliaksandr, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. "Animating arbitrary objects via deep motion transfer." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2377-2386. 2019.

  • GAN Applications

    ¡ Image Inpainting

    Image source: Yu, Jiahui, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. "Generative image inpainting with contextual attention." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5505-5514. 2018.

  • GAN Applications

    ¡ Image Inpainting

    Image source: Yu, Jiahui, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. "Free-form image inpainting with gated convolution." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471-4480. 2019.

  • GAN Applications

    ¡ Image Blending

    Image source: Wu, Huikai, Shuai Zheng, Junge Zhang, and Kaiqi Huang. "Gp-gan: Towards realistic high-resolution image blending." In Proceedings of the 27th ACM International Conference on Multimedia, pp. 2487-2495. 2019.

  • GAN Applications

    ¡ Image Super-Resolution

    Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.

    https://www.theverge.com/21298762/face-depixelizer-ai-machine-learning-tool-pulse-stylegan-obama-bias

  • GAN Applications

    ¡ Shadow Detection and Removal

    Image source: Ding, Bin, Chengjiang Long, Ling Zhang, and Chunxia Xiao. "Argan: Attentive recurrent generative adversarial network for shadow detection and removal." In Proceedings of the IEEE international conference on computer vision, pp. 10213-10222. 2019.

  • GAN Applications

    ¡ Makeup

    Image source: Li, Tingting, Ruihe Qian, Chao Dong, Si Liu, Qiong Yan, Wenwu Zhu, and Liang Lin. "Beautygan: Instance-level facial makeup transfer with deep generative adversarial network." In Proceedings of the 26th ACM international conference on Multimedia, pp. 645-653. 2018.

  • GAN Applications

    ¡ Facial Landmark Detection

    Image source: Dong, Xuanyi, Yan Yan, Wanli Ouyang, and Yi Yang. "Style aggregated network for facial landmark detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 379-388. 2018.

  • GAN Applications

    ¡ Person Re-identification

    Image source: Zheng, Zhedong, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, and Jan Kautz. "Joint discriminative and generative learning for person re-identification." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2138-2147. 2019.

  • GAN Applications

    ¡ Music Generation


    Image source: Yang, Li-Chia, Szu-Yu Chou, and Yi-Hsuan Yang. "MidiNet: A convolutional generative adversarial network for symbolic-domain music generation." arXiv preprint arXiv:1703.10847 (2017).

    Generated music samples: https://soundcloud.com/vgtsv6jf5fwq/sets/midinet-samples


  • Generative vs. Discriminative

    ¡ In machine learning, the two main approaches are the generative approach and the discriminative approach.

    ¡ Given an observable variable 𝑋 and a target variable 𝑌:
        ¡ A generative model is a statistical model of the data distribution 𝑃(𝑋) or of the joint probability distribution on 𝑋×𝑌: 𝑃(𝑋, 𝑌).
        ¡ A discriminative model is a model of the conditional distribution of 𝑌 given 𝑋: 𝑃(𝑌|𝑋 = 𝑥).

    Image source: https://medium.com/@jordi299/about-generative-and-discriminative-models-d8958b67ad32

  • Discriminative Approaches

    ¡ Most supervised learning methods are discriminative approaches.
        ¡ Given data (𝑥, 𝑦): 𝑥 is the data, 𝑦 is the label.

    ¡ Goal: learn a function that maps 𝑥 → 𝑦, namely the posterior probability 𝑃(𝑌|𝑋 = 𝑥).

    ¡ Examples: classification, regression, object detection, face recognition, sentiment classification, etc.

    Image sources: https://www.autoexpress.co.uk/tesla/model-3 and https://timesofindia.indiatimes.com/life-style/relationships/pets/5-things-that-scare-and-stress-your-cat/articleshow/67586673.cms

    Example labels in the figure: cat, car.

  • Generative Approaches

    ¡ Given training data, generate new samples from the same distribution.
    ¡ Objectives:
        1. Learn $p_{\text{model}}(x)$ that approximates $p_{\text{data}}(x)$.
        2. Sample a new $x$ from $p_{\text{model}}(x)$.

    Image source: http://www.lherranz.org/2018/08/07/imagetranslation/

  • Outline

    ¡ GAN

    ¡ DCGAN

    ¡ CGAN

    ¡ WGAN

    ¡ SAGAN

    ¡ pix2pix and CycleGAN

    ¡ SRGAN


  • GAN


  • GAN

    ¡ GAN was proposed by Ian Goodfellow et al. in 2014.

    ¡ Yann LeCun described GANs as “the most interesting idea in the last 10 years in machine learning”.

    ¡ Ian Goodfellow presented and explained the paper at NIPS 2016 in a 2-hour presentation.

    Image source: https://youtu.be/HGYYEUSm-0Q (Ian Goodfellow at NIPS 2016)

  • GAN: How to Do

    ¡ A GAN is composed of a generator and a discriminator. Both are neural networks.
        ¡ Generator network: tries to fool the discriminator by generating real-looking images.
        ¡ Discriminator network: tries to distinguish between real and fake images.

    Image source: https://sthalles.github.io/intro-to-gans/

  • GAN: How to Do

    ¡ The generator and the discriminator tell each other where they went wrong.
        ¡ The discriminator tells the generator how its fakes were detected.
        ¡ The generator tells the discriminator how it was fooled.

    Image source: https://sthalles.github.io/intro-to-gans/

    Discriminator learning signal

    Generator learning signal

  • GAN: How to Learn

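    ¡ Given a prior on the input noise variables $p_{\mathbf{z}}(\mathbf{z})$, the generator $G_{\theta_g}(\mathbf{z})$ takes $\mathbf{z}$ as input and maps it into data space.

    ¡ The discriminator $D_{\theta_d}(\mathbf{x})$ takes the data $\mathbf{x}$ as input and outputs the probability that $\mathbf{x}$ came from the real data rather than from the generated data $G_{\theta_g}(\mathbf{z})$.

    ¡ $D_{\theta_d}$ and $G_{\theta_g}$ have different goals (1 for real, 0 for fake):
        ¡ Generator wants: $D_{\theta_d}(G_{\theta_g}(\mathbf{z})) \to 1$.
        ¡ Discriminator wants: $D_{\theta_d}(\mathbf{x}) \to 1$, $D_{\theta_d}(G_{\theta_g}(\mathbf{z})) \to 0$.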

  • GAN: How to Learn

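    ¡ By maximizing the log-likelihood, the overall objective is to train simultaneously over all $\mathbf{x}$ with randomly generated $\mathbf{z}$:
        ¡ train $G_{\theta_g}$ to minimize $\log(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z})))$;
        ¡ train $D_{\theta_d}$ to maximize $\log D_{\theta_d}(\mathbf{x})$ and $\log(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z})))$.

    ¡ In other words, $D_{\theta_d}$ and $G_{\theta_g}$ play the following two-player minimax game:

    $$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}} \log D_{\theta_d}(\mathbf{x}) + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \log\left(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z}))\right) \right]$$

    Here $D_{\theta_d}(\mathbf{x})$ is the discriminator output for real data $\mathbf{x}$, and $D_{\theta_d}(G_{\theta_g}(\mathbf{z}))$ is the discriminator output for generated fake data $G_{\theta_g}(\mathbf{z})$.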

  • GAN: How to Learn

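    Objective:

    $$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}} \log D_{\theta_d}(\mathbf{x}) + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \log\left(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z}))\right) \right]$$

    Alternate between:

    ¡ Gradient ascent on the discriminator:

    $$\max_{\theta_d} \left[ \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}} \log D_{\theta_d}(\mathbf{x}) + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \log\left(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z}))\right) \right]$$

    ¡ Gradient descent on the generator:

    $$\min_{\theta_g} \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \log\left(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z}))\right)$$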
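The two quantities that the alternating updates push on can be estimated from a minibatch of discriminator outputs. The following is a minimal sketch, not the lecture's code: the function names and the example probabilities are hypothetical, and no networks or gradients are involved, only the value function itself.

```python
import math

def value_function(d_real, d_fake):
    """Minibatch estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """The only term that depends on G: E[log(1 - D(G(z)))]."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probabilities of "real").
d_real = [0.9, 0.8, 0.95]   # D(x) on real samples
d_fake = [0.1, 0.2, 0.05]   # D(G(z)) on generated samples

# The discriminator ascends this value; a confident, correct D scores higher
# than a D that outputs 0.5 everywhere.
v_confident = value_function(d_real, d_fake)
v_confused = value_function([0.5] * 3, [0.5] * 3)

# The generator descends this value; fooling D (D(G(z)) near 1) lowers it.
g_fooling = generator_objective([0.9, 0.9])
g_caught = generator_objective([0.1, 0.1])

print(v_confident, v_confused, g_fooling, g_caught)
```

When D outputs 1/2 everywhere, the value is exactly −2 log 2.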

  • GAN: How to Learn

    ¡ After several steps of training, if 𝐺 and 𝐷 have enough capacity, they reach a point at which neither can improve because $p_g = p_{\text{data}}$.
        ¡ The generator can generate realistic images.
        ¡ The discriminator is unable to differentiate between the two distributions, i.e. $D_{\theta_d}(\mathbf{x}) = 1/2$.

    Image source: Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.

    Figure legend: discriminative distribution, data distribution, generative distribution.

  • GAN: Algorithm


  • GAN: Result


    Image source: Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.

    Rightmost column shows the nearest training example of the neighboring sample, in order to demonstrate that the model has not memorized the training set.

  • GAN: Result

    Image source: https://sthalles.github.io/intro-to-gans/

    Panels: SVHN, MNIST.

  • GAN: Frontiers


    Image source: Ian Goodfellow. Samples from Goodfellow et al., 2014, Radford et al., 2015, Liu et al., 2016, Karras et al., 2017, Karras et al., 2018

  • The GAN Zoo: Explosion of GANs


    Source: https://github.com/hindupuravinash/the-gan-zoo

  • DCGAN


  • DCGAN

    ¡ Vanilla GAN simply uses MLPs, rather than CNNs, in both the generator and the discriminator.

    ¡ A CNN can be easily applied to the discriminator.

    ¡ Now the problem is: how can a CNN be used as a generator?
        ¡ Pooling leads to downsampling; how can we upsample?

  • Fractionally-Strided Convolutions

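    Input: 2×2, filter: 3×3, output: 4×4.

    Input = [[1, 2], [3, 4]]
    Filter = [[10, 20, 30], [40, 50, 60], [30, 20, 10]]

    Step 1: the input value 1 multiplies the filter, and the result is written into the top-left 3×3 window of the output:

    10  20  30   ·
    40  50  60   ·
    30  20  10   ·
     ·   ·   ·   ·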

  • Fractionally-Strided Convolutions

    Input: 2×2, filter: 3×3, output: 4×4.

    Step 2: the input value 2 multiplies the filter, and the result is added into the window shifted one column to the right:

    10  20+20  30+40    60
    40  50+80  60+100  120
    30  20+60  10+40    20
     ·    ·      ·       ·

  • Fractionally-Strided Convolutions

    Input: 2×2, filter: 3×3, output: 4×4.

    Step 3: the input value 3 multiplies the filter, and the result is added into the window shifted one row down:

    10      40      70      60
    40+30   130+60  160+90  120
    30+120  80+150  50+180   20
    90      60      30       ·

  • Fractionally-Strided Convolutions

    Input: 2×2, filter: 3×3, output: 4×4.

    Step 4: the input value 4 multiplies the filter, and the result is added into the bottom-right 3×3 window:

    10   40       70       60
    70   190+40   250+80   120+120
    150  230+160  230+200  20+240
    90   60+120   30+80    40

  • Fractionally-Strided Convolutions

    ¡ Fractionally-strided convolutions are also called transposed convolutions.
        ¡ PyTorch: torch.nn.ConvTranspose2d.
        ¡ TensorFlow: tf.keras.layers.Conv2DTranspose.

    ¡ Some researchers call them deconvolutions. However, a true deconvolution is the inverse operation of convolution, which is not the same as a fractionally-strided convolution.

    Image source: https://github.com/vdumoulin/conv_arithmetic

    A transposed convolution with stride is equivalent to an ordinary convolution applied after inserting zeros between the input elements and zero-padding the border.

    More linear algebra details: https://arxiv.org/pdf/1603.07285.pdf
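The worked example and the equivalence stated above can be checked with a few lines of plain Python. This is a sketch under stated assumptions: the input [[1, 2], [3, 4]] and the filter [[10, 20, 30], [40, 50, 60], [30, 20, 10]] are read off the slide figures, the function names are hypothetical, and a single channel with a square input and filter is assumed. It is not how torch.nn.ConvTranspose2d is implemented internally.

```python
def transposed_conv2d(x, w, stride=1):
    """Scatter-add form: every input value stamps a scaled copy of the filter."""
    n, k = len(x), len(w)
    size = (n - 1) * stride + k
    out = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out

def conv_with_zero_insertion(x, w, stride=1):
    """Equivalent form: insert zeros, zero-pad by k-1, slide the flipped filter."""
    n, k = len(x), len(w)
    m = (n - 1) * stride + 1            # size after inserting (stride - 1) zeros
    pad = k - 1
    padded = [[0] * (m + 2 * pad) for _ in range(m + 2 * pad)]
    for i in range(n):
        for j in range(n):
            padded[pad + i * stride][pad + j * stride] = x[i][j]
    flipped = [row[::-1] for row in w[::-1]]
    size = m + pad                      # equals (n - 1) * stride + k
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            out[r][c] = sum(padded[r + a][c + b] * flipped[a][b]
                            for a in range(k) for b in range(k))
    return out

x = [[1, 2], [3, 4]]
w = [[10, 20, 30], [40, 50, 60], [30, 20, 10]]

out = transposed_conv2d(x, w)
for row in out:
    print(row)
```

With stride 1 the output is 4×4, matching the slides; with stride 2 the same 2×2 input gives a 5×5 output, which is how the generator upsamples.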

  • DCGAN


    Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

    stride=2 everywhere

  • DCGAN: Visual Results


    Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

    The generated bedrooms look very nice!

  • DCGAN: Walking in the Latent Space

    ¡ If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations.


    Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

    Interpolation between a series of 9 random points in 𝑍 shows that the learned space has smooth transitions.
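Walking in the latent space amounts to feeding the generator a sequence of points on a path between latent vectors. Below is a minimal sketch of linear interpolation between two random points; the latent dimension of 100, the Gaussian sampling, and the function names are illustrative assumptions, and the generator itself is not included.

```python
import random

def lerp(z0, z1, t):
    """Point at fraction t on the straight line from z0 to z1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def interpolate(z0, z1, steps):
    """Evenly spaced latent vectors from z0 to z1, endpoints included."""
    return [lerp(z0, z1, k / (steps - 1)) for k in range(steps)]

random.seed(0)
latent_dim = 100  # illustrative choice
z_a = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]
z_b = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Each vector in `path` would be passed to a trained generator to render one frame.
path = interpolate(z_a, z_b, steps=9)
print(len(path), len(path[0]))
```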

  • DCGAN: Vector Arithmetic


    Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

    For each column, the 𝑍 vectors of samples are averaged. Arithmetic was then performed on the mean vectors creating a new vector 𝑌.

  • DCGAN: Use as Feature Extractor

    ¡ Train on ImageNet-1k, then use the discriminator’s convolutional features from all layers.

    ¡ Max-pool each layer's representation to produce a 4 × 4 spatial grid.

    ¡ Flatten and concatenate these features to form a 28672-dimensional vector.

    ¡ Train a regularized linear L2-SVM classifier on top of them.

  • CGAN


  • CGAN

    ¡ We can’t control what the vanilla GAN generates.
        ¡ Noise is the only input, and it is totally random.

    ¡ How can we tell the GAN what we want it to generate?
        ¡ Straightforward solution: replace the data distribution by a conditional distribution: $p(\mathbf{x}) \to p(\mathbf{x} \mid y)$.

    ¡ Now the problem becomes:
        ¡ Generator: generate a sample for class 𝑦.
        ¡ Discriminator: distinguish real samples of class 𝑦 from generated samples of class 𝑦.

  • CGAN

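    ¡ Both the generator and the discriminator are conditioned on some extra information $\mathbf{y}$:

    $$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}} \log D_{\theta_d}(\mathbf{x} \mid \mathbf{y}) + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} \log\left(1 - D_{\theta_d}(G_{\theta_g}(\mathbf{z} \mid \mathbf{y}))\right) \right].$$

    ¡ $\mathbf{y}$ could be any kind of auxiliary information, such as class labels or data from other modalities.
        ¡ E.g. a speech recording of someone saying the class name.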

  • CGAN


    Generate MNIST digits by directly feeding one-hot class label.
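One common way to feed a one-hot class label is to concatenate it with the inputs of both networks. The sketch below shows only that input construction; the helper names, the 100-dimensional noise, and the flattened 28×28 image size are assumptions for illustration, and the networks themselves are omitted.

```python
import random

def one_hot(label, num_classes):
    """Vector of zeros with a single 1 at the label index."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def generator_input(z, label, num_classes=10):
    """Noise vector with the class condition appended."""
    return z + one_hot(label, num_classes)

def discriminator_input(x, label, num_classes=10):
    """Flattened image (real or generated) with the same condition appended."""
    return x + one_hot(label, num_classes)

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(100)]   # noise, assumed 100-d
x = [0.0] * (28 * 28)                              # placeholder flattened MNIST image

g_in = generator_input(z, label=3)       # asks the generator for a "3"
d_in = discriminator_input(x, label=3)   # asks: is this a real "3"?
print(len(g_in), len(d_in))
```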

  • WGAN


  • WGAN

    ¡ The first paper (Arjovsky and Bottou, "Towards principled methods for training generative adversarial networks", 2017) lists several problems of GANs based on theoretical analysis:
        ¡ Unstable training behavior.
        ¡ The cost function doesn’t reflect the quality of generated images.

    ¡ Based on this analysis, the second paper proposed Wasserstein GAN (WGAN).

  • Problems of GANs

    Problem 1: A good discriminator makes the gradient of the generator vanish.

    ¡ If the discriminator is well trained, the gradient of the generator vanishes.

    ¡ If the discriminator is poorly trained, the gradient of the generator has great variance.

    ¡ The generator trains well only if the discriminator is neither too good nor too poor.
        ¡ That’s why GANs are extremely difficult to train.

  • Problems of GANs

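    ¡ Using the notation of the paper, the original loss function, which the generator minimizes, is:

    $$L(D, g_\theta) = \mathbb{E}_{\mathbf{x} \sim P_r}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{x} \sim P_g}[\log(1 - D(\mathbf{x}))].$$

    ¡ Setting the derivative with respect to $D(\mathbf{x})$ to 0, the optimal discriminator has the form:

    $$D^*(\mathbf{x}) = \frac{P_r(\mathbf{x})}{P_r(\mathbf{x}) + P_g(\mathbf{x})}.$$

    ¡ Substituting $D^*$ into the loss function, we get:

    $$L(D^*, g_\theta) = 2\, JS(P_r \| P_g) - 2 \log 2,$$

    where 𝐽𝑆 is the Jensen-Shannon divergence. This is the loss of the generator with the optimal discriminator.

    ¡ In most cases, $JS(P_r \| P_g) = \log 2$, a constant, because the supports of $P_r$ and $P_g$ barely overlap. That makes the gradient of the generator 0.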
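The identity between the loss at the optimal discriminator and the JS divergence can be checked numerically on small discrete distributions. This is a sketch with hypothetical function names and toy distributions; real image distributions are continuous and high-dimensional, so it only illustrates the algebra.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def loss_at_optimal_d(p_r, p_g):
    """E_{P_r}[log D*] + E_{P_g}[log(1 - D*)] with D* = P_r / (P_r + P_g)."""
    total = 0.0
    for pr, pg in zip(p_r, p_g):
        if pr + pg == 0:
            continue
        d_star = pr / (pr + pg)
        if pr > 0:
            total += pr * math.log(d_star)
        if pg > 0:
            total += pg * math.log(1.0 - d_star)
    return total

# Partially overlapping supports: the loss still depends on the generator.
p_r = [0.5, 0.5, 0.0, 0.0]
p_g = [0.0, 0.5, 0.5, 0.0]
overlap_gap = loss_at_optimal_d(p_r, p_g) - (2 * js(p_r, p_g) - 2 * math.log(2))

# Disjoint supports: JS saturates at log 2 wherever the generator puts its mass.
js_near = js([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
js_far = js([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
print(overlap_gap, js_near, js_far)
```

Because `js_near` and `js_far` are equal, moving the generated mass closer to the real mass does not change the loss, which is the vanishing-gradient problem in miniature.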


  • Problems of GANs

    Problem 2: The objective of the −log 𝐷 trick leads to mode collapse, where the generated samples lack diversity.

    ¡ Ian Goodfellow also proposed replacing $\log(1 - D(\mathbf{x}))$ by $-\log D(\mathbf{x})$ in the generator loss to solve the generator's vanishing-gradient problem. This is usually referred to as the −log 𝐷 trick.

    ¡ The loss of the generator with the optimal discriminator was then derived:

    $$L(D^*, g_\theta) = KL(P_g \| P_r) - 2\, JS(P_r \| P_g),$$

    where 𝐾𝐿 is the KL divergence.

  • Problems of GANs

    ¡ By optimizing this generator loss, we may encounter mode collapse.

    ¡ The loss penalizes
        ¡ slightly the case in which the generator fails to generate some real samples;
        ¡ heavily the case in which the generator generates unrealistic samples.

    ¡ Thus, the generator is more likely to generate safe samples (low diversity) rather than explore new regions of the sample space.

  • WGAN

    ¡ Replace the JS and KL divergences by the Wasserstein distance, also known as the Earth-Mover (EM) distance:

    $$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right],$$

    where $\Pi(P_r, P_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are respectively $P_r$ and $P_g$.

    ¡ Intuitively, $\gamma(x, y)$ indicates how much “mass” must be transported from $x$ to $y$ in order to transform the distribution $P_r$ into the distribution $P_g$.

    ¡ The EM distance is then the “cost” of the optimal transport plan.
        ¡ The new loss function can represent image quality.
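A one-dimensional toy case makes the contrast with JS concrete. For histograms on a line, the EM distance reduces to the summed absolute difference between the two cumulative distributions; the sketch below relies on that 1-D shortcut, which does not extend to images, and uses hypothetical helper names.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(p, q, spacing=1.0):
    """EM distance between histograms on an evenly spaced 1-D grid."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi              # mass that still has to move right or left
        total += abs(cdf_gap) * spacing
    return total

def point_mass(position, num_bins):
    """All probability mass in a single bin."""
    return [1.0 if k == position else 0.0 for k in range(num_bins)]

num_bins = 8
p_r = point_mass(0, num_bins)
for shift in (1, 3, 7):
    p_g = point_mass(shift, num_bins)
    print(shift, js_divergence(p_r, p_g), wasserstein_1d(p_r, p_g))
```

JS stays at log 2 for every shift, while the EM distance grows with the shift, so it still tells the generator which direction reduces the distance.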


  • WGAN

    ¡ How do we set up the optimization to minimize the Wasserstein distance?

    ¡ Only three modifications to the original GAN:
        1. Remove the sigmoid activation in the discriminator.
        2. Remove the log in the losses of both the discriminator and the generator.
        3. Clip the weights to a fixed range [−𝑐, 𝑐].
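The three modifications can be sketched on raw critic scores. This is an illustration with hypothetical function names, not the authors' code: the scores are unbounded real numbers because the sigmoid is gone, the losses are plain means because the log is gone, and the default clipping constant c = 0.01 is an assumption recalled from the WGAN paper rather than something stated in the slides.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(scores_real, scores_fake):
    """Negative Wasserstein estimate; minimizing it maximizes E[f(x)] - E[f(G(z))]."""
    return -(mean(scores_real) - mean(scores_fake))

def generator_loss(scores_fake):
    """The generator tries to raise the critic's score on generated samples."""
    return -mean(scores_fake)

def clip_weights(weights, c=0.01):
    """Clamp every weight to [-c, c] after each critic update."""
    return [max(-c, min(c, w)) for w in weights]

# Hypothetical raw critic outputs (no sigmoid, so not probabilities).
scores_real = [1.0, 3.0]
scores_fake = [0.0, 0.0]

print(critic_loss(scores_real, scores_fake))
print(generator_loss(scores_fake))
print(clip_weights([0.5, -0.5, 0.005]))
```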


  • WGAN

    Annotations on the WGAN algorithm (figure): replace Adam by RMSProp; no log; weight clipping; no sigmoid.

  • WGAN: Result

    ¡ The Wasserstein estimate directly reflects the quality of generated images.

    Image source: Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein gan." arXiv preprint arXiv:1701.07875 (2017).

  • WGAN: Result

    Panels (left: WGAN, right: standard GAN):
        ¡ DCGAN generator trained with WGAN vs. with GAN;
        ¡ without batch normalization (BN): WGAN vs. GAN;
        ¡ MLP generator: WGAN vs. GAN.

    Image source: Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein gan." arXiv preprint arXiv:1701.07875 (2017).

  • SAGAN


  • SAGAN: Motivation

    Problem of long-range dependencies.

    ¡ Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps.

    ¡ Long-range dependencies can only be processed after passing through several convolutional layers.

    ¡ Such models fail to capture geometric or structural patterns that occur consistently in some classes.
        ¡ For example, dogs are often drawn with realistic fur texture but without clearly defined separate feet.

  • SAGAN


    Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.

  • SAGAN


    Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.

    ¡ Three mapping functions 𝑓(𝑥), 𝑔(𝑥), ℎ(𝑥) are applied to the features, and finally a weighted feature is generated.

    ¡ Is it similar to something we have seen before?
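The three mappings play roles comparable to the query, key, and value projections of attention. The sketch below is a simplified reading of the SAGAN block, with assumptions: 𝑓, 𝑔, ℎ are plain matrices instead of 1×1 convolutions, ℎ keeps the channel count so the residual sum is defined, the final output projection is omitted, and all names and numbers are hypothetical.

```python
import math

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs, w_f, w_g, w_h, gamma):
    """xs: one feature vector per spatial position. Returns (outputs, attention map)."""
    f = [matvec(w_f, x) for x in xs]
    g = [matvec(w_g, x) for x in xs]
    h = [matvec(w_h, x) for x in xs]
    outputs, attention = [], []
    for j in range(len(xs)):
        # beta[i]: how strongly position j attends to position i.
        beta = softmax([sum(a * b for a, b in zip(f[i], g[j])) for i in range(len(xs))])
        o_j = [sum(beta[i] * h[i][c] for i in range(len(xs))) for c in range(len(h[0]))]
        # Residual connection scaled by a learnable gamma (initialized to 0 in SAGAN).
        outputs.append([gamma * o + x for o, x in zip(o_j, xs[j])])
        attention.append(beta)
    return outputs, attention

# Three positions with two channels each; weights chosen arbitrarily.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_f = [[1.0, 0.0]]                 # projects 2 channels to 1
w_g = [[0.0, 1.0]]
w_h = [[1.0, 0.0], [0.0, 1.0]]     # keeps 2 channels

outputs, attention = self_attention(xs, w_f, w_g, w_h, gamma=0.5)
identity_out, _ = self_attention(xs, w_f, w_g, w_h, gamma=0.0)
print(attention)
```

Every position can draw on every other position in a single layer, which is how the block addresses the long-range dependency problem.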

  • SAGAN


    Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.

  • PIX2PIX AND CYCLEGAN


  • pix2pix

    ¡ Given paired images, learn to translate an image from one style (domain) to the other.

    Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.

  • pix2pix

    ¡ The model is based on CGAN.

    ¡ As an improvement, the generator is tasked not only with fooling the discriminator but also with staying near the ground-truth output.

    ¡ An $L_1$ penalty is added to the CGAN loss; $L_1$ is used rather than $L_2$ because it encourages less blurring:

    $$\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}\left[\lVert y - G(x, z) \rVert_1\right].$$

    Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.

  • pix2pix: More Applications


    Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.

  • CycleGAN

    ¡ Paired examples can be expensive to obtain.

    ¡ Can we translate from 𝑋 ↔ 𝑌 in an unsupervised manner?


    Paired vs. unpaired examples

  • CycleGAN

    ¡ Two generators:
        ¡ $G: X \to Y$;
        ¡ $F: Y \to X$.

    ¡ Two discriminators:
        ¡ $D_X$ aims to distinguish between images $\{x\}$ and translated images $\{F(y)\}$;
        ¡ $D_Y$ aims to discriminate between $\{y\}$ and $\{G(x)\}$.

    Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.

  • CycleGAN

    ¡ If we can go from 𝑋 to $\hat{Y}$ via 𝐺, then it should also be possible to go from $\hat{Y}$ back to 𝑋 via 𝐹.

    ¡ A cycle consistency loss is added to the original adversarial loss:

    $$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right].$$

    Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
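The cycle consistency term can be illustrated with toy mappings in place of the two generators. In this sketch the "images" are short lists of numbers and G, F, F_bad are hypothetical stand-ins (scaling functions), chosen only so that the loss values are easy to verify by hand.

```python
def l1(a, b):
    """L1 distance between two vectors."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cycle_loss(G, F, xs, ys):
    """E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1 over unpaired batches."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

def G(x):
    """Toy X -> Y mapping: doubles every value."""
    return [2.0 * v for v in x]

def F(y):
    """Toy Y -> X mapping: exact inverse of G."""
    return [0.5 * v for v in y]

def F_bad(y):
    """Toy Y -> X mapping that is not an inverse of G."""
    return [0.5 * v + 1.0 for v in y]

xs = [[1.0, 2.0]]   # unpaired sample from domain X
ys = [[4.0, 6.0]]   # unpaired sample from domain Y

consistent = cycle_loss(G, F, xs, ys)
inconsistent = cycle_loss(G, F_bad, xs, ys)
print(consistent, inconsistent)
```

The loss is zero only when each mapping undoes the other, which is what keeps the translation tied to the input without paired examples.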

  • CycleGAN

    Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.

    Failure case:

  • SRGAN


  • SRGAN

    ¡ Task: given a low-resolution image, reconstruct the high-resolution image.

    ¡ Previous work: residual CNNs with transposed convolutions trained with MSE loss have shown great performance on this task, but the reconstructions are blurry.

    Panels: bicubic, original.

    Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.

  • SRGAN

    ¡ Generator: similar to ResNet, but uses transposed convolutions to increase the spatial dimensions.

    ¡ Discriminator: fully-convolutional CNN + MLP classifier + binary cross entropy.

    Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.

  • SRGAN: VGG Loss

    ¡ Reconstructions trained with pixel-wise MSE loss often lack high-frequency content, which results in overly smooth textures.

    ¡ The authors use a VGG loss based on the ReLU activation layers of the pre-trained 19-layer VGG network.

    ¡ $\phi_{i,j}$ is the feature map obtained by the 𝑗-th convolution (after activation) before the 𝑖-th max-pooling layer.

    Image source: Lecture 4, MSBA434, UCLA

  • SRGAN: Dual Loss

    ¡ Weighted sum of adversarial loss and VGG activation loss:


    Image source: Lecture 4, MSBA434, UCLA
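The formulas on the two SRGAN loss slides were images and did not survive in this transcript, so the following sketch shows only the general shape of such a combined loss. Everything in it is an assumption made for illustration: `phi` is a crude stand-in for a VGG feature map, the adversarial term is taken as −log D(G(·)), and the weight 1e-3 is the value recalled from Ledig et al., not something recoverable from the slides.

```python
import math

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def phi(image):
    """Hypothetical feature extractor: adjacent differences as a crude texture feature."""
    return [b - a for a, b in zip(image, image[1:])]

def vgg_style_loss(feature_fn, hr, sr):
    """MSE between feature maps of the real and the super-resolved image."""
    return mse(feature_fn(hr), feature_fn(sr))

def adversarial_loss(d_sr):
    """Mean of -log D(G(low-res)) over the batch; small when D is fooled."""
    return sum(-math.log(p) for p in d_sr) / len(d_sr)

def combined_loss(feature_fn, hr, sr, d_sr, adv_weight=1e-3):
    """Weighted sum of the feature-space content loss and the adversarial loss."""
    return vgg_style_loss(feature_fn, hr, sr) + adv_weight * adversarial_loss(d_sr)

hr = [0.0, 1.0, 0.0, 1.0]        # toy 1-D "image" with sharp texture
blurry = [0.5, 0.5, 0.5, 0.5]    # same mean, no texture

perfect = combined_loss(phi, hr, hr, d_sr=[1.0])
smooth = combined_loss(phi, hr, blurry, d_sr=[0.5])
print(perfect, smooth)
```

The blurry reconstruction is penalized in feature space because its texture features are all zero.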

  • SRGAN: Results

    Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.

  • General Idea of Using GAN

    ¡ GANs can be applied to many tasks, not just images.

    ¡ To make sure that a task can be solved by a GAN, we should identify several things:
        ¡ What are the fake samples?
        ¡ What are the real samples?
        ¡ How do we transform fake samples into real samples (generator)?
        ¡ How do we distinguish fake samples from real samples (discriminator)?

  • Conclusion

    After this lecture, you should know:

    ¡ What is the basic idea of GAN?

    ¡ How do the generator and the discriminator improve each other?

    ¡ How does transposed convolution work?

    ¡ How do we design an application-specific loss to train together with the adversarial loss?

  • Suggested Reading

    ¡ Adversarial Nets Papers: https://github.com/zhangqianhui/AdversarialNetsPapers

    ¡ Tips and tricks to make GANs work: https://github.com/soumith/ganhacks

    ¡ The Astonishing Wasserstein GAN (令人拍案叫绝的Wasserstein GAN, in Chinese): https://zhuanlan.zhihu.com/p/25071913

  • Assignment 4

    ¡ Assignment 4 is released. The deadline is 18:00, 11th December.


  • Thank you!

    ¡ Any questions?

    ¡ Don’t hesitate to email me with questions or for discussion.