deep learning - jasonyanglu.github.io · imagesource:liu, ming, yukangding, min xia, xiao liu,...
TRANSCRIPT
-
DEEP LEARNINGLecture 9: Generative Adversarial Networks
Dr. Yang Lu
Department of Computer Science
-
GAN Applications
¡ Image Translation
1Image source: Choi, Yunjey, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. "Stargan v2: Diverse image synthesis for multiple domains." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8188-8197. 2020.
-
GAN Applications
¡ Image Translation
2Image source: Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2414-2423. 2016.
-
GAN Applications
¡ Scene Generation
3Image source: Tang, Hao, Dan Xu, Yan Yan, Philip HS Torr, and Nicu Sebe. "Local class-specific and global image-level generative adversarial networks for semantic-guided scene generation." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7870-7879. 2020.
-
GAN Applications
¡ Facial Attribute Manipulation
4Image source: Geng, Zhenglin, Chen Cao, and Sergey Tulyakov. "3d guided fine-grained face manipulation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9821-9830. 2019.
-
GAN Applications
¡ Facial Attribute Manipulation
5Image source: Liu, Ming, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, Wangmeng Zuo, and Shilei Wen. "Stgan: A unified selective transfer network for arbitrary image attribute editing." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3673-3682. 2019.
-
GAN Applications
¡ Gaze Correction
6Image source: Zhang, Jichao, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, and Nicu Sebe. "Gazecorrection: Self-guided eye manipulation in the wild using self-supervised generative adversarial networks." arXiv preprint arXiv:1906.00805 (2019).
-
GAN Applications
¡ Image Animation
7Image source: Siarohin, Aliaksandr, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. "Animating arbitrary objects via deep motion transfer." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2377-2386. 2019.
-
GAN Applications
¡ Image Inpainting
8Image source: Yu, Jiahui, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. "Generative image inpainting with contextual attention." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5505-5514. 2018.
-
GAN Applications
¡ Image Inpainting
9Image source: Yu, Jiahui, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. "Free-form image inpainting with gated convolution." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471-4480. 2019.
-
GAN Applications
¡ Image blending
10Image source: Wu, Huikai, Shuai Zheng, Junge Zhang, and Kaiqi Huang. "Gp-gan: Towards realistic high-resolution image blending." In Proceedings of the 27th ACM International Conference on Multimedia, pp. 2487-2495. 2019.
-
GAN Applications
¡ Image Super-Resolution
11Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.https://www.theverge.com/21298762/face-depixelizer-ai-machine-learning-tool-pulse-stylegan-obama-bias
-
GAN Applications
¡ Shadow Detection and Removal
12Image source: Ding, Bin, Chengjiang Long, Ling Zhang, and Chunxia Xiao. "Argan: Attentive recurrent generative adversarial network for shadow detection and removal." In Proceedings of the IEEE international conference on computer vision, pp. 10213-10222. 2019.
-
GAN Applications
¡ Makeup
13Image source: Li, Tingting, Ruihe Qian, Chao Dong, Si Liu, Qiong Yan, Wenwu Zhu, and Liang Lin. "Beautygan: Instance-level facial makeup transfer with deep generative adversarial network." In Proceedings of the 26th ACM international conference on Multimedia, pp. 645-653. 2018.
-
GAN Applications
¡ Facial Landmark Detection
14Image source: Dong, Xuanyi, Yan Yan, Wanli Ouyang, and Yi Yang. "Style aggregated network for facial landmark detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 379-388. 2018.
-
GAN Applications
¡ Person Re-identification
15Image source: Zheng, Zhedong, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, and Jan Kautz. "Joint discriminative and generative learning for person re-identification." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2138-2147. 2019.
-
GAN Applications
¡ Music Generation
16
Image source: Yang, Li-Chia, Szu-Yu Chou, and Yi-Hsuan Yang. "MidiNet: A convolutional generative adversarial network for symbolic-domain music generation." arXiv preprint arXiv:1703.10847 (2017).
Generated music samples: https://soundcloud.com/vgtsv6jf5fwq/sets/midinet-samples
https://soundcloud.com/vgtsv6jf5fwq/sets/midinet-samples
-
Generative vs. Discriminative
¡ In machine learning, two main approaches are called the generative approaches and the discriminative approaches.
¡ Given an observable variable 𝑋 and a target variable 𝑌:¡ A generative model is a statistical model of the data distribution 𝑃(𝑋) or the
joint probability distribution on 𝑋×𝑌: 𝑃(𝑋, 𝑌).¡ A discriminative model is a model of the conditional distribution of 𝑌 given 𝑋:𝑃(𝑌|𝑋 = 𝑥).
17
Image source: https://medium.com/@jordi299/about-generative-and-discriminative-models-d8958b67ad32
-
Discriminative Approaches
¡ Most supervised learning methods fallinto discriminative approaches.¡ Given data: (𝑥, 𝑦), 𝑥 is data, 𝑦 is label.
¡ Goal: Learn a function to map 𝑥 −> 𝑦,namely posterior probability 𝑃(𝑌|𝑋 = 𝑥).
¡ Examples: Classification, regression, object detection, face recognition, sentimentclassification, etc.
18Image source: https://www.autoexpress.co.uk/tesla/model-3https://timesofindia.indiatimes.com/life-style/relationships/pets/5-things-that-scare-and-stress-your-cat/articleshow/67586673.cms
cat
car
-
Generative Approaches
¡ Given training data, generate new samples from same distribution.¡ Objectives: 1. Learn 𝑝!"#$%(𝑥) that approximates 𝑝#&'&(𝑥).2. Sample a new 𝑥 from 𝑝!"#$%(𝑥).
19
Image source: http://www.lherranz.org/2018/08/07/imagetranslation/
-
Outlines
¡ GAN
¡ DCGAN
¡ CGAN
¡ WGAN
¡ SAGAN
¡ pix2pix and CycleGAN
¡ SRGAN
20
-
GAN
21
-
GAN
¡ GAN was proposed by Ian Goodfellow in2014.
¡ Yann LeCun described GANs as “the most interesting idea in the last 10 years in machine learning”.
¡ Ian presented and explained his paper in NIPS 2016 with a 2-hour presentation.
22
Image source: https://youtu.be/HGYYEUSm-0Q
Ian in NIPS 2016
-
GAN: How to Do
¡ GAN is composed by a generator and a discriminator. They are bothneural networks.¡ Generator network: try to fool the discriminator by generating real-looking
images.¡ Discriminator network: try to distinguish between real and fake images.
23
Image source: https://sthalles.github.io/intro-to-gans/
-
GAN: How to Do
¡ Generator and discriminator tells each other where it waswrong.¡ Generator tells discriminator how I detect you.
¡ Discriminator tells generator how I fool you.
24
Image source: https://sthalles.github.io/intro-to-gans/
Discriminator learning signal
Generator learning signal
-
GAN: How to Learn
¡ Given a prior on input noise variables 𝑝𝒛(𝒛), generator 𝐺"!(𝒛)take 𝒛 as input and map it into data space.
¡ Discriminator 𝐷""(𝒙) takes the data 𝒙 as input and outputprobability that 𝒙 came from the real data rather thangenerated data 𝐺"!(𝒛).
¡ 𝐷"" and 𝐺"! have different goals (1 for real, 0 for fake):
¡ Generator wants: 𝐷!! 𝐺!" 𝒛 → 1.
¡ Discriminator wants: 𝐷!! 𝒙 → 1, 𝐷!! 𝐺!" 𝒛 → 0.
25
-
GAN: How to Learn
¡ By maximizing the log-likelihood, the overall objective is to simultaneously train over all 𝒙 with random generated 𝒛:
¡ train 𝐺!! to minimize log 1 − 𝐷!" 𝐺!! 𝒛 ;
¡ train 𝐷!" to maximize log𝐷!"(𝒙) and log 1 − 𝐷!" 𝐺!! 𝒛 .
¡ In other words, 𝐷!! and 𝐺!" play the following two-player minimax game:
min!"
max!!
𝔼𝒙~$!#$# log𝐷!! 𝒙 + 𝔼𝒛~$𝒛 𝒛 log 1 − 𝐷!! 𝐺!" 𝒛
26
Discriminator output for real data 𝒙
Discriminator output for generated fake data 𝐺!! 𝒛 .
-
GAN: How to Learn
Objective:
min("
max(#
𝔼𝒙~+#$%$ log𝐷(# 𝒙 + 𝔼𝒛~+𝒛 𝒛 log 1 − 𝐷(# 𝐺(" 𝒛
Alternate between: ¡ Gradient ascent on discriminator:
max(#
𝔼𝒙~+#$%$ log𝐷(# 𝒙 + 𝔼𝒛~+𝒛 𝒛 log 1 − 𝐷(# 𝐺(" 𝒛
¡ Gradient descent on generator:
min("
𝔼𝒛~+𝒛 𝒛 log 1 − 𝐷(# 𝐺(" 𝒛
27
-
GAN: How to Learn
¡ After several steps of training, if 𝐺 and 𝐷 have enough capacity, they will reach a point at which both cannot improve because 𝑝& = 𝑝'()(.¡ Generator can generate real image.¡ Discriminator is unable to differentiate between the two
distributions, i.e. 𝐷&!(𝒙) = 1/2.
28
Image source: Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.
discriminative distribution
datadistribution
generativedistribution
-
GAN: Algorithm
29
-
GAN: Result
30
Image source: Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.
Rightmost column shows the nearest training example of the neighboring sample, in order to demonstrate that the model has not memorized the training set.
-
GAN: Result
31
Image source: https://sthalles.github.io/intro-to-gans/
SVHNs MNIST
-
GAN: Frontiers
32
Image source: Ian Goodfellow. Samples from Goodfellow et al., 2014, Radford et al., 2015, Liu et al., 2016, Karras et al., 2017, Karras et al., 2018
-
The GAN Zoo: Explosion of GANs
33
Source: https://github.com/hindupuravinash/the-gan-zoo
-
DCGAN
34
-
DCGAN
¡ Vanilla GAN simply uses MLP, rather than CNN in both generator and discriminator.
¡ CNN can be easily applied to discriminator.
¡ Now the problem is: how can CNN be used as a generator?¡ Pooling leads to downsampling, how to upsampling?
35
-
Fractionally-Strided Convolutions
36
1 2
3 4
Input: 2×2Filter: 3×3
Output: 4×4
40 50
30 20
60
10
10 20 3040 50
30 20
60
10
10 20 30
-
Fractionally-Strided Convolutions
37
1 2
3 4
10Input: 2×2Filter: 3×3
Output: 4×4
40 50
30 20
60
10
10 20 3050+80
60+100
20+60
10+40
120
20
20+20
30+40 60
40
30
-
Fractionally-Strided Convolutions
38
1 2
3 4
60Input: 2×2Filter: 3×3
Output: 4×4
40 50
30 20
60
10
10 20 30
30+120
80+150
90 60
50+180
30
40+30
130+60
160+90 120
20
7010 40
Output: 4×4
-
Fractionally-Strided Convolutions
39
1 2
3 4
70
Input: 2×2Filter: 3×3
Output: 4×4
40 50
30 20
60
10
10 20 30
230+160
230+200
60+120
30+80
20+240
40
190+40
250+80
120+120
150
90
10 6040 70
Output: 4×4
-
Fractionally-Strided Convolutions
¡ Fractionally-strided convolutions are alsocalled transposed convolutions.¡ PyTorch: torch.nn.ConvTranspose2d.
¡ TensorFlow:tf.keras.layers.Conv2DTranspose.
¡ Some researchers are used to calldeconvolutions. However, truedeconvolutions are the inverse operationof convolution, which is not the same asfractionally-strided convolutions.
40
Image source: https://github.com/vdumoulin/conv_arithmetic
Transposed convolution with strideis equivalent to convolving with
zero-padding and inserting zeros.
More linear algebra details: https://arxiv.org/pdf/1603.07285.pdf
-
DCGAN
41
Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
stride=2 everywhere
-
DCGAN: Visual Results
42
Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
The generated bedrooms look very nice!
-
DCGAN: Walking in the Latent Space
¡ If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations.
43
Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
Interpolation between a series of 9 random points in 𝑍show that the space learned has smooth transitions.
-
DCGAN: Vector Arithmetic
44
Image source: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
For each column, the 𝑍 vectors of samples are averaged. Arithmetic was then performed on the mean vectors creating a new vector 𝑌.
-
DCGAN: Use as Feature Extractor
¡ Train on Imagenet-1k and then use the discriminator’s convolutional features from all layers.
¡ Maxpooling each layers representation to produce a 4 × 4 spatial grid.
¡ These features are then flattened and concatenated to form a 28672 dimensional vector.
¡ A regularized linear L2-SVM classifier is trained on top of them.
45
-
CGAN
46
-
CGAN
¡ We can’t control what we generate from the vanilla GAN.¡ Noise is the only input and it is totally random.
¡ How can we tell GAN what we want it to generate?¡ Straightforward solution: replace data distribution by
conditional distribution.𝑝 𝒙 → 𝑝 𝒙 𝑦 .
¡ Now, the problem becomes:¡ Generator: generate a sample for class 𝑦.¡ Discriminator: distinguish the real sample in class 𝑦 and the generated
sample in class 𝑦.
47
-
CGAN
¡ Both generator and discriminator are conditioned on some extra information 𝒚:
min!"
max!!
?
@
𝔼*~$!#$# log𝐷!! 𝒙|𝒚
+ 𝔼+~$ + log 1 − 𝐷!! 𝐺!" 𝒛|𝒚 .
¡ 𝒚 could be any kind of auxiliary information, such as class labels or data from other modalities.¡ E.g. the speech of saying that class.
48
-
CGAN
49
Generate MNIST digits by directly feeding one-hot class label.
-
WGAN
50
-
WGAN
¡ The first paper lists several problems of GANs based ontheoretical analysis:¡ Unstable behavior of training.
¡ The cost function doesn’t represent the quality of generated images.
¡ Based on the analysis, the second paper proposed Wasserstein GAN (WGAN).
51
-
Problems of GANs
Problem 1: Good discriminator makes the gradient of generator vanish.
¡ If discriminator is well trained, the gradient of generator vanishes.
¡ If discriminator is poorly trained, the gradient of generator hasgreat variance.
¡ Generator gets well trained only if discriminator is neither toogood nor too poor.¡ That’s why GANs are extremely difficult to train.
52
-
Problems of GANs
¡ Using the annotations in the paper, we have the original loss function to minimize:
𝐿 𝐷, 𝑔% = 𝔼𝒙~(# log𝐷 𝒙 + 𝔼𝒙~(! log 1 − 𝐷 𝒙 .
¡ By making the derivative of 𝐷 𝒙 to 0, the optimal discriminator has the shape:
𝐷∗ 𝒙 =𝑃*(𝒙)
𝑃* 𝒙 + 𝑃+(𝒙).
¡ Replace 𝐷∗ 𝒙 into the loss function we get:𝐿 𝐷∗, 𝑔% = 2𝐽𝑆(𝑃*||𝑃+) − 2 log 2 ,
where 𝐽𝑆 is the Jensen-Shannon divergence. This is the loss of generator withoptimal discriminator.
¡ In most of cases, 𝐽𝑆(𝑃*| 𝑃+ = log 2, which is a constant. That makes thegradient of generator 0.
53
-
Problems of GANs
Problem 2: the objective of the − log𝐷 trick leads to collapse mode, where the generated samples lack of diversity.
¡ Ian Goodfellow also proposed to replace log 1 − 𝐷 𝒙 bylog −𝐷 𝒙 to solve the generator gradient vanishing problem.It is usually referred to the − log𝐷 trick.
¡ Then, the loss of generator with optimal discriminator wasderived:
𝐿 𝐷∗, 𝑔" = 𝐾𝐿(𝑃.||𝑃/) − 2𝐽𝑆(𝑃/||𝑃.)
where 𝐾𝐿 is the KL divergence.
54
-
Problems of GANs
¡ By optimizing this generator loss, we may encounter collapse mode.
¡ The loss penalizes¡ slightly for the case that generator is not able to generate real sample;
¡ heavily for the case that generator generates unreal samples.
¡ Thus, generator is more likely to generate safe samples (lowdiversity), rather than trying to explore new sample space.
55
-
WGAN
¡ Replace JS and KL divergence by Wasserstein distance, aka the Earth-Mover (EM) distance:
𝑊 𝑃. , 𝑃/ = inf0∈2(4',4")𝔼 7,8 ~0[||x − 𝑦||] .
where Π(𝑃. , 𝑃/) denotes the set of all joint distributions 𝛾(𝑥, 𝑦)whose marginals are respectively 𝑃. and 𝑃/.
¡ Intuitively, 𝛾(𝑥, 𝑦) indicates how much “mass” must be transported from 𝑥 to 𝑦 in order to transform the distributions 𝑃. into the distribution 𝑃/.
¡ The EM distance then is the “cost” of the optimal transport plan.¡ The new loss function can represent image quality.
56
-
WGAN
¡ How to make the optimization to minimize Wassersteindistance?
¡ Only three modifications to the original GAN:1. Remove sigmoid activation in discriminator.
2. Remove log in loss of both discriminator and generator.
3. Weight clipping in a certain range [−𝑐, 𝑐].
57
-
WGAN
58
Replace Adam by RMSProp
No log
Weight clipping
No sigmoid
-
WGAN: Result
¡ The Wasserstein estimate directly reflects the quality ofgenerated images.
59
Image source: Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein gan." arXiv preprint arXiv:1701.07875 (2017).
-
WGAN: Result
60
DCGAN with WGAN DCGAN with GAN
WGAN without BN GAN without BN
WGAN using MLP GAN using MLP
Image source: Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein gan." arXiv preprint arXiv:1701.07875 (2017).
-
SAGAN
61
-
SAGAN: Motivation
Problem of long-range dependencies.¡ Traditional convolutional GANs generate high-resolution
details as a function of only spatially local points in lower-resolution feature maps.
¡ Long range dependencies can only be processed after passing through several convolutional layers.
¡ It fails to capture geometric or structural patterns that occur consistently in some classes. ¡ For example, dogs are often drawn with realistic fur texture but without clearly
defined separate feet.
62
-
SAGAN
63
Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.
-
SAGAN
64
Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.
¡ Three mapping functions 𝑓(𝑥), 𝑔(𝑥), ℎ(𝑥) and finally a weightedfeature is generated.
¡ Is it similar to something we have seen before?
-
SAGAN
65
Image source: Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." In International Conference on Machine Learning, pp. 7354-7363. PMLR, 2019.
-
PIX2PIX AND CYCLEGAN
66
-
pix2pix
¡ Given a pair of images, transfer the style of one image toanother.
67
Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.
-
pix2pix
¡ The model is based on CGAN.
¡ As an improvement, the generator is tasked to not only fool the discriminator but also to be near the ground truth output.
¡ 𝐿, penalization is added to the loss of CGAN to encourage less blurring:𝐿-' 𝐺 = 𝔼*,/,+ 𝑦 − 𝐺 𝑥, 𝑧 , .
68
Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.
-
pix2pix: More Applications
69
Image source: Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.
-
CycleGAN
¡ Paired examples can be expensive to obtain.
¡ Can we translate from 𝑋 ↔ 𝑌 in an unsupervised manner?
70
Paired vs. unpaired examples
-
CycleGAN
¡ Two generators:¡ 𝐺: 𝑋 → 𝑌;
¡ 𝐹: 𝑌 → 𝑋.
¡ Two discriminators:¡ 𝐷0 aims to distinguish between images {𝑥} and translated images {𝐹(𝑦)};
¡ 𝐷1 aims to discriminate between {𝑦}and {𝐺(𝑥)}.
71Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
-
CycleGAN
¡ If we can go from 𝑋 to >𝑌 via 𝐺, then it should also be possible to go from >𝑌 back to 𝑋 via 𝐹.
¡ Cycle consistency loss is added to the original adversarial loss:
𝐿010 𝐺, 𝐹 = 𝔼2 𝐹 𝐺 𝑥 − 𝑥 3 + 𝔼4 𝐺 𝐹 𝑥 − 𝑦 3 .
72Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
-
CycleGAN
73Image source: Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
Failed case:
-
SRGAN
74
-
SRGAN
¡ Task: Given a low-resolution image, reconstruct the high-resolution image.
¡ Previous work: Residual transpose CNN with MSE loss has shown greatperformance on this task butthe reconstructions areblurry.
75
Bicubic Original
Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.
-
SRGAN
¡ Generator: Similar to ResNet, but uses Transposed Convolutions to increase the spatial dimensions.
¡ Discriminator: Fully-convolutional CNN + MLP Classifier + Binary Cross Entropy.
76Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.
-
SRGAN: VGG Loss
¡ Pixel-wise MSE loss often lacks high-frequency content which results in smooth textures.:
¡ The authors uses VGG loss based on the ReLU activation layers of the pre-trained 19 layer VGG network
¡ 𝜙(,* is the feature map obtained by the 𝑗-th convolution (after activation) before the 𝑖-th maxpooling layer.
77
Image source: Lecture 4, MSBA434, UCLA
-
SRGAN: Dual Loss
¡ Weighted sum of adversarial loss and VGG activation loss:
78
Image source: Lecture 4, MSBA434, UCLA
-
SRGAN: Results
79Image source: Ledig, Christian, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken et al. "Photo-realistic single image super-resolution using a generative adversarial network." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690. 2017.
-
General Idea of Using GAN
¡ GAN can be applied to lots of applications, rather than justimages.
¡ We should identify several things to make sure that our taskcan be solved by GAN.¡ What are fake samples?
¡ What are real samples?
¡ How to transform fake samples to real samples (generator)?
¡ How to distinguish fake sample from real samples (discriminator)?
80
-
Conclusion
After this lecture, you should know:
¡ What is the basic idea of GAN?
¡ How generator and discriminator improve each other?
¡ How does transposed convolution work?
¡ How to design application specific loss to train with adversarialloss?
81
-
Suggested Reading
¡ Adversarial Nets Papers
¡ Tips and tricks to make GANs work
¡令人拍案叫绝的Wasserstein GAN
82
https://github.com/zhangqianhui/AdversarialNetsPapershttps://github.com/soumith/ganhackshttps://zhuanlan.zhihu.com/p/25071913
-
Assignment 4
¡ Assignment 4 is released. The deadline is 18:00, 11th December.
83
-
Thank you!
¡ Any question?
¡ Don’t hesitate to send email to me for asking questions anddiscussion. J
84