
Page 1:

Unsupervised Learning: Autoencoders

Yunsheng Bai

Page 2:

Roadmap
1. Introduction to Autoencoders
2. Sparse Autoencoders (SAE) (2008)
3. Denoising Autoencoders (DAE) (2008)
4. Contractive Autoencoders (CAE) (2011)
5. Stacked Convolutional Autoencoders (SCAE) (2011)
6. Recursive Autoencoders (RAE) (2011)
7. Variational Autoencoders (VAE) (2013)
8. Adversarial Autoencoders (AAE) (2015)
9. Wasserstein Autoencoders (WAE) (2017)
10. Autoencoders for Graphs

Page 3:

Introduction to Autoencoders

Page 4:

Page 5:

https://en.wikipedia.org/wiki/Principal_component_analysis#/media/File:GaussianScatterPCA.svg

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Page 6:

https://www.cs.toronto.edu/~urtasun/courses/CSC411/14_pca.pdf

Page 7:

https://www.cs.toronto.edu/~urtasun/courses/CSC411/14_pca.pdf

Inner product between them

Change of basis

Page 8:

PCA ≈ Autoencoder with Linear Activation Function

Not necessarily orthogonal

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
https://www.cs.toronto.edu/~urtasun/courses/CSC411/14_pca.pdf

Page 9:

Could have many layers, but as long as the activation is linear, the whole network collapses to a single W and a single V.

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
https://www.cs.toronto.edu/~urtasun/courses/CSC411/14_pca.pdf

PCA ≈ Autoencoder with Linear Activation Function
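A minimal sketch of this equivalence (assuming TensorFlow/Keras and scikit-learn, which the slides do not show): a linear autoencoder trained with MSE recovers essentially the same 2-D subspace as PCA, although its weights are not necessarily orthogonal.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# Toy data: 1000 points in 10-D that actually live near a 3-D subspace.
rng = np.random.default_rng(0)
X = (rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 10))).astype("float32")
X -= X.mean(axis=0)

# Linear autoencoder: 10 -> 2 -> 10, no activation functions, MSE loss.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(2, use_bias=False),    # encoder W
    tf.keras.layers.Dense(10, use_bias=False),   # decoder V
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=200, batch_size=64, verbose=0)

# PCA with 2 components spans (approximately) the same subspace.
pca = PCA(n_components=2).fit(X)
print("PCA reconstruction MSE:",
      np.mean((X - pca.inverse_transform(pca.transform(X))) ** 2))
print("Linear AE reconstruction MSE:", autoencoder.evaluate(X, X, verbose=0))
```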

Page 10:

https://towardsdatascience.com/autoencoders-are-essential-in-deep-neural-nets-f0365b2d1d7c
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

PCA vs Autoencoder
- Autoencoders are much more flexible than PCA.
- NN activation functions introduce "non-linearities" in the encoding, but PCA only does a linear transformation.
- We can stack autoencoders to form a deep autoencoder network.

Page 11:

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

[Figure: a stacked autoencoder with Layer 1, Layer 2, Layer 3, Layer 4.]

Page 12:

Goal: Learn Useful Features from Data

We've seen that autoencoders can do PCA, but fundamentally, why does an autoencoder work?

https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694

Page 13:

Goal: Feature/Representation Learning

Why can't an autoencoder simply copy input to output through identity functions?

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

[Figure: with 6×6 identity weight matrices in both the encoder and the decoder, the autoencoder simply copies the input to the output.]

Overcomplete

Objective: min_{f,g} ||x − g(f(x))||^2, with encoder f and decoder g.

Page 14:

To Achieve Feature Learning, Conflicting Goals

Autoencoders are designed to be unable to learn to copy perfectly. Usually they are restricted in ways that allow them to copy only approximately. Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.

Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

Page 15:

Undercomplete Autoencoders

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python
http://rgraphgallery.blogspot.com/2013/04/rg-3d-scatter-plots-with-vertical-lines.html

Encoders and decoders are too powerful :(
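One standard remedy is an undercomplete architecture: make the code much smaller than the input so the network cannot simply copy. A minimal sketch (tf.keras assumed; the 784 → 32 sizes are illustrative, e.g. flattened 28×28 images):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))                         # flattened input
code = tf.keras.layers.Dense(32, activation="relu")(inputs)   # bottleneck h: 784 -> 32
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)
autoencoder = tf.keras.Model(inputs, outputs)

# Pure reconstruction objective: min ||x - g(f(x))||^2
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=20, batch_size=256)  # x_train: your data
```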

“If you could speak only a few words per month, you would probably try to make them worth listening to.”

Page 16:

Regularized Autoencoders

Regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output. These other properties include sparsity of the representation, smallness of the derivative of the representation, and robustness to noise or to missing inputs. A regularized autoencoder can be nonlinear and overcomplete but still learn something useful about the data distribution, even if the model capacity is great enough to learn a trivial identity function.

→ introduce new things to the loss

→ they are just different regularizers

2008: Sparse Autoencoders (SAE)

2008: Denoising Autoencoders (DAE)

2011: Contractive Autoencoders (CAE)

2011: Stacked Convolutional Autoencoders (SCAE)

2011: Recursive Autoencoders (RAE)

2013: Variational Autoencoders (VAE)

2015: Adversarial Autoencoders (AAE)

2017: Wasserstein Autoencoders (WAE)

Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

Page 17:

Properties of Autoencoders (Ideally)

1. Learn useful features from data (effective representations)
   a. Capture the intrinsic properties of data → feed them into downstream applications
   b. Can be thought of as patterns in data → generate new data

2. Produce low-dimensional vectors (efficient/compact representations)
   a. Efficient for storage
   b. Efficient for downstream models
   c. May be free of noise in the input
   d. Easier to visualize than high-dimensional data

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Page 18:

Properties of Autoencoders (Ideally)

3. Are flexible: can be modified/guided/regularized in various ways:
   a. Input data, e.g. add noise
   b. Output data, e.g. something different from the input
   c. Architecture, e.g. fully connected layer → convolutional layer
   d. Loss, e.g. add additional loss terms → capture other useful information from the input
   e. Latent space, e.g. Gaussian (more later in VAE)
      i. Enforce certain prior knowledge, usually through additional loss terms
      ii. Analyzing the latent space/representations is a trend (?), e.g. debiasing word embeddings
   f. … (Be creative! This is where research comes from)

Page 19:

History of Autoencoders

10 years ago, we thought that deep nets would also need an unsupervised cost, like the autoencoder cost, to regularize them.

Today, we know we are able to recognize images just by using backprop on the supervised cost as long as there is enough labeled data.

(Humans can learn from very few labeled examples. Why? One popular hypothesis: Brain can leverage unsupervised or semi-supervised learning.)

There are other tasks where we do still use autoencoders, but they’re not the fundamental solution to training deep nets that people once thought they were going to be.

(Ian Goodfellow, 2016)
https://www.quora.com/Why-are-autoencoders-considered-a-failure-What-are-their-alternatives
https://www.doc.ic.ac.uk/~js4416/163/website/nlp/#XGlorot2011
Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

Page 20:

Applications of Autoencoders

1. Data Compression for Storage
   a. Difficult to train an autoencoder better than a basic algorithm like JPEG
   b. Autoencoders are data-specific: may be hard to generalize to unseen data

2. Dimensionality Reduction for Data Visualization
   a. t-SNE is good, but typically requires relatively low-dimensional data
      i. For high-dimensional data, first use an autoencoder, then use t-SNE (see the sketch below)
   b. Latent space visualization (more later)

https://blog.keras.io/building-autoencoders-in-keras.html
https://www.doc.ic.ac.uk/~js4416/163/website/nlp/#XVincent2008
https://hackernoon.com/latent-space-visualization-deep-learning-bits-2-bd09a46920df
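A brief sketch of the autoencoder-then-t-SNE pipeline from item 2.a.i (tf.keras and scikit-learn assumed; the random array stands in for real high-dimensional data):

```python
import numpy as np
import tensorflow as tf
from sklearn.manifold import TSNE

x = np.random.rand(500, 784).astype("float32")   # stand-in for real high-dim data

# Train a small autoencoder, then keep only the encoder half.
inp = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(32, activation="relu")(inp)
out = tf.keras.layers.Dense(784, activation="sigmoid")(code)
ae = tf.keras.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")
ae.fit(x, x, epochs=5, verbose=0)
encoder = tf.keras.Model(inp, code)

# Run t-SNE on the 32-D codes instead of on the raw 784-D inputs.
xy = TSNE(n_components=2).fit_transform(encoder.predict(x, verbose=0))
```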

Page 21:

Applications of Autoencoders

3. Unsupervised Pretraining
   a. Greedy Layer-Wise Unsupervised Pretraining: train each layer of a feedforward net greedily; continue stacking layers; the output of prior layers is the input for the next one; fine-tune
   b. Today, we have random weight initialization, rectified linear units (ReLUs) (2011), dropout (2012), batch normalization (2014), residual learning (2015) + large labeled datasets
   c. Still useful
      i. Train a deep autoencoder
      ii. Train an autoencoder on an unlabeled dataset, and reuse the lower layers to create a new network trained on the labeled data (~supervised pretraining; see the sketch below)
      iii. Train an autoencoder on an unlabeled dataset, and use the learned representations in downstream tasks (see more in 4)

https://blog.keras.io/building-autoencoders-in-keras.html
https://www.doc.ic.ac.uk/~js4416/163/website/nlp/#XVincent2008
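A sketch of item 3.c.ii (tf.keras assumed; x_unlabeled, x_labeled, and y_labeled are hypothetical placeholders for your datasets):

```python
import tensorflow as tf

# Phase 1: train an autoencoder on the (large) unlabeled set.
inp = tf.keras.Input(shape=(784,))
h1 = tf.keras.layers.Dense(256, activation="relu")(inp)
h2 = tf.keras.layers.Dense(64, activation="relu")(h1)
rec = tf.keras.layers.Dense(784, activation="sigmoid")(h2)
ae = tf.keras.Model(inp, rec)
ae.compile(optimizer="adam", loss="mse")
# ae.fit(x_unlabeled, x_unlabeled, epochs=20)

# Phase 2: reuse the lower (encoder) layers, add a classification head,
# and train/fine-tune on the (small) labeled set.
clf_out = tf.keras.layers.Dense(10, activation="softmax")(h2)
classifier = tf.keras.Model(inp, clf_out)      # shares the pretrained encoder weights
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(x_labeled, y_labeled, epochs=10)
```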

Page 22:

Greedy Layer-Wise Unsupervised Pretraining for Training Deep Autoencoders

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Page 23:

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Unsupervised Pretraining for Supervised Tasks

Page 24:

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Unsupervised Pretraining for Supervised Tasks

Downside: two-stage training → hyperparameter tuning :(

Page 25:

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Supervised Pretraining

Page 26:

https://www.youtube.com/watch?v=R3DNKE3zKFk

Multi-Task Learning

Transfer Learning Domain Adaptation

supervised pretraining

Page 27:

https://www.youtube.com/watch?v=R3DNKE3zKFk

Multi-Task Learning

Page 28:

Applications of Autoencoders

4. Generate Representations for Downstream Tasks
   a. Special case of unsupervised pretraining (3.c.iii)
   b. Useful when the initial representation is poor, and there is a lot of unlabeled data
      i. Word embeddings (better than one-hot representations)
      ii. Graph node embeddings
      iii. Image embeddings (Images already lie in a rich vector space? Check out puppy image embeddings!)
      iv. Semantic hashing: turn database entries (text, image, etc.) into low-dimensional and binary codes → information retrieval
   c. Question: If there are labels, is there any reason to use a decoder with a reconstruction loss?

5. Generate New Data (Generative Model)
   a. Especially Variational Autoencoders (VAE) and Adversarial Autoencoders (AAE) (more later)
   b. Creative applications (more later)

Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

Page 29:

[Figure: graph node embedding. Copy the output of hidden layer 2 (the embedding) and use it as input to a downstream model, e.g. logistic regression, an SVM, or another classifier.]

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Page 30:

[Figure: semantic hashing. Copy the output of layer 2 (the code) for the query and for the entries in the database, then compare the codes to retrieve matches.]

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
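A toy sketch of the code-comparison step (NumPy only; random vectors stand in for real layer-2 outputs):

```python
import numpy as np

def to_binary_code(codes, threshold=0.5):
    """Turn real-valued hidden representations into binary codes."""
    return (codes > threshold).astype(np.uint8)

def hamming_search(query_code, db_codes, k=5):
    """Return indices of the k database entries whose codes are closest to the query."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
db = to_binary_code(rng.random((1000, 32)))    # 1000 database entries, 32-bit codes
q = to_binary_code(rng.random(32))             # one query
print(hamming_search(q, db))                   # indices of the 5 nearest entries
```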

Page 31:

Applications of Autoencoders

6. Self-supervised Learning
   a. ∈ supervised learning where the targets are generated from the input data
   b. Merely learning to reconstruct the input might not be enough to learn abstract features of the kind that label-supervised learning induces (where targets are "dog", "car", ...)
      i. Data denoising
      ii. Jigsaw puzzle solver
      iii. ...

https://blog.keras.io/building-autoencoders-in-keras.html
https://www.doc.ic.ac.uk/~js4416/163/website/nlp/#XVincent2008

Page 32:

Skipgram vs Autoencoders

1. In NLP word embeddings, why is Skipgram more popular than autoencoders?
   a. Simpler
   b. More efficient
   c. Works well already

2. When does Skipgram no longer suffice? Additional goals, e.g.
   a. Denoising
   b. Complex characteristics of word use + polysemy → use a bidirectional LSTM with attention as the encoder!
   c. Generative setting (generate new data)
   d. Inductive setting (embed unseen words)

3. Can Skipgram be viewed as a special case of some autoencoder model?
   a. In fact, encoding and decoding are very general concepts and are used in many places

Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).

Page 33:

Roadmap (repeated; see Page 2)

Page 34:

Sparse Autoencoders (SAE) (2008)

Page 35:

Motivation 1: Sparse Coding

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

An image should be represented by only a few bases.

Page 36:

Motivation 1: Sparse Coding

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

A document should be about only a few topics.

Page 37:

Motivation 1: Sparse Coding

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

Page 38:

Motivation 1: Sparse Coding

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

[Figure: change of basis with a dictionary D. Encode: h = D^T x. Decode: x = D h = D D^T x → D D^T = I.]

Change of basis + sparsity constraint on h.

“If you could speak only a few words per month, you would probably try to make them worth listening to.”

Page 39:

Inner product between them

https://www.cs.toronto.edu/~urtasun/courses/CSC411/14_pca.pdf

Change of basis

Recall PCA

Page 40:

Motivation 2: Prevent Identity Transform

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

Encoder: f(x) = W^T x = h. Decoder: g(h) = W h = W W^T x = x → W W^T = I (fine).

[Figure: in the overcomplete case, however, the weight matrices can degenerate to W ≈ I (identity-like matrices padded with zeros), and the autoencoder just copies the input.]

Page 41:

Motivation 2: Prevent Identity Transform

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

[Figure: the input reconstructed as 0.2·(basis 1) + 0.3·(basis 2) + 0.1·(basis 3) + ..., i.e. the same as the input x, where the weight matrix W is a 16×16 identity.]

In the case of images, we can think of W as a set of convolution filters (each with the same size as the input, e.g. 4×4).

Page 42:

Motivation 2: Prevent Identity Transform

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf

[Figure: one-hot basis vectors (1000000000000000, 0100000000000000, 0010000000000000, ...) combined as 1·(basis 1) + 0·(basis 2) + 1·(basis 3) + ...; each basis selects a single pixel of the input.]

Page 43:

Sparse Autoencoders (encoder f, decoder g)

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
http://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf

Loss (following the Stanford sparse-autoencoder notes cited above) = (1/m) Σ_i ||x_i − g(f(x_i))||^2 (reconstruction loss over the m training samples) + λ ||W||^2 (regularization term) + β Σ_j KL(ρ || ρ̂_j) (sparsity penalty), where ρ̂_j is the average activation of hidden unit j of layer 2 (assuming two layers in the encoder); see the sketch below.

This results in sparse activation of hidden units across training points, but does not guarantee that each input has a sparse representation. (Makhzani, Alireza, and Brendan Frey. "K-sparse autoencoders." arXiv preprint arXiv:1312.5663 (2013).)
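A sketch of that loss in tf.keras (assumed; the per-batch average activation stands in for the average over the whole training set):

```python
import tensorflow as tf

rho, beta, lam = 0.05, 3.0, 1e-4   # target sparsity, sparsity weight, weight decay

def kl_sparsity(h):
    """beta * sum_j KL(rho || rho_hat_j), rho_hat_j = mean activation of hidden unit j."""
    rho_hat = tf.reduce_mean(h, axis=0)
    kl = (rho * tf.math.log(rho / (rho_hat + 1e-10)) +
          (1 - rho) * tf.math.log((1 - rho) / (1 - rho_hat + 1e-10)))
    return beta * tf.reduce_sum(kl)

inp = tf.keras.Input(shape=(784,))
h = tf.keras.layers.Dense(128, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(lam),
                          activity_regularizer=kl_sparsity)(inp)
out = tf.keras.layers.Dense(784, activation="sigmoid")(h)
sae = tf.keras.Model(inp, out)
sae.compile(optimizer="adam", loss="mse")   # reconstruction loss + the two penalties
# sae.fit(x_train, x_train, epochs=20, batch_size=256)
```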

Page 44:

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

Results

Page 45:

Techniques to Interpret Autoencoders

1. Visualize the weight matrix W
   a. Each column of W corresponds to the weights of a particular neuron
   b. When there is a natural interpretation of the weights, we can visualize them
      i. Especially true in the case of images, as seen previously (~convolution filters)
      ii. Especially true for the top hidden layers, since they often capture relatively large features

2. Visualize the most exciting input per neuron (see the sketch below)
   a. Treat each neuron as a feature detector. To find the feature a particular neuron is looking for:
      i. Feed a random input
      ii. Measure the activation of the neuron you are interested in
      iii. Perform backpropagation to tweak the input so that the neuron will activate even more (gradient ascent)
      iv. Iterate several times

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
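A sketch of step 2 (tf.keras assumed; written for a dense hidden layer, using gradient ascent on the input):

```python
import tensorflow as tf

def most_exciting_input(model, layer_name, unit, steps=100, lr=0.1):
    """Gradient-ascend a random input so that one hidden unit activates strongly."""
    feature_model = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    x = tf.Variable(tf.random.uniform((1,) + model.input_shape[1:]))   # random input
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = feature_model(x)[0, unit]    # the neuron of interest
        x.assign_add(lr * tape.gradient(activation, x))  # step uphill (gradient ascent)
    return x.numpy()

# Usage (hypothetical layer name): img = most_exciting_input(autoencoder, "dense_1", unit=7)
```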

Page 46:

Roadmap (repeated; see Page 2)

Page 47:

Denoising Autoencoders (DAE) (2008)

Page 48:

Sparse Coding Could Also Handle Image Denoising

Key: the use of sparse and redundant representations over trained dictionaries.

https://www.cs.ubc.ca/~schmidtm/MLRG/sparseCoding.pdf
Elad, Michael, and Michal Aharon. "Image denoising via learned dictionaries and sparse representation." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 1. IEEE, 2006.

Page 49:

Denoising Autoencoders: Implementation-level

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
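A minimal denoising-autoencoder sketch (tf.keras assumed; random data stands in for real inputs): corrupt the input, but reconstruct the clean target.

```python
import numpy as np
import tensorflow as tf

inp = tf.keras.Input(shape=(784,))
h = tf.keras.layers.Dense(64, activation="relu")(inp)
out = tf.keras.layers.Dense(784, activation="sigmoid")(h)
dae = tf.keras.Model(inp, out)
dae.compile(optimizer="adam", loss="mse")

x_clean = np.random.rand(1000, 784).astype("float32")           # stand-in for real data
x_noisy = x_clean + 0.3 * np.random.normal(size=x_clean.shape).astype("float32")
x_noisy = np.clip(x_noisy, 0.0, 1.0)                             # Gaussian corruption

dae.fit(x_noisy, x_clean, epochs=5, batch_size=128, verbose=0)   # noisy in, clean out
```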

Page 50:

Denoising Autoencoders: Results

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

Gaussian noise

Page 51:

Denoising Autoencoders: Results

Pro Deep Learning with TensorFlow: A Mathematical Approach to Advanced Artificial Intelligence in Python

Salt and pepper noise

Page 52:

Denoising Autoencoders: Research-level

Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

Why is this equivalent to a reconstruction loss? (1) Intuitively. (2) Recall that the least-squares estimate is the same as the maximum-likelihood estimate under a Gaussian model.

Page 53:

Roadmap (repeated; see Page 2)

Page 54:

Contractive Autoencoders (CAE) (2011)

Page 55:

CAE: Resist Infinitesimal Perturbations of the Input

All autoencoder training procedures involve a compromise between two opposing forces: being data-specific and being data-insensitive.

Deep Learning (Adaptive Computation and Machine Learning series) (Ian Goodfellow, Yoshua Bengio, Aaron Courville)

CAE and DAE are equivalent under certain conditions.
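The CAE penalizes the Frobenius norm of the encoder's Jacobian (the "smallness of the derivative of the representation" mentioned on Page 16). A sketch of that loss (tf.keras assumed; minimize it with a standard optimizer loop, which requires second-order gradients):

```python
import tensorflow as tf

encoder = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="sigmoid")])
decoder = tf.keras.Sequential([tf.keras.layers.Dense(784, activation="sigmoid")])
lam = 1e-4   # weight of the contractive penalty

def cae_loss(x):
    """Reconstruction loss + lam * ||Jacobian of the encoder at x||_F^2."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)
    jac = tape.batch_jacobian(h, x)                        # shape: (batch, 64, 784)
    contractive = tf.reduce_mean(tf.reduce_sum(tf.square(jac), axis=[1, 2]))
    recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - decoder(h)), axis=1))
    return recon + lam * contractive
```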

Page 56:

Roadmap (repeated; see Page 2)

Page 57:

Stacked Convolutional Autoencoders (SCAE) (2011)

Page 58:

SCAE

Use convolutional + pooling layers instead of fully connected layers.

Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE transactions on pattern analysis and machine intelligence 38.2 (2016): 295-307.
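A sketch of a convolutional autoencoder in the spirit of the Keras blog post cited earlier (tf.keras assumed; 28×28 grayscale inputs):

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(28, 28, 1))
# Encoder: convolution + pooling instead of fully connected layers.
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
code = layers.MaxPooling2D(2)(x)                        # 7 x 7 x 8 feature maps
# Decoder: convolution + upsampling back to the input resolution.
x = layers.Conv2D(8, 3, activation="relu", padding="same")(code)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

scae = tf.keras.Model(inp, out)
scae.compile(optimizer="adam", loss="binary_crossentropy")
# scae.fit(x_train, x_train, epochs=20, batch_size=128)
```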

Page 59:

Image Deblurring/Denoising/Super-Resolution and Image Colorization

Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE transactions on pattern analysis and machine intelligence 38.2 (2016): 295-307.
https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694

Page 60:

Roadmap (repeated; see Page 2)

Page 61:

Recursive Autoencoders (RAE) (2011)

Page 62:

Sentence Representation

Socher, Richard, et al. "Semi-supervised recursive autoencoders for predicting sentiment distributions." Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, 2011.

Why not simple average? “white blood cells destroying an infection” ≠ “an infection destroying white blood cells”

Page 63:

Sentence Representation

https://www.doc.ic.ac.uk/~js4416/163/website/nlp/recursive.html
Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in neural information processing systems. 2011.

Could use parse tree

Could introduce a supervised loss

Could penalize top-level nodes more heavily, which contain more children

Could use many layers

Could normalize the hidden representations

Could predict all children underneath → unfolding RAE
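A toy sketch of the basic RAE step (NumPy; random weights, no training loop, biases omitted): two child vectors are composed into a parent, and the parent is scored by how well it reconstructs its children.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                                            # word/phrase embedding dimension
W_enc = rng.normal(scale=0.1, size=(d, 2 * d))    # encoder (compose) weights
W_dec = rng.normal(scale=0.1, size=(2 * d, d))    # decoder (reconstruct) weights

def compose(c1, c2):
    """Merge two child embeddings into one parent embedding."""
    parent = np.tanh(W_enc @ np.concatenate([c1, c2]))
    return parent / np.linalg.norm(parent)        # optional normalization (see above)

def reconstruction_loss(c1, c2):
    """Reconstruct both children from the parent; sum of squared errors."""
    c1_hat, c2_hat = np.split(np.tanh(W_dec @ compose(c1, c2)), 2)
    return np.sum((c1 - c1_hat) ** 2) + np.sum((c2 - c2_hat) ** 2)

# Greedily merging the adjacent pair with the lowest reconstruction loss (or following
# a parse tree) bottom-up yields a vector for the whole sentence.
```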

Page 64:

Roadmap (repeated; see Page 2)

Page 65:

Variational Autoencoders (VAE) (2013)

Page 66:

VAE: Intuition

https://www.jeremyjordan.me/variational-autoencoders/

Encoder Outputs Statistical Distributions; Feed Samples into Decoder → Add Noise at All Times; Generate New Data After Training

Page 67:

VAE: Implementation-level

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Probabilistic (outputs are partly determined by chance, even after training) + generative autoencoders.

Assume the prior distribution of z, i.e. p(z), to be Gaussian → encourage the learned posterior q(z|x) to be similar to p(z) through an additional loss term measuring their KL divergence.
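A sketch of the reparameterization trick and the extra KL loss term (classic tf.keras functional-API VAE; layer sizes are illustrative):

```python
import tensorflow as tf

latent_dim = 2
inp = tf.keras.Input(shape=(784,))
h = tf.keras.layers.Dense(256, activation="relu")(inp)
z_mean = tf.keras.layers.Dense(latent_dim)(h)        # encoder outputs a distribution:
z_log_var = tf.keras.layers.Dense(latent_dim)(h)     # mean and log-variance of q(z|x)

def sample(args):
    """Reparameterization trick: z = mean + sigma * epsilon, epsilon ~ N(0, I)."""
    mean, log_var = args
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps

z = tf.keras.layers.Lambda(sample)([z_mean, z_log_var])   # noise at all times
h_dec = tf.keras.layers.Dense(256, activation="relu")(z)
out = tf.keras.layers.Dense(784, activation="sigmoid")(h_dec)
vae = tf.keras.Model(inp, out)

# Additional loss term: KL(q(z|x) || p(z)) with a standard Gaussian prior p(z).
kl = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
vae.add_loss(kl)
vae.compile(optimizer="adam", loss="binary_crossentropy")   # reconstruction + KL
# vae.fit(x_train, x_train, epochs=30, batch_size=128)
```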

Page 68:

VAE: Research-level

Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
https://www.jeremyjordan.me/variational-autoencoders/

[Figure: the VAE graphical model and its ingredients:
- generative model (probabilistic decoder)
- probabilistic encoder (recognition model): a variational approximation to the intractable true posterior
- latent representation or code
→ Variational Bayesian Inference]

Page 69:

MusicVAE: Generative Model → Creative Artists

https://magenta.tensorflow.org/music-vae
Roberts, Adam, et al. "A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music." arXiv preprint arXiv:1803.05428 (2018).

The desirable properties of a latent space can be summarized as follows:

1. Expression: Any real example can be mapped to some point in the latent space and reconstructed from it.

2. Realism: Any point in this space represents some realistic example, including ones not in the training set.

3. Smoothness: Examples from nearby points in latent space have similar qualities to one another.

https://experiments.withgoogle.com/ai/beat-blender/view/

Page 70:

Key: Design Latent Space Properties

https://www.jeremyjordan.me/variational-autoencoders/

Learn smooth latent state representations of the input data. Good for interpolation, sampling, generation, downstream classification, etc.

“holes” :(

Page 71:

Interpolation → Smooth Transformation

https://www.jeremyjordan.me/variational-autoencoders/

Page 72:

SketchRNN: Seq2seq + Variational Autoencoder

https://research.googleblog.com/2017/04/teaching-machines-to-draw.html

A sequence-to-sequence (seq2seq) autoencoder framework with variational inference.
A sketch → a sequence of motor actions controlling a pen (how about text or a graph as a sequence?).
By adding noise to the latent vector, the model cannot reproduce the input sketch exactly.

Arithmetic operations on sketch embeddings!

Smoothness of latent space

Page 73:

Roadmap (repeated; see Page 2)

Page 74:

Adversarial Autoencoders (AAE) (2015)

Page 75:

AAE: Regularized by an Adversarial Network, Which Guides the Posterior q(z|x) to Match Any Arbitrary Prior p(z)

[Figure: VAE vs AAE. The VAE pulls q(z|x) toward the prior p(z) with an additional D_KL loss term; the AAE replaces that term with an adversarial network.]

Makhzani, Alireza, et al. "Adversarial autoencoders." arXiv preprint arXiv:1511.05644 (2015).

Page 76:

AAE: Design Arbitrary Prior

Makhzani, Alireza, et al. "Adversarial autoencoders." arXiv preprint arXiv:1511.05644 (2015).

Page 77:

AAE: Labels Can Further Guide (Semi-Supervised)

Makhzani, Alireza, et al. "Adversarial autoencoders." arXiv preprint arXiv:1511.05644 (2015).

Page 78:

Roadmap (repeated; see Page 2)

Page 79:

Wasserstein Autoencoders (WAE) (2017)

Page 80:

WAE: Motivation

VAE
Pros:
1. Theoretically elegant
2. Stable training
3. Encoder-decoder architecture
4. Nice latent manifold structure
Cons:
1. Tends to generate blurry samples

GAN
Pros:
1. Good visual quality of images
Cons:
1. Harder to train
2. No encoder; only a decoder/generator and a discriminator
3. "Mode collapse" problem
4. ~JS divergence, "worse" than the Wasserstein distance (see details in the paper)

Tolstikhin, Ilya, et al. "Wasserstein Auto-Encoders." arXiv preprint arXiv:1711.01558 (2017).

Page 81:

Combine VAE + GAN in a Principled Way?

[Figure: VAE = encoder + decoder, with an additional D_KL loss term against the prior p(z); GAN = decoder/generator + discriminator.]

Tolstikhin, Ilya, et al. "Wasserstein Auto-Encoders." arXiv preprint arXiv:1711.01558 (2017).

Page 82:

AAE

WAE: a generalization of AAE; minimizes the Wasserstein distance between the model and the target distribution.

Tolstikhin, Ilya, et al. "Wasserstein Auto-Encoders." arXiv preprint arXiv:1711.01558 (2017).
https://openreview.net/forum?id=HkL7n1-0b

Page 83:

Roadmap (repeated; see Page 2)

Page 84:

Autoencoders for Graphs

Page 85:

Graphs Are Different

1. Are there smooth linear interpolations? Arithmetic operations?
2. A graph is composed of correlated substructures
   a. E.g. two triangles → a rectangle
   b. Hierarchy: pixels (atomic) → patterns → images; words (atomic) → phrases → sentences → paragraphs/documents; nodes (atomic) → substructures → graphs (transfer learning)
3. Graphs are of different sizes
4. Graph nodes lack order
5. How to detect substructures?
   a. For images, convolutional layers → SCAE
   b. For graphs, graph convolutional layers → node/substructure/graph?
   c. Some people treat a graph as sequences/random walks → "deconstruction" view
      i. ~Parse sentences into trees instead of feeding them into an LSTM
   d. How about decomposing graphs into equal-size subgraphs?

Page 86:

Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders." arXiv preprint arXiv:1802.03480 (2018).

GraphVAE

Page 87:

Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders." arXiv preprint arXiv:1802.03480 (2018).

Page 88:

Thank you!