
Review of ‘Deep Learning’ book, Chapter 5, Sec. 5.8-5.9 [1]

Ana Karen Roldan

Department of Mathematics and Statistics
University of Calgary

Lunch at the Lab, Fall 2019

October 22, 2019


Introduction

This series of presentations reviews the book Deep Learning.

Chapter 5 covers the most popular and general techniques used in machine learning.

A machine learning algorithm is an algorithm that is able to learn from data. But what do we mean by learning? “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Sections 5.8 and 5.9 deal with the experience E.


Outline

1 5.8 Unsupervised Learning Algorithms
    Definition
    Principal Components Analysis (PCA)
    k-means Clustering

2 5.9 Stochastic Gradient Descent
    Definition
    SGD Example


Definition

Recall from the previous section that supervised learning algorithms learn to associate some input with some output, e.g., linear regression.

Unsupervised algorithms are those that experience only “features” (input) but not a supervision signal (output).

Examples include extracting information from a distribution, density estimation, learning to draw samples from a distribution, and clustering the data into groups.


Simple representation

These algorithms look for a simpler representation of the data. The most common criteria, which are not mutually exclusive, are the following:

Lower dimensional representations (latent variables)

Sparse representations (typically increase the dimension)

Independent representations (disentangle variation)

Figure: Dimensionality reduction

Figure: Sparse matrix


Principal Components Analysis (PCA)

PCA is an unsupervised learning algorithm that learns a representation of the data based on two of the criteria for a simple representation: lower dimensionality and independent representation.

Figure: PCA representation

PCA is a linear projection that aligns the direction of greatest variance with the axes of the new space.



PCA learns an orthogonal, linear transformation of the data that projects an input x to a representation z = x^T W, where W describes the rotation of the input space.

Obtaining the transformation z involves the following steps (a minimal code sketch follows the list):

Construct the covariance matrix of the data.

Compute the eigenvectors of this matrix.

Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large fraction of the variance of the original data.
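
The steps above can be made concrete with a short NumPy sketch. This is only an illustration under the definitions on this slide; the function name pca and all variable names are our own, not from the book.

    import numpy as np

    def pca(X, k):
        """Sketch of PCA via the eigendecomposition of the covariance
        matrix. X is an (n, d) data matrix; returns the k-dimensional
        representation Z and the projection matrix W."""
        Xc = X - X.mean(axis=0)                # center the data
        C = np.cov(Xc, rowvar=False)           # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(C)   # eigh since C is symmetric
        order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
        W = eigvecs[:, order[:k]]              # top-k eigenvectors as columns
        Z = Xc @ W                             # representation z = x^T W
        return Z, W

    # Example: project 3-dimensional data down to 2 dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Z, W = pca(X, k=2)
    print(Z.shape)  # (100, 2)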


k-means Clustering

k-means clustering is another example of a simple representation learning algorithm; it divides the training set into k different clusters of examples that are near each other.

This is an example of a sparse representation: if x belongs to cluster i, then h_i = 1 and all other entries of the representation h are zero.

Figure: k-means Clustering

The k-means algorithm works by initializing k different centroids µ(1), …, µ(k) and then alternating between two steps: each example is assigned to cluster i, where i is the index of the nearest centroid µ(i), and each centroid is then updated to the mean of the examples assigned to it.
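
A short NumPy sketch of this alternation follows; kmeans and its parameter names are illustrative choices rather than the book's notation, and the returned h is the one-hot code described above.

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Sketch of k-means: alternately assign each example to its
        nearest centroid, then move each centroid to the mean of the
        examples assigned to it."""
        rng = np.random.default_rng(seed)
        # Initialize the k centroids to randomly chosen training examples.
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            # Assignment step: i is the index of the nearest centroid mu^(i).
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Update step: each centroid becomes the mean of its cluster.
            for i in range(k):
                if np.any(assign == i):
                    centroids[i] = X[assign == i].mean(axis=0)
        h = np.eye(k)[assign]   # sparse one-hot representation h
        return centroids, h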


Definition

Nearly all of deep learning is powered by one very important algorithm: stochastic gradient descent (SGD).

A recurring problem in machine learning is that large training sets are necessary for good generalization, but large training sets are also more computationally expensive.

The main purpose is to minimize the cost function used by a machine learning algorithm. Rather than computing the gradient on the whole training set, we can sample a minibatch of examples drawn uniformly from it.

Figure: Gradient descent algorithm


SGD Example

For example, the per-example loss can be the negative conditional log-likelihood of the training data, L(x, y, θ) = −log p(y | x; θ), so that the cost function can be written as

J(θ) = E_{x,y∼p̂_data} L(x, y, θ) = (1/m) ∑_{i=1}^{m} L(x^(i), y^(i), θ)

Gradient descent requires computing

∇_θ J(θ) = (1/m) ∑_{i=1}^{m} ∇_θ L(x^(i), y^(i), θ)

On each step, SGD instead samples a minibatch of size m′, typically chosen to be relatively small, and forms the gradient estimate

g = (1/m′) ∑_{i=1}^{m′} ∇_θ L(x^(i), y^(i), θ)

Finally, the stochastic gradient descent algorithm follows the estimated gradient downhill: θ ← θ − ε g, where ε is the learning rate.
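
The update can be written as a short NumPy sketch. Here sgd, grad_L, and the default hyperparameter values are our illustrative assumptions; grad_L stands in for the minibatch-averaged gradient ∇_θ L above.

    import numpy as np

    def sgd(grad_L, theta, X, y, eps=0.01, m_prime=32, n_steps=1000, seed=0):
        """Sketch of stochastic gradient descent. grad_L(theta, Xb, yb)
        is assumed to return the minibatch-averaged gradient g."""
        rng = np.random.default_rng(seed)
        for _ in range(n_steps):
            # Sample a minibatch of m' examples uniformly from the training set.
            idx = rng.choice(len(X), size=m_prime, replace=False)
            g = grad_L(theta, X[idx], y[idx])   # g = (1/m') sum of gradients
            theta = theta - eps * g             # follow the estimate downhill
        return theta

    # Usage sketch: linear regression with squared error as L.
    grad_mse = lambda th, Xb, yb: 2.0 * Xb.T @ (Xb @ th - yb) / len(yb)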


Thank you!


References

[1] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. The MIT Press, 2016.
