
Page 1:

Warm-up as you log in


1. https://www.sporcle.com/games/MrChewypoo/minimalist_disney

2. https://www.sporcle.com/games/Stanford0008/minimalist-cartoons-slideshow

3. https://www.sporcle.com/games/MrChewypoo/minimalist

Page 2:

Announcements

Assignments

▪ HW9

▪ Due Wed, 12/9, 11:59 pm

▪ The two slip days are free (last possible submission Fri, 12/11, 11:59 pm)

Final Exam

▪ Mon, 12/14

▪ Stay tuned to Piazza for more details

Page 3:

Wrap-up Clustering

Clustering slides

Page 4:

Introduction to Machine Learning

Dimensionality Reduction and PCA

Instructor: Pat Virtue

Page 5:

Learning Paradigms

Slide credit: CMU MLD, Matt Gormley

Page 6:

Outline

Dimensionality Reduction

▪ High-dimensional data

▪ Learning (low dimensional) representations

Principal Component Analysis (PCA)

▪ Examples: 2D and 3D

▪ PCA algorithm

▪ PCA objective and optimization

▪ PCA, eigenvectors, and eigenvalues


Page 7:

Warm-up as you log in


1. https://www.sporcle.com/games/MrChewypoo/minimalist_disney

2. https://www.sporcle.com/games/Stanford0008/minimalist-cartoons-slideshow

3. https://www.sporcle.com/games/MrChewypoo/minimalist

Page 8:

Dimensionality Reduction

Page 9:

Dimensionality Reduction

Page 10:

Dimensionality Reduction

Page 11:

Dimensionality Reduction

Page 12:

Dimensionality Reduction

For each 𝑥(𝑖) ∈ ℝ𝑀, find a representation 𝑧(𝑖) ∈ ℝ𝐾 where 𝐾 ≪ 𝑀.

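As a minimal illustration of this mapping (a sketch only, not PCA itself), the NumPy snippet below sends points in ℝ𝑀 to ℝ𝐾 with an arbitrary linear map; the matrix V here is a hypothetical placeholder for the learned projection introduced later.

import numpy as np

# Hypothetical linear map z = V x from R^M down to R^K.
# V is random here, just to show the shapes; PCA will choose V
# in a principled way.
rng = np.random.default_rng(0)
M, K, N = 100, 3, 5          # original dim, reduced dim, number of points
X = rng.normal(size=(N, M))  # rows are the data points x^(i)
V = rng.normal(size=(K, M))  # placeholder projection matrix
Z = X @ V.T                  # row i is z^(i) = V x^(i), a point in R^K
print(Z.shape)               # (5, 3): N points, K dimensions each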

Page 13:

High-Dimensional Data

Examples of high-dimensional data:

– High-resolution images (millions of pixels)


Page 14:

Dimensionality Reduction

http://timbaumann.info/svd-image-compression-demo/

https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html

Page 15:

Dimensionality Reduction

http://timbaumann.info/svd-image-compression-demo/

https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html

Page 16:

High-Dimensional Data

Examples of high-dimensional data:

– Brain imaging data (100s of MBs per scan)

Image from https://pixabay.com/en/brain-mrt-magnetic-resonance-imaging-1728449/

Image from (Wehbe et al., 2014)

Page 17:

Learning Representations

PCA, kernel PCA, and ICA are powerful unsupervised learning techniques for extracting hidden (potentially lower-dimensional) structure from high-dimensional datasets.

Useful for:

• Visualization

• Further processing by machine learning algorithms

• More efficient use of resources (e.g., time, memory, communication)

• Statistical: fewer dimensions → better generalization

• Noise removal (improving data quality)

Slide from Nina Balcan

Page 18:

PRINCIPAL COMPONENT ANALYSIS (PCA)


Page 19:

Principal Component Analysis (PCA)

When data lies on or near a low-dimensional linear subspace, the axes of that subspace are an effective representation of the data.

Identifying these axes is known as Principal Component Analysis; they can be computed with classic matrix tools (eigendecomposition or singular value decomposition).

Slide from Nina Balcan

Page 20:

2D Gaussian dataset

Slide from Barnabas Poczos

Page 21:

1st PCA axis

Slide from Barnabas Poczos

Page 22:

2nd PCA axis

Slide from Barnabas Poczos

Page 23:

PCA Axes


Page 24:

Data for PCA

We assume the data is centered

Q: What if your data is not centered?

A: Subtract off the sample mean.

Slide from Matt Gormley

Page 25:

Sample Covariance Matrix

The sample covariance matrix is given by:

𝚺 = (1/𝑁) ∑ᵢ (𝒙(𝑖) − 𝝁)(𝒙(𝑖) − 𝝁)𝑇

Since the data matrix is centered (𝝁 = 𝟎), we can rewrite this as:

𝚺 = (1/𝑁) 𝑿𝑇𝑿

Slide from Matt Gormley
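As a quick numerical check of the centered form (a sketch in NumPy; the 1/𝑁 scaling is one common convention and does not affect the eigenvectors):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X = X - X.mean(axis=0)                         # center: subtract the sample mean

Sigma = (X.T @ X) / X.shape[0]                 # centered form: (1/N) X^T X
Sigma_np = np.cov(X, rowvar=False, bias=True)  # NumPy's 1/N covariance
print(np.allclose(Sigma, Sigma_np))            # True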

Page 26:

PCA Algorithm

Input: 𝑿, 𝑿𝑡𝑒𝑠𝑡, 𝐾

1. Center data (and scale each axis) based on training data → 𝑿, 𝑿𝑡𝑒𝑠𝑡

2. 𝑽 = eigenvectors(𝑿𝑇𝑿)

3. Keep only the top 𝐾 eigenvectors: 𝑽𝐾

4. 𝐙test = 𝑿𝑡𝑒𝑠𝑡𝑽𝐾

Optionally, use 𝑽𝐾𝑇 to rotate 𝐙test back to the original subspace 𝐗′test and uncenter.
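A minimal NumPy sketch of these steps (an illustration under the slide's assumptions, not a reference implementation; np.linalg.eigh is used because 𝑿𝑇𝑿 is symmetric):

import numpy as np

def pca_fit_transform(X_train, X_test, K):
    # 1. Center both sets using the *training* mean
    mu = X_train.mean(axis=0)
    Xc, Xc_test = X_train - mu, X_test - mu

    # 2. Eigenvectors of X^T X; eigh returns eigenvalues in ascending order
    evals, V = np.linalg.eigh(Xc.T @ Xc)

    # 3. Keep the top-K eigenvectors (as columns), largest eigenvalue first
    V_K = V[:, ::-1][:, :K]

    # 4. Project: Z_test = X_test V_K
    Z_test = Xc_test @ V_K

    # Optional: rotate back to the original space and uncenter
    X_test_recon = Z_test @ V_K.T + mu
    return Z_test, X_test_recon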

Page 27:

GLBC – MSK Image Analysis, April 23, 2010

Growth Plate Imaging: Growth Plate Disruption and Limb Length Discrepancy

Images courtesy of H. Potter, H.S.S.

8-year-old boy with previous fracture and 4 cm leg length discrepancy

Page 28:

GLBC – MSK Image Analysis, April 23, 2010

Growth Plate Imaging: Growth Plate Disruption and Limb Length Discrepancy

Images courtesy of H. Potter, H.S.S.

8-year-old boy with previous fracture and 4 cm leg length discrepancy

Page 29:

GLBC – MSK Image Analysis, April 23, 2010

Growth Plate Imaging: Area Measurement

Page 30:

GLBC – MSK Image Analysis, April 23, 2010

Growth Plate Imaging: Area Measurement

Flatten Growth Plate to Enable 2D Area Measurement

Page 31:

Piazza Poll 1

What is the projection of point 𝒙 onto vector 𝒗, assuming that ‖𝒗‖₂ = 1?

A. 𝒗𝒙

B. 𝒗𝑇𝒙

C. (𝒗𝑇𝒙) 𝒗

D. (𝒗𝑇𝒙)(𝒙𝑇𝒗)
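A quick numerical check (a sketch; for a unit vector 𝒗, the projection of 𝒙 onto 𝒗 is the vector (𝒗𝑇𝒙) 𝒗, i.e. option C):

import numpy as np

v = np.array([1.0, 0.0])   # unit vector along the first axis
x = np.array([3.0, 4.0])

proj = (v @ x) * v         # option C: (v^T x) v
print(proj)                # [3. 0.]: the component of x along v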

Page 32:

Rotation of Data (and back)

1. For any orthogonal matrix 𝑽 ∈ ℝ𝑀×𝑀:

2. Rotate to new space: 𝒛(𝑖) = 𝑽𝒙(𝑖) ∀𝑖

3. (Un)rotate back: 𝒙′(𝑖) = 𝑽𝑇𝒛(𝑖)
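A small check that an orthogonal rotation is exactly invertible (a sketch; the orthogonal 𝑽 is generated here via QR decomposition purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
M = 4
V, _ = np.linalg.qr(rng.normal(size=(M, M)))  # a random M x M orthogonal matrix
x = rng.normal(size=M)

z = V @ x                      # rotate into the new coordinate system
x_back = V.T @ z               # rotate back: V^T V = I recovers x exactly
print(np.allclose(x, x_back))  # True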

Page 33:

PCA Algorithm

Input: 𝑿, 𝑿𝑡𝑒𝑠𝑡, 𝐾

1. Center data (and scale each axis) based on training data → 𝑿, 𝑿𝑡𝑒𝑠𝑡

2. 𝑽 = eigenvectors(𝑿𝑇𝑿)

3. Keep only the top 𝐾 eigenvectors: 𝑽𝐾

4. 𝐙test = 𝑿𝑡𝑒𝑠𝑡𝑽𝐾

Optionally, use 𝑽𝐾𝑇 to rotate 𝐙test back to the original subspace 𝐗′test and uncenter.

Page 34:

Outline

Dimensionality Reduction

▪ High-dimensional data

▪ Learning (low dimensional) representations

Principal Component Analysis (PCA)

▪ Examples: 2D and 3D

▪ PCA algorithm

▪ PCA objective and optimization

▪ PCA, eigenvectors, and eigenvalues


Page 35:

Sketch of PCA

1. Select “best” 𝑽 ∈ ℝ𝐾×𝑀

2. Project down: 𝒛(𝑖) = 𝑽𝒙(𝑖) ∀𝑖

3. Reconstruct up: 𝒙′(𝑖) = 𝑽𝑇𝒛(𝑖)

Page 36:

Sketch of PCA

1. Select “best” 𝑽 ∈ ℝ𝐾×𝑀

2. Project down: 𝒛(𝑖) = 𝑽𝒙(𝑖) ∀𝑖

3. Reconstruct up: 𝒙′(𝑖) = 𝑽𝑇𝒛(𝑖)

Definition of PCA

1. Select 𝑣1 that best explains data

2. Select next 𝑣𝑗 that

i. Is orthogonal to 𝑣1, … , 𝑣𝑗−1

ii. Best explains the remaining data

3. Repeat 2 until desired amount of data is explained
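One way to make this greedy definition concrete (a sketch only, assuming distinct eigenvalues; not how PCA is computed in practice) is power iteration with deflation: find the direction that best explains the data, remove its contribution, and repeat:

import numpy as np

def greedy_pca(X, K, iters=200):
    rng = np.random.default_rng(0)
    C = X.T @ X                       # (scaled) sample covariance
    components = []
    for _ in range(K):
        v = rng.normal(size=C.shape[0])
        for _ in range(iters):        # power iteration -> top eigenvector of C
            v = C @ v
            v /= np.linalg.norm(v)
        lam = v @ C @ v               # variance explained along v
        components.append(v)
        C = C - lam * np.outer(v, v)  # deflate: remove the explained part
    return np.array(components)       # rows are v_1, ..., v_K, matching V in R^{K x M}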

Page 37:

Select “Best” Vector: Reconstruction Error vs. Variance of Projection

Page 38:

Piazza Poll 2 & Poll 3

Consider the two projections below.

Poll 2: Which maximizes the variance?

Poll 3: Which minimizes the reconstruction error?

Option A Option B

Page 39:

Select “Best” Vector: Reconstruction Error vs. Variance of Projection

Page 40:

PCA


Equivalence of Maximizing Variance and Minimizing Reconstruction Error
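The key step (reconstructed here as a sketch of the argument for a single unit vector) is the Pythagorean decomposition of each centered point into its projection onto v plus the orthogonal residual:

\|x^{(i)}\|^2 = \big(v^\top x^{(i)}\big)^2 + \big\|x^{(i)} - (v^\top x^{(i)})\, v\big\|^2

Summing over i, the left side is a constant that does not depend on v, so

\max_{\|v\|_2 = 1} \sum_i \big(v^\top x^{(i)}\big)^2 \iff \min_{\|v\|_2 = 1} \sum_i \big\|x^{(i)} - (v^\top x^{(i)})\, v\big\|^2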

Page 41:

Sketch of PCA

1. Select “best” 𝑽 ∈ ℝ𝐾×𝑀

2. Project down: 𝒛(𝑖) = 𝑽𝒙(𝑖) ∀𝑖

3. Reconstruct up: 𝒙′(𝑖) = 𝑽𝑇𝒛(𝑖)

Definition of PCA

1. Select 𝑣1 that best explains data

2. Select next 𝑣𝑗 that

i. Is orthogonal to 𝑣1, … , 𝑣𝑗−1

ii. Best explains the remaining data

3. Repeat 2 until desired amount of data is explained

Page 42:

PCA: the First Principal Component

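The derivation can be sketched as follows (a reconstruction consistent with the later eigenvector slide, not the instructor's exact steps). The first principal component maximizes the variance of the projected data:

v_1 = \arg\max_{\|v\|_2 = 1} \; v^\top X^\top X \, v

Setting the gradient of the Lagrangian \mathcal{L}(v, \lambda) = v^\top X^\top X v - \lambda (v^\top v - 1) to zero gives X^\top X \, v = \lambda v, so any maximizer is an eigenvector of X^\top X, and the objective value is \lambda; taking the largest eigenvalue yields v_1.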

Page 43:

SVD for PCA

SVD matrix factorization: 𝑿 = 𝑼𝑺𝑽𝑇, where 𝑿 ∈ ℝ𝑁×𝑀

𝑼: 𝑁 × 𝑁 orthogonal matrix

▪ Columns of 𝑼 are left singular vectors of 𝑿

▪ Columns of 𝑼 are eigenvectors of 𝑿𝑿𝑇

𝑽: 𝑀 × 𝑀 orthogonal matrix

▪ Columns of 𝑽 are right singular vectors of 𝑿

▪ Columns of 𝑽 are eigenvectors of 𝑿𝑇𝑿

𝑺: 𝑁 × 𝑀 diagonal matrix

▪ Diagonal entries are the singular values 𝜎𝑘 of 𝑿

▪ The squared singular values 𝜎𝑘² are the eigenvalues of both 𝑿𝑿𝑇 and 𝑿𝑇𝑿
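A numerical check of these relationships (a sketch in NumPy; np.linalg.svd returns singular values in descending order):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))              # N x M

U, s, Vt = np.linalg.svd(X)              # X = U S V^T
evals, V_eig = np.linalg.eigh(X.T @ X)   # eigendecomposition of X^T X

# Squared singular values match the eigenvalues of X^T X
print(np.allclose(np.sort(s**2), np.sort(evals)))   # True

# Top right singular vector matches the top eigenvector (up to sign)
print(np.allclose(abs(Vt[0] @ V_eig[:, -1]), 1.0))  # True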

Page 44:

PCA Algorithm

Input: 𝑿, 𝑿𝑡𝑒𝑠𝑡, 𝐾

1. Center data (and scale each axis) based on training data → 𝑿, 𝑿𝑡𝑒𝑠𝑡

2. 𝑽 = eigenvectors(𝑿𝑇𝑿)

3. Keep only the top 𝐾 eigenvectors: 𝑽𝐾

4. 𝐙test = 𝑿𝑡𝑒𝑠𝑡𝑽𝐾

Optionally, use 𝑽𝐾𝑇 to rotate 𝐙test back to the original subspace 𝐗′test and uncenter.

Page 45:

Principal Component Analysis (PCA)

𝑿𝑇𝑿 𝒗 = 𝜆𝒗, so 𝒗 (the first PC) is an eigenvector of the sample correlation/covariance matrix 𝑿𝑇𝑿.

Sample variance of the projection: 𝒗𝑇𝑿𝑇𝑿 𝒗 = 𝜆 𝒗𝑇𝒗 = 𝜆

Thus, the eigenvalue 𝜆 denotes the amount of variability captured along that dimension (aka the amount of energy along that dimension).

Eigenvalues 𝜆1 ≥ 𝜆2 ≥ 𝜆3 ≥ ⋯

• The 1st PC 𝑣1 is the eigenvector of the sample covariance matrix 𝑋𝑇𝑋 associated with the largest eigenvalue

• The 2nd PC 𝑣2 is the eigenvector of the sample covariance matrix 𝑋𝑇𝑋 associated with the second largest eigenvalue

• And so on …

Slide from Nina Balcan

Page 46:

• For 𝑀 original dimensions, the sample covariance matrix is 𝑀 × 𝑀 and has up to 𝑀 eigenvectors, so there are up to 𝑀 PCs.

• Where does dimensionality reduction come from? We can ignore the components of lesser significance.

How Many PCs?

[Bar chart: variance (%) captured by each of PC1 through PC10]

© Eric Xing @ CMU, 2006-2011

• You do lose some information, but if the eigenvalues are small, you don’t lose much

– 𝑀 dimensions in original data

– calculate 𝑀 eigenvectors and eigenvalues

– choose only the first 𝐷 eigenvectors, based on their eigenvalues

– final data set has only 𝐷 dimensions

Variance (%) = ratio of the variance along a given principal component to the total variance across all principal components
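A sketch of this selection rule in NumPy (an assumption for illustration: keep the smallest 𝐷 whose cumulative variance ratio exceeds a chosen threshold, e.g. 95%):

import numpy as np

def choose_num_components(X, threshold=0.95):
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]        # eigenvalues, descending
    ratio = np.cumsum(evals) / evals.sum()             # cumulative variance fraction
    return int(np.searchsorted(ratio, threshold)) + 1  # smallest sufficient D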

Page 47:

PCA EXAMPLES


Page 48:

Projecting MNIST digits


Task Setting:

1. Take 28x28 images of digits and project them down to K components

2. Report percent of variance explained for K components

3. Then project back up to 28x28 image to visualize how much information was preserved
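A sketch of this pipeline with scikit-learn (using its bundled 8×8 digits rather than the 28×28 MNIST images from the slides, purely to keep the example self-contained):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                 # (1797, 64): flattened 8x8 digit images
K = 10

pca = PCA(n_components=K).fit(X)
Z = pca.transform(X)                   # 1. project down to K components
print(f"{pca.explained_variance_ratio_.sum():.1%} of variance "
      f"explained by {K} components")  # 2. percent of variance explained
X_recon = pca.inverse_transform(Z)     # 3. project back up to visualize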

Page 49:

Projecting MNIST digits


Task Setting:

1. Take 28×28 images of digits and project them down to 2 components

2. Plot the 2-dimensional points

Page 50:

Projecting MNIST digits


Task Setting:

1. Take 28×28 images of digits and project them down to 2 components

2. Plot the 2-dimensional points

Page 51:

Dimensionality Reduction with Deep Learning

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504–507.

Page 52:

Dimensionality Reduction with Deep Learning

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504–507.

[Figure: side-by-side panels labeled PCA and Neural Network]

Page 53:

A Huge Thanks to the Course Team!

Education Associates

Page 54:

A Huge Thanks to the Course Team!

Teaching Assistants

Page 55:

A Huge Thanks to the Course Team!

Teaching Assistants

Page 56:

A Huge Thanks to the Course Team!

Students!!