Carnegie Mellon University: EM, PCA
TRANSCRIPT
EM + PCA
1
10-601 Introduction to Machine Learning
Matt Gormley, Lecture 17
March 22, 2017
Machine Learning Department, School of Computer Science, Carnegie Mellon University
EM, GMM Readings: Murphy 11.4.1, 11.4.2, 11.4.4; Bishop 9; HTF 8.5–8.5.3; Mitchell 6.12–6.12.2
PCA Readings: Murphy 12; Bishop 12; HTF 14.5; Mitchell: --
Reminders
• Homework 5: Readings / Application of ML
  – Release: Wed, Mar. 08
  – Due: Wed, Mar. 22 at 11:59pm
2
EM AND GMMS
3
Expectation-Maximization Outline
• Background
  – Multivariate Gaussian Distribution
  – Marginal Probabilities
• Building up to GMMs
  – Distinction #1: Model vs. Objective Function
  – Gaussian Naïve Bayes (GNB)
  – Gaussian Discriminant Analysis
  – Gaussian Mixture Model (GMM)
• Expectation-Maximization
  – Distinction #2: Complete Data Likelihood vs. Marginal Data Likelihood
  – Distinction #3: Latent Variables vs. Parameters
  – Objective Functions for EM
  – EM Algorithm
  – EM for GMM vs. K-Means
• Properties of EM
  – Nonconvexity / Local Optimization
  – Example: Grammar Induction
  – Variants of EM
4
(Callouts on the slide mark which parts of this outline were covered "Last Lecture" and which are "This Lecture".)
Background
Whiteboard
– Multivariate Gaussian Distribution
– Marginal Probabilities
6
GAUSSIAN MIXTURE MODEL (GMM)
7
Building up to GMMs
Whiteboard
– Distinction #1: Model vs. Objective Function
– Gaussian Naïve Bayes (GNB)
– Gaussian Discriminant Analysis
– Gaussian Mixture Model (GMM)
8
Gaussian Discriminant Analysis
9
Data: $\mathcal{D} = \{(\mathbf{x}^{(i)}, z^{(i)})\}_{i=1}^N$ where $\mathbf{x}^{(i)} \in \mathbb{R}^M$ and $z^{(i)} \in \{1, \ldots, K\}$
Generative Story: $z \sim \text{Categorical}(\boldsymbol\phi)$, $\mathbf{x} \sim \text{Gaussian}(\boldsymbol\mu_z, \boldsymbol\Sigma_z)$
Model (Joint): $p(\mathbf{x}, z; \boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = p(\mathbf{x} \mid z; \boldsymbol\mu, \boldsymbol\Sigma)\, p(z; \boldsymbol\phi)$
Log-likelihood:
$\ell(\boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{i=1}^N \log p(\mathbf{x}^{(i)}, z^{(i)}; \boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{i=1}^N \log p(\mathbf{x}^{(i)} \mid z^{(i)}; \boldsymbol\mu, \boldsymbol\Sigma) + \log p(z^{(i)}; \boldsymbol\phi)$
Gaussian Discriminant Analysis
10
Data: $\mathcal{D} = \{(\mathbf{x}^{(i)}, z^{(i)})\}_{i=1}^N$ where $\mathbf{x}^{(i)} \in \mathbb{R}^M$ and $z^{(i)} \in \{1, \ldots, K\}$
Log-likelihood:
$\ell(\boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{i=1}^N \log p(\mathbf{x}^{(i)} \mid z^{(i)}; \boldsymbol\mu, \boldsymbol\Sigma) + \log p(z^{(i)}; \boldsymbol\phi)$
Maximum Likelihood Estimates: Take the derivative of the Lagrangian, set it equal to zero, and solve.
$\phi_k = \frac{1}{N} \sum_{i=1}^N \mathbb{I}(z^{(i)} = k), \quad \forall k$
$\boldsymbol\mu_k = \frac{\sum_{i=1}^N \mathbb{I}(z^{(i)} = k)\, \mathbf{x}^{(i)}}{\sum_{i=1}^N \mathbb{I}(z^{(i)} = k)}, \quad \forall k$
$\boldsymbol\Sigma_k = \frac{\sum_{i=1}^N \mathbb{I}(z^{(i)} = k)\, (\mathbf{x}^{(i)} - \boldsymbol\mu_k)(\mathbf{x}^{(i)} - \boldsymbol\mu_k)^T}{\sum_{i=1}^N \mathbb{I}(z^{(i)} = k)}, \quad \forall k$
Implementation: Just counting
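The estimates above really are "just counting": class priors from label counts, and per-class means and covariances from per-class sums. A minimal NumPy sketch (not from the slides; it assumes labels are coded 0, …, K-1 and that every class appears at least once):

```python
import numpy as np

def gda_mle(X, z, K):
    """Closed-form MLEs for Gaussian Discriminant Analysis.

    X: (N, M) array of features; z: (N,) array of labels in {0, ..., K-1}.
    Returns class priors phi (K,), means mu (K, M), covariances Sigma (K, M, M).
    """
    N, M = X.shape
    phi = np.zeros(K)
    mu = np.zeros((K, M))
    Sigma = np.zeros((K, M, M))
    for k in range(K):
        mask = (z == k)                   # indicator I(z^(i) = k)
        Nk = mask.sum()
        phi[k] = Nk / N                   # prior: fraction of examples with label k
        mu[k] = X[mask].mean(axis=0)      # per-class mean
        diff = X[mask] - mu[k]
        Sigma[k] = diff.T @ diff / Nk     # per-class covariance
    return phi, mu, Sigma
```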
Gaussian Mixture Model
11
Data: Assume we are given data $\mathcal{D}$ consisting of $N$ fully unsupervised examples in $M$ dimensions:
$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^N$ where $\mathbf{x}^{(i)} \in \mathbb{R}^M$
Generative Story: $z \sim \text{Categorical}(\boldsymbol\phi)$, $\mathbf{x} \sim \text{Gaussian}(\boldsymbol\mu_z, \boldsymbol\Sigma_z)$
Model (Joint): $p(\mathbf{x}, z; \boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = p(\mathbf{x} \mid z; \boldsymbol\mu, \boldsymbol\Sigma)\, p(z; \boldsymbol\phi)$
Marginal: $p(\mathbf{x}; \boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{z=1}^K p(\mathbf{x} \mid z; \boldsymbol\mu, \boldsymbol\Sigma)\, p(z; \boldsymbol\phi)$
(Marginal) Log-likelihood:
$\ell(\boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{i=1}^N \log p(\mathbf{x}^{(i)}; \boldsymbol\phi, \boldsymbol\mu, \boldsymbol\Sigma) = \sum_{i=1}^N \log \sum_{z=1}^K p(\mathbf{x}^{(i)} \mid z; \boldsymbol\mu, \boldsymbol\Sigma)\, p(z; \boldsymbol\phi)$
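As a companion to the marginal log-likelihood above, here is a minimal sketch of how it can be evaluated numerically (not from the slides; it relies on SciPy's multivariate normal density and a plain sum over components rather than a numerically safer log-sum-exp):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, phi, mu, Sigma):
    """Marginal log-likelihood: sum_i log sum_z p(x^(i) | z; mu_z, Sigma_z) p(z; phi)."""
    K = len(phi)
    # dens[i, k] = p(x^(i) | z = k)
    dens = np.stack([multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                     for k in range(K)], axis=1)
    marginal = dens @ np.asarray(phi)     # sum over z of p(x | z) p(z)
    return np.sum(np.log(marginal))
```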
Mixture Model
12–13
Data: Assume we are given data $\mathcal{D}$ consisting of $N$ fully unsupervised examples in $M$ dimensions:
$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^N$ where $\mathbf{x}^{(i)} \in \mathbb{R}^M$
Generative Story: $z \sim \text{Multinomial}(\boldsymbol\phi)$, $\mathbf{x} \sim p_\theta(\cdot \mid z)$
Model (Joint): $p_{\theta,\phi}(\mathbf{x}, z) = p_\theta(\mathbf{x} \mid z)\, p_\phi(z)$
Marginal: $p_{\theta,\phi}(\mathbf{x}) = \sum_{z=1}^K p_\theta(\mathbf{x} \mid z)\, p_\phi(z)$
(Marginal) Log-likelihood:
$\ell(\theta, \phi) = \sum_{i=1}^N \log p_{\theta,\phi}(\mathbf{x}^{(i)}) = \sum_{i=1}^N \log \sum_{z=1}^K p_\theta(\mathbf{x}^{(i)} \mid z)\, p_\phi(z)$
Note: $p_\theta(\cdot \mid z)$ could be any arbitrary distribution parameterized by θ. Today we're thinking about the case where it is a Multivariate Gaussian: $z \sim \text{Categorical}(\boldsymbol\phi)$, $\mathbf{x} \sim \text{Gaussian}(\boldsymbol\mu_z, \boldsymbol\Sigma_z)$.
Learning a Mixture Model
14–15
[Graphical model: latent variable Z with children X1, X2, …, XM, shown for both the supervised and the unsupervised case]
Supervised Learning: The parameters decouple!
$\mathcal{D} = \{(\mathbf{x}^{(i)}, z^{(i)})\}_{i=1}^N$
$\theta^*, \phi^* = \operatorname*{argmax}_{\theta,\phi} \sum_{i=1}^N \log \big[ p_\theta(\mathbf{x}^{(i)} \mid z^{(i)})\, p_\phi(z^{(i)}) \big]$
$\theta^* = \operatorname*{argmax}_{\theta} \sum_{i=1}^N \log p_\theta(\mathbf{x}^{(i)} \mid z^{(i)})$
$\phi^* = \operatorname*{argmax}_{\phi} \sum_{i=1}^N \log p_\phi(z^{(i)})$
Unsupervised Learning: Parameters are coupled by marginalization.
$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^N$
$\theta^*, \phi^* = \operatorname*{argmax}_{\theta,\phi} \sum_{i=1}^N \log \sum_{z=1}^K p_\theta(\mathbf{x}^{(i)} \mid z)\, p_\phi(z)$
Training certainly isn't as simple as the supervised case. In many cases, we could still use some black-box optimization method (e.g. Newton-Raphson) to solve this coupled optimization problem. This lecture is about a more problem-specific method: EM.
EXPECTATION MAXIMIZATION
16
Expectation-Maximization
Whiteboard
– Distinction #2: Complete Data Likelihood vs. Marginal Data Likelihood
– Distinction #3: Latent Variables vs. Parameters
– Objective Functions for EM
– EM Algorithm
– EM for GMM vs. K-Means
17
EXAMPLE: K-MEANS VS GMM
24
Example: K-Means
25–32
[Slides 25–32 step through a K-Means run; the figures are not preserved in this transcript. A sketch of the algorithm follows.]
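To make the animated example concrete, here is a minimal NumPy sketch of the loop the slides step through (Lloyd's algorithm); the initialization scheme and fixed iteration count are illustrative assumptions, not part of the slides:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate hard assignments and centroid updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # init from data points
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
        assign = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign
```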
Example: GMM
33–54
[Slides 33–54 step through EM for a GMM on a similar dataset; the figures are not preserved in this transcript.]
K-Means vs. GMM
Convergence: K-Means tends to converge much faster than EM for GMM.
Speed: Each iteration of K-Means is computationally less intensive than each iteration of EM for GMM.
Initialization: To initialize EM for GMM, we typically first run K-Means and use the resulting cluster centers as the means of the Gaussian components.
Output: A GMM gives a probability distribution over the cluster assignment for each point, whereas K-Means gives a single hard assignment.
55
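To ground the comparison, here is a compact sketch of EM for a GMM (not from the slides; initialization here uses random data points, whereas the slides suggest K-Means centers, and the small diagonal term added to each covariance is purely for numerical stability):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian Mixture Model: soft assignments (E-step), weighted MLEs (M-step)."""
    N, M = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)   # in practice: K-Means centers
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(M)] * K)
    phi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = p(z = k | x^(i)), i.e. soft cluster assignments
        dens = np.stack([multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                         for k in range(K)], axis=1)
        r = dens * phi
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted versions of the "just counting" GDA estimates
        Nk = r.sum(axis=0)
        phi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(M)
    return phi, mu, Sigma, r
```

The responsibilities r are the soft counterpart of K-Means' hard assignments, which is why a GMM returns a distribution over clusters for each point rather than a single label.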
PROPERTIES OF EM
56
Properties of EM
• EM is trying to optimize a nonconvex function
• But EM is a local optimization algorithm
• Typical solution: Random Restarts (see the sketch after this slide)
  – Just like K-Means, we run the algorithm many times
  – Each time initialize parameters randomly
  – Pick the parameters that give highest likelihood
57
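A small sketch of random restarts built around the em_gmm and gmm_log_likelihood routines sketched earlier (those names carry over from the earlier sketches and are not functions from the slides):

```python
import numpy as np

def em_with_restarts(X, K, n_restarts=10):
    """Run EM from several random initializations and keep the highest-likelihood fit."""
    best_ll, best_params = -np.inf, None
    for seed in range(n_restarts):
        phi, mu, Sigma, _ = em_gmm(X, K, seed=seed)         # different random init each run
        ll = gmm_log_likelihood(X, phi, mu, Sigma)          # marginal log-likelihood of this fit
        if ll > best_ll:
            best_ll, best_params = ll, (phi, mu, Sigma)
    return best_params, best_ll
```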
Example: Grammar Induction
Question: Can maximizing (unsupervised) marginal likelihood produce useful results?
Answer: Let's look at an example…
• Babies learn the syntax of their native language (e.g. English) just by hearing many sentences
• Can a computer similarly learn the syntax of a human language just by looking at lots of example sentences?
  – This is the problem of Grammar Induction!
  – It's an unsupervised learning problem
  – We try to recover the syntactic structure for each sentence without any supervision
58
Example: Grammar Induction
59
[Figure: several candidate parse trees for the ambiguous sentences "Alice saw Bob on a hill with a telescope" and "time flies like an arrow"; the trees themselves are not preserved in this transcript]
No semantic interpretation
…
Example: Grammar Induction
60
Training Data: Sentences only, without parses
  Sample 1, x(1): time flies like an arrow
  Sample 2, x(2): real flies like soup
  Sample 3, x(3): flies fly with their wings
  Sample 4, x(4): with time you will see
Test Data: Sentences with parses, so we can evaluate accuracy
Example: Grammar Induction
61
[Scatter plot: Attachment Accuracy (%) vs. Log-Likelihood (per sentence), spanning roughly 10–60% accuracy over log-likelihoods of about -20.2 to -19]
Pearson's r = 0.63 (strong correlation)
Dependency Model with Valence (Klein & Manning, 2004)
Figure from Gimpel & Smith (NAACL 2012) slides
Q: Does likelihood correlate with accuracy on a task we care about?
A: Yes, but there is still a wide range of accuracies for a particular likelihood value
Variants of EM
• Generalized EM: Replace the M-Step by a single gradient step that improves the likelihood
• Monte Carlo EM: Approximate the E-Step by sampling
• Sparse EM: Keep an "active list" of points (updated occasionally) from which we estimate the expected counts in the E-Step
• Incremental EM / Stepwise EM: If standard EM is described as a batch algorithm, these are the online equivalent
• etc.
63
Eric Xing
A Report Card for EM
• Some good things about EM:
  – no learning rate (step-size) parameter
  – automatically enforces parameter constraints
  – very fast for low dimensions
  – each iteration guaranteed to improve likelihood
• Some bad things about EM:
  – can get stuck in local minima
  – can be slower than conjugate gradient (especially near convergence)
  – requires expensive inference step
  – is a maximum likelihood/MAP method
64
© Eric Xing @ CMU, 2006–2011
DIMENSIONALITY REDUCTION
65
PCA Outline
• Dimensionality Reduction
  – High-dimensional data
  – Learning (low dimensional) representations
• Principal Component Analysis (PCA)
  – Examples: 2D and 3D
  – Data for PCA
  – PCA Definition
  – Objective functions for PCA
  – PCA, Eigenvectors, and Eigenvalues
  – Algorithms for finding Eigenvectors / Eigenvalues
• PCA Examples
  – Face Recognition
  – Image Compression
66
Big & High-Dimensional Data
• High-Dimensions = Lots of Features
Document classification: features per document = thousands of words/unigrams, millions of bigrams, contextual information
Surveys (Netflix): 480,189 users x 17,770 movies
Slide from Nina Balcan
Big & High-Dimensional Data
• High-Dimensions = Lots of Features
MEG Brain Imaging: 120 locations x 500 time points x 20 objects
Or any high-dimensional image data
Slide from Nina Balcan
• Useful to learn lower dimensional representations of the data.
• Big & High-‐Dimensional Data.
Slide from Nina Balcan
PCA, Kernel PCA, ICA: Powerful unsupervised learning techniques for extracting hidden (potentially lower dimensional) structure from high dimensional datasets.
Learning Representations
Useful for:
• Visualization
• Further processing by machine learning algorithms
• More efficient use of resources (e.g., time, memory, communication)
• Statistical: fewer dimensions → better generalization
• Noise removal (improving data quality)
Slide from Nina Balcan
PRINCIPAL COMPONENT ANALYSIS (PCA)
73
Principal Component Analysis (PCA)
In the case where the data lies on or near a low d-dimensional linear subspace, the axes of this subspace are an effective representation of the data.
Identifying these axes is known as Principal Components Analysis; they can be obtained using classic matrix computation tools (eigendecomposition or Singular Value Decomposition).
Slide from Nina Balcan
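A minimal NumPy sketch of PCA via SVD of the centered data matrix (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X, d):
    """Project (N, M) data onto its top-d principal components via SVD."""
    X_centered = X - X.mean(axis=0)                     # PCA operates on centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:d]                                 # rows are the top-d principal axes
    explained_var = (S[:d] ** 2) / (len(X) - 1)         # variance captured by each axis
    Z = X_centered @ components.T                       # low-dimensional representation (N, d)
    return Z, components, explained_var
```

Equivalently, the principal axes are the leading eigenvectors of the sample covariance matrix; SVD of the centered data computes them without forming that matrix explicitly.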
2D Gaussian dataset
Slide from Barnabas Poczos
1st PCA axis
Slide from Barnabas Poczos
2nd PCA axis
Slide from Barnabas Poczos
Principal Component Analysis (PCA)
Whiteboard
– Data for PCA
– PCA Definition
79