Object Orie’d Data Analysis, Last Time
• Finished Algebra Review
• Multivariate Probability Review
• PCA as an Optimization Problem
(Eigen-decomp. gives rotation, easy sol’n)
• Connected Mathematics & Graphics
• Started Redistribution of Energy
PCA Redistribution of Energy

Convenient summary of amount of structure:
Total Sum of Squares: $\sum_{i=1}^{n} \|X_i\|^2$
Physical Interpretation: Total Energy in Data
Insight comes from decomposition
Statistical Terminology: ANalysis Of VAriance (ANOVA)
PCA Redist’n of Energy (Cont.)
ANOVA mean decomposition:
Total Variation = Mean Variation + Mean Residual Variation:
$$\sum_{i=1}^{n} \|X_i\|^2 = \sum_{i=1}^{n} \|\bar{X}\|^2 + \sum_{i=1}^{n} \|X_i - \bar{X}\|^2$$
Mathematics: Pythagorean Theorem
Intuition Quantified via Sums of Squares
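The ANOVA mean decomposition can be checked numerically. A minimal sketch in Python/NumPy (not from the lecture; the synthetic matrix and variable names are illustrative):

```python
import numpy as np

# Minimal check of the ANOVA sum-of-squares decomposition on synthetic data.
rng = np.random.default_rng(0)
d, n = 5, 20
X = rng.normal(loc=1.0, size=(d, n))      # columns X_1, ..., X_n are the data vectors
Xbar = X.mean(axis=1, keepdims=True)      # mean vector, d x 1

total_ss = np.sum(X ** 2)                 # sum_i ||X_i||^2
mean_ss = n * np.sum(Xbar ** 2)           # sum_i ||Xbar||^2
resid_ss = np.sum((X - Xbar) ** 2)        # sum_i ||X_i - Xbar||^2

# Pythagorean Theorem: Total Variation = Mean Variation + Mean Residual Variation
assert np.isclose(total_ss, mean_ss + resid_ss)
```

The cross term $2\sum_i (X_i - \bar{X})^t \bar{X}$ vanishes because the residuals sum to zero, which is why the identity is exact.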
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Residuals from Mean = Data – Mean
Most of Variation = 92% is Mean Variation SS
Remaining Variation = 8% is Resid. Var. SS
PCA Redist’n of Energy (Cont.)
Have already studied this decomposition (recall curve e.g.):
• Variation due to Mean (% of total)
• Variation of Mean Residuals (% of total)
PCA Redist’n of Energy (Cont.)
Now decompose SS about the mean:
$$\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = \mathrm{tr}\big(\tilde{X}\tilde{X}^t\big) = (n-1)\,\mathrm{tr}\big(\hat{\Sigma}\big)$$
where:
$$\tilde{X} = \big[\,X_1 - \bar{X}\ \cdots\ X_n - \bar{X}\,\big]_{d \times n}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Energy is expressed in trace of covariance matrix: $\hat{\Sigma} = \frac{1}{n-1}\tilde{X}\tilde{X}^t$
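The trace identity is easy to verify numerically. A sketch in Python/NumPy (illustrative data, not from the lecture):

```python
import numpy as np

# Check that the SS about the mean equals the trace expressions.
rng = np.random.default_rng(1)
d, n = 4, 15
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar                            # centered data matrix, d x n

resid_ss = np.sum(Xtil ** 2)               # sum_i ||X_i - Xbar||^2
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)      # sample covariance matrix, d x d

assert np.isclose(resid_ss, np.trace(Xtil @ Xtil.T))
assert np.isclose(resid_ss, (n - 1) * np.trace(Sigma_hat))
```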
PCA Redist’n of Energy (Cont.)
Eigenvalues provide atoms of SS decomposition:
$$\frac{1}{n-1}\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = \mathrm{tr}\big(\hat{\Sigma}\big) = \mathrm{tr}\big(BDB^t\big) = \mathrm{tr}\big(DB^tB\big) = \mathrm{tr}(D) = \sum_{j=1}^{d} \lambda_j$$
Useful Plots are:
• "Power Spectrum": $\lambda_j$ vs. $j$
• "log Power Spectrum": $\log \lambda_j$ vs. $j$
• "Cumulative Power Spectrum": $\sum_{l=1}^{j} \lambda_l$ vs. $j$
Note PCA gives SS's for free (as eigenvalues), but watch factors of $\frac{1}{n-1}$
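The eigenvalue "atoms" and the three spectra can be computed directly (a sketch with synthetic data; plotting omitted):

```python
import numpy as np

# Eigenvalues as the "atoms" of the SS decomposition, plus the three spectra.
rng = np.random.default_rng(2)
d, n = 6, 30
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)

lam = np.linalg.eigvalsh(Sigma_hat)[::-1]      # eigenvalues, decreasing order

# atoms of SS: their sum recovers tr(Sigma_hat), i.e. the (scaled) residual SS
assert np.isclose(lam.sum(), np.trace(Sigma_hat))
assert np.isclose((n - 1) * lam.sum(), np.sum(Xtil ** 2))

power = lam                   # "Power Spectrum":            lambda_j vs. j
log_power = np.log(lam)       # "log Power Spectrum":        log lambda_j vs. j
cum_power = np.cumsum(lam)    # "Cumulative Power Spectrum": sum_{l<=j} lambda_l vs. j
```

Note `eigvalsh` returns eigenvalues in ascending order, hence the `[::-1]` reversal to match the decreasing convention used in the slides.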
PCA Redist’n of Energy (Cont.)
Note, have already considered some of these Useful Plots:
• Power Spectrum
• Cumulative Power Spectrum
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Revisit SS Decomposition for PC1:
PC1 has "most of var'n" = 93%
Reflected by good approximation in Object Space
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Revisit SS Decomposition for PC2:
PC2 has "only a little var'n" = 7%
Reflected by poor approximation in Object Space
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
Use one that makes sense…
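Views 1 and 2 above are the same problem, which can be seen numerically. A sketch in Python/NumPy (the toy data here are illustrative):

```python
import numpy as np

# The PC1 direction simultaneously maximizes projected SS and
# (by the Pythagorean Theorem) minimizes residual SS.
rng = np.random.default_rng(9)
d, n = 2, 50
X = rng.normal(size=(d, n))
X[1] += 0.8 * X[0]                                  # correlated toy data
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)
u1 = U[:, 0]                                        # PC1 direction
pc1_proj_ss = np.sum((u1 @ Xtil) ** 2)              # equals s[0]**2

# compare against random competitor unit directions
for _ in range(100):
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    proj_ss = np.sum((v @ Xtil) ** 2)
    resid_ss = np.sum(Xtil ** 2) - proj_ss          # Pythagorean Theorem
    # max projected SS  <=>  min residual SS
    assert pc1_proj_ss >= proj_ss - 1e-9
```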
Different Views of PCA
Next Time:
Add some graphics about this
Scatterplot of Toy Data sets + various fits, with residuals
Will be Useful in Stor 165, as well
Different Views of PCA

2-d Toy Example: Feature Space / Object Space
1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line
PCA Data Representation

Idea: Expand Data Matrix in terms of inner products & eigenvectors
Recall notation:
$$\tilde{X}_{d \times n} = \big[\,X_1 - \bar{X}\ \cdots\ X_n - \bar{X}\,\big]$$
Eigenvalue expansion (centered data):
$$\tilde{X}_{d \times n} = \sum_{j=1}^{d} \phi_j\,\big(\phi_j^t \tilde{X}\big)$$
where each $\phi_j$ is $d \times 1$ and each $\phi_j^t \tilde{X}$ is $1 \times n$
PCA Data Represent'n (Cont.)

Now using: $X = \bar{X}\,\mathbf{1}_{1 \times n} + \tilde{X}$
Eigenvalue expansion (raw data):
$$X_{d \times n} = \bar{X}\,\mathbf{1}_{1 \times n} + \sum_{j=1}^{d} \phi_j\,c_j$$
where each $\phi_j$ is $d \times 1$ and each $c_j = \phi_j^t \tilde{X}$ is $1 \times n$
• Entries of $\phi_j$ are loadings
• Entries of $c_j$ are scores
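The full-rank expansion reconstructs the raw data exactly. A sketch in Python/NumPy (synthetic data; names like `B` and `C` are illustrative):

```python
import numpy as np

# Eigenvalue expansion: loadings (eigenvector entries) and scores
# (entries of c_j = phi_j^t Xtil) rebuild the raw data exactly.
rng = np.random.default_rng(3)
d, n = 3, 10
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar

lam, B = np.linalg.eigh((Xtil @ Xtil.T) / (n - 1))  # columns of B: eigenvectors phi_j
C = B.T @ Xtil                                      # row j of C: scores c_j = phi_j^t Xtil

# X = Xbar 1^t + sum_j phi_j c_j  (all d terms => exact reconstruction)
assert np.allclose(X, Xbar + B @ C)
```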
PCA Data Represent'n (Cont.)

Can focus on individual data vectors:
$$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij}\,\phi_j$$
(part of above full matrix rep'n)
Terminology: the $c_{ij}$ are called "PCs" and are also called scores
PCA Data Represent'n (Cont.)

More terminology:
• Scores, $c_{ij}$, are coefficients in eigenvalue representation:
$$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij}\,\phi_j$$
• Loadings are entries of eigenvectors: $\phi_j = \big(\phi_{1j}, \ldots, \phi_{dj}\big)^t$
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct using only $k < d$ terms (assuming decreasing eigenvalues):
$$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij}\,\phi_j$$
• Gives: rank $k$ approximation of data
• Key to PCA dimension reduction
• And PCA for data compression (~ .jpeg)
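The reduced rank representation can be sketched as follows (synthetic data, illustrative names); note that the discarded SS equals $(n-1)$ times the dropped eigenvalues:

```python
import numpy as np

# Reduced rank representation: keep only the k leading eigen-directions.
rng = np.random.default_rng(4)
d, n, k = 5, 40, 2
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar

lam, B = np.linalg.eigh((Xtil @ Xtil.T) / (n - 1))
order = np.argsort(lam)[::-1]                # sort eigenvalues decreasingly
lam, B = lam[order], B[:, order]

Bk = B[:, :k]                                # k leading eigenvectors
Xk = Xbar + Bk @ (Bk.T @ Xtil)               # rank-k approximation of the data

# approximation error = SS in the discarded directions
assert np.isclose(np.sum((X - Xk) ** 2), (n - 1) * lam[k:].sum())
```

This truncation is exactly the mechanism behind PCA dimension reduction and compression: store $\bar{X}$, $k$ loading vectors, and $k \times n$ scores instead of the full $d \times n$ matrix.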
PCA Data Represent’n (Cont.)
Choice of $k$ in Reduced Rank Represent'n:
• Generally very slippery problem
• SCREE plot (Kruskal 1964): Find knee in power spectrum
PCA Data Represent’n (Cont.)
SCREE plot drawbacks:
•What is a knee?
•What if there are several?
•Knees depend on scaling (power? log?)
Personal suggestion:
•Find auxiliary cutoffs (inter-rater variation)
• Use the full range (à la scale space)
PCA Simulation

Idea: given
• Mean Vector $\mu$
• Eigenvectors $\phi_1, \ldots, \phi_k$
• Eigenvalues $\lambda_1, \ldots, \lambda_k$
Simulate data from corresponding Normal Distribution
Approach: Invert PCA Data Represent'n:
$$X_i = \mu + \sum_{j=1}^{k} \sqrt{\lambda_j}\,Z_{ij}\,\phi_j, \qquad \text{where } Z_{ij} \sim N(0,1)$$
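A sketch of this simulation in Python/NumPy; the particular $\mu$, eigenvectors, and eigenvalues below are chosen purely for illustration:

```python
import numpy as np

# Simulate from the PCA model: X_i = mu + sum_j sqrt(lambda_j) Z_ij phi_j.
rng = np.random.default_rng(5)
d, k, n = 6, 2, 5000
mu = np.arange(d, dtype=float)                     # mean vector (illustrative)
Phi = np.linalg.qr(rng.normal(size=(d, k)))[0]     # orthonormal eigenvectors, d x k
lam = np.array([4.0, 1.0])                         # eigenvalues (illustrative)

Z = rng.normal(size=(n, k))                        # Z_ij ~ N(0, 1)
Xsim = mu + (Z * np.sqrt(lam)) @ Phi.T             # n x d sample

# variance of the projection onto phi_j should be close to lambda_j
proj_var = ((Xsim - mu) @ Phi).var(axis=0, ddof=1)
assert np.allclose(proj_var, lam, atol=0.5)
```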
Alternate PCA Computation
Issue: for HDLSS data (recall $d \gg n$)
• $\hat{\Sigma}$ is $d \times d$, so may be quite large
• Thus slow to work with, and to compute
• What about a shortcut?
Approach: Singular Value Decomposition (of (centered, scaled) Data Matrix $\tilde{X}$)
Alternate PCA Computation
Singular Value Decomposition:
$$\tilde{X} = U S V^t$$
Where:
• $U$ is $d \times d$ unitary
• $V$ is $n \times n$ unitary
• $S$ is a diagonal matrix of singular values
Assume: decreasing singular values $s_1 \geq \cdots \geq s_k \geq 0$
Alternate PCA Computation
Singular Value Decomposition: $\tilde{X} = U S V^t$
Recall Relation to Eigen-analysis of $\hat{\Sigma}$:
$$\hat{\Sigma} = \frac{1}{n-1}\tilde{X}\tilde{X}^t = \frac{1}{n-1}(USV^t)(USV^t)^t = \frac{1}{n-1}\,U S^2 U^t$$
Thus have same eigenvector matrix $U$
And eigenvalues are squares of singular values (up to the factor $\frac{1}{n-1}$): $\lambda_i = s_i^2/(n-1)$, for $i = 1, \ldots, n$
Alternate PCA Computation
Singular Value Decomposition: $\tilde{X} = U S V^t$
Computational advantage (for rank $r$):
Use compact form, only need to find
$$U_{d \times r}\ \text{(e-vec's)}, \qquad S_{r \times r}\ \text{(s-val's)}, \qquad V_{r \times n}\ \text{(scores)}$$
Other components not useful
So can be much faster for $n \ll d$
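The HDLSS shortcut can be sketched in Python/NumPy (synthetic data; the small-$d$ direct check is only there to validate the shortcut):

```python
import numpy as np

# HDLSS sketch (d >> n): get eigenvalues from a compact SVD of the
# centered data matrix, without forming the d x d covariance matrix.
rng = np.random.default_rng(6)
d, n = 200, 8
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)  # U: d x n, s: n, Vt: n x n
lam_svd = s ** 2 / (n - 1)                           # eigenvalues of Sigma_hat

# validate against the slow direct route through the d x d covariance
# (only the n - 1 nonzero eigenvalues; centering drops one rank)
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)
lam_direct = np.linalg.eigvalsh(Sigma_hat)[::-1][: n - 1]
assert np.allclose(lam_svd[: n - 1], lam_direct)
```

With `full_matrices=False`, only the $n$ columns of $U$ that can carry data are computed, which is the compact form referred to above.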
Alternate PCA Computation
Another Variation: Dual PCA
Motivation: Recall for demography data, useful to view the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
as both Rows as Data & Columns as Data
Alternate PCA Computation
Useful terminology (from optimization), for the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
Primal PCA problem: Columns as Data
Dual PCA problem: Rows as Data
Alternate PCA Computation
Dual PCA Computation: Same as above, but replace $X$ with $X^t$
So can almost replace $\hat{\Sigma} = \tilde{X}\tilde{X}^t$ with $\hat{\Sigma}_D = \tilde{X}^t\tilde{X}$ (ignoring factors of $\frac{1}{n-1}$ for now)
Then use SVD, $\tilde{X} = U S V^t$, to get:
$$\hat{\Sigma}_D = \tilde{X}^t\tilde{X} = (USV^t)^t(USV^t) = V S U^t U S V^t = V S^2 V^t$$
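The dual identity uses the same SVD factors as the primal computation, which is easy to verify numerically (a sketch with synthetic data):

```python
import numpy as np

# Verify the dual identity: Xtil^t Xtil = V S^2 V^t.
rng = np.random.default_rng(7)
d, n = 10, 6
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)

Sigma_dual = Xtil.T @ Xtil                       # n x n dual "covariance" (unscaled)
assert np.allclose(Sigma_dual, Vt.T @ np.diag(s ** 2) @ Vt)
```

The dual problem is only $n \times n$, which is the same computational saving as the compact SVD when $n \ll d$.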
Alternate PCA Computation
Appears to be cool symmetry:
Primal ↔ Dual
Loadings $U$ ↔ Scores $V$
But, there is a problem with the means…
Alternate PCA Computation
Next time: Explore Loadings ($U$) & Scores ($V$) issue more deeply, with explicit look at notation…
Primal - Dual PCA

Note different "mean vectors" for the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
Primal Mean = Mean of Col. Vec's: $\bar{X}_P = \frac{1}{n}\sum_{j=1}^{n} X_j$ (a $d \times 1$ column vector)
Dual Mean = Mean of Row Vec's: $\bar{X}_D = \frac{1}{d}\sum_{i=1}^{d} X_i$ (a $1 \times n$ row vector)
Primal - Dual PCA

Primal PCA, based on SVD of Primal Data: $\tilde{X}_P = X - \bar{X}_P$
Dual PCA, based on SVD of Dual Data: $\tilde{X}_D = X^t - \bar{X}_D^t$
Very similar, except:
• Different centerings
• Different row – column interpretation
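The two centerings can be sketched side by side in Python/NumPy (synthetic data; variable names are illustrative):

```python
import numpy as np

# Primal mean averages column vectors; dual mean averages row vectors;
# the two centerings produce genuinely different matrices.
rng = np.random.default_rng(8)
d, n = 4, 7
X = rng.normal(loc=2.0, size=(d, n))

Xbar_P = X.mean(axis=1, keepdims=True)   # primal mean: d x 1 (mean of columns)
Xbar_D = X.mean(axis=0, keepdims=True)   # dual mean:   1 x n (mean of rows)

Xtil_P = X - Xbar_P                      # primal centered data, d x n
Xtil_D = X.T - Xbar_D.T                  # dual centered data, n x d

# the centerings differ, so primal and dual PCA are not simple transposes
assert not np.allclose(Xtil_P, Xtil_D.T)
```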
Primal - Dual PCA

Next Time: get factors of $(n-1)$ straight. Maybe best to dispense with that in defn of X_P and X_D…
Primal - Dual PCA

Toy Example 1: Random Curves, all in Primal Space:
• 2 × Constant Shift
• 2 × Linear
• 2 × Quadratic
• 2 × Cubic
(chosen to be orthonormal)
• Plus (small) i.i.d. Gaussian noise
• d = 40, n = 20
Primal - Dual PCA

Toy Example 1: Raw Data
Primal - Dual PCA

Toy Example 1: Raw Data
• Primal (Col.) curves similar to before
• Data mat'x asymmetric (but same curves)
• Dual (Row) curves much rougher (showing Gaussian randomness)
• How data were generated
• Color map useful? (same as mesh view)
• See richer structure than before
• Is it useful?
Primal - Dual PCA

Toy Example 1: Primal PCA, Column Curves as Data
Primal - Dual PCA

Toy Example 1: Primal PCA
• Expected to recover increasing poly's
• But didn't happen
• Although can see the poly's (order???)
• Mean has quad'ic (since only n = 20???)
• Scores (proj'ns) very random
• Power Spectrum shows 4 components (not affected by subtracting Primal Mean)
Primal - Dual PCA

Toy Example 1: Dual PCA, Row Curves as Data
Primal - Dual PCA

Toy Example 1: Dual PCA
• Curves all very wiggly (random noise)
• Mean much bigger, 54% of Total Var!
• Scores have strong smooth structure (reflecting ordered primal e.v.'s; recall primal e.v.'s ↔ dual scores)
• Power Spectrum shows 3 components (driven by subtracting the Dual Mean)
• Primal – Dual mean difference is critical