Object Orie’d Data Analysis, Last Time
• Finished Algebra Review
• Multivariate Probability Review
• PCA as an Optimization Problem
(Eigen-decomp. gives rotation, easy sol’n)
• Connected Mathematics & Graphics
• Started Redistribution of Energy
PCA Redistribution of Energy

Convenient summary of amount of structure:
Total Sum of Squares: $\sum_{i=1}^{n} \|X_i\|^2$
Physical Interpretation: Total Energy in Data
Insight comes from decomposition
Statistical Terminology: ANalysis Of VAriance (ANOVA)
PCA Redist’n of Energy (Cont.)
ANOVA mean decomposition:
Total Variation = Mean Variation + Mean Residual Variation:
$$\sum_{i=1}^{n} \|X_i\|^2 = \sum_{i=1}^{n} \|\bar{X}\|^2 + \sum_{i=1}^{n} \|X_i - \bar{X}\|^2$$
Mathematics: Pythagorean Theorem
Intuition Quantified via Sums of Squares
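The ANOVA mean decomposition can be checked numerically. A minimal sketch in Python/NumPy (not from the lecture; the synthetic matrix and variable names are illustrative):

```python
import numpy as np

# Minimal check of the ANOVA sum-of-squares decomposition on synthetic data.
rng = np.random.default_rng(0)
d, n = 5, 20
X = rng.normal(loc=1.0, size=(d, n))      # columns X_1, ..., X_n are the data vectors
Xbar = X.mean(axis=1, keepdims=True)      # mean vector, d x 1

total_ss = np.sum(X ** 2)                 # sum_i ||X_i||^2
mean_ss = n * np.sum(Xbar ** 2)           # sum_i ||Xbar||^2
resid_ss = np.sum((X - Xbar) ** 2)        # sum_i ||X_i - Xbar||^2

# Pythagorean Theorem: Total Variation = Mean Variation + Mean Residual Variation
assert np.isclose(total_ss, mean_ss + resid_ss)
```

The cross term $2\sum_i (X_i - \bar{X})^t \bar{X}$ vanishes because the residuals sum to zero, which is why the identity is exact.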
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Residuals from Mean = Data – Mean
Most of Variation = 92% is Mean Variation SS
Remaining Variation = 8% is Resid. Var. SS
PCA Redist’n of Energy (Cont.)
Have already studied this decomposition (recall curve e.g.):
• Variation due to Mean (% of total)
• Variation of Mean Residuals (% of total)
PCA Redist’n of Energy (Cont.)
Now decompose SS about the mean:
$$\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = \mathrm{tr}\big(\tilde{X}\tilde{X}^t\big) = (n-1)\,\mathrm{tr}\big(\hat{\Sigma}\big)$$
where:
$$\tilde{X} = \big[\,X_1 - \bar{X}\ \cdots\ X_n - \bar{X}\,\big]_{d \times n}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Energy is expressed in trace of covariance matrix: $\hat{\Sigma} = \frac{1}{n-1}\tilde{X}\tilde{X}^t$
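The trace identity is easy to verify numerically. A sketch in Python/NumPy (illustrative data, not from the lecture):

```python
import numpy as np

# Check that the SS about the mean equals the trace expressions.
rng = np.random.default_rng(1)
d, n = 4, 15
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar                            # centered data matrix, d x n

resid_ss = np.sum(Xtil ** 2)               # sum_i ||X_i - Xbar||^2
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)      # sample covariance matrix, d x d

assert np.isclose(resid_ss, np.trace(Xtil @ Xtil.T))
assert np.isclose(resid_ss, (n - 1) * np.trace(Sigma_hat))
```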
PCA Redist’n of Energy (Cont.)
Eigenvalues provide atoms of SS decomposition:
$$\frac{1}{n-1}\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = \mathrm{tr}\big(\hat{\Sigma}\big) = \mathrm{tr}\big(BDB^t\big) = \mathrm{tr}\big(DB^tB\big) = \mathrm{tr}(D) = \sum_{j=1}^{d} \lambda_j$$
Useful Plots are:
• "Power Spectrum": $\lambda_j$ vs. $j$
• "log Power Spectrum": $\log \lambda_j$ vs. $j$
• "Cumulative Power Spectrum": $\sum_{l=1}^{j} \lambda_l$ vs. $j$
Note PCA gives SS's for free (as eigenvalues), but watch factors of $\frac{1}{n-1}$
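The eigenvalue "atoms" and the three spectra can be computed directly (a sketch with synthetic data; plotting omitted):

```python
import numpy as np

# Eigenvalues as the "atoms" of the SS decomposition, plus the three spectra.
rng = np.random.default_rng(2)
d, n = 6, 30
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)

lam = np.linalg.eigvalsh(Sigma_hat)[::-1]      # eigenvalues, decreasing order

# atoms of SS: their sum recovers tr(Sigma_hat), i.e. the (scaled) residual SS
assert np.isclose(lam.sum(), np.trace(Sigma_hat))
assert np.isclose((n - 1) * lam.sum(), np.sum(Xtil ** 2))

power = lam                   # "Power Spectrum":            lambda_j vs. j
log_power = np.log(lam)       # "log Power Spectrum":        log lambda_j vs. j
cum_power = np.cumsum(lam)    # "Cumulative Power Spectrum": sum_{l<=j} lambda_l vs. j
```

Note `eigvalsh` returns eigenvalues in ascending order, hence the `[::-1]` reversal to match the decreasing convention used in the slides.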
PCA Redist’n of Energy (Cont.)
Note, have already considered some of these Useful Plots:
• Power Spectrum
• Cumulative Power Spectrum
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Revisit SS Decomposition for PC1:
PC1 has "most of var'n" = 93%
Reflected by good approximation in Object Space
Connect Math to Graphics (Cont.)
2-d Toy Example: Feature Space / Object Space
Revisit SS Decomposition for PC2:
PC2 has "only a little var'n" = 7%
Reflected by poor approximation in Object Space
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
Use one that makes sense…
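Views 1 and 2 above are the same problem, which can be seen numerically. A sketch in Python/NumPy (the toy data here are illustrative):

```python
import numpy as np

# The PC1 direction simultaneously maximizes projected SS and
# (by the Pythagorean Theorem) minimizes residual SS.
rng = np.random.default_rng(9)
d, n = 2, 50
X = rng.normal(size=(d, n))
X[1] += 0.8 * X[0]                                  # correlated toy data
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)
u1 = U[:, 0]                                        # PC1 direction
pc1_proj_ss = np.sum((u1 @ Xtil) ** 2)              # equals s[0]**2

# compare against random competitor unit directions
for _ in range(100):
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    proj_ss = np.sum((v @ Xtil) ** 2)
    resid_ss = np.sum(Xtil ** 2) - proj_ss          # Pythagorean Theorem
    # max projected SS  <=>  min residual SS
    assert pc1_proj_ss >= proj_ss - 1e-9
```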
Different Views of PCA
Next Time:
Add some graphics about this
Scatterplot of Toy Data sets + various fits, with residuals
Will be Useful in Stor 165, as well
Different Views of PCA

2-d Toy Example: Feature Space / Object Space
1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line
PCA Data Representation

Idea: Expand Data Matrix in terms of inner products & eigenvectors
Recall notation:
$$\tilde{X}_{d \times n} = \big[\,X_1 - \bar{X}\ \cdots\ X_n - \bar{X}\,\big]$$
Eigenvalue expansion (centered data):
$$\tilde{X}_{d \times n} = \sum_{j=1}^{d} \phi_j\,\big(\phi_j^t \tilde{X}\big)$$
where each $\phi_j$ is $d \times 1$ and each $\phi_j^t \tilde{X}$ is $1 \times n$
PCA Data Represent'n (Cont.)

Now using: $X = \bar{X}\,\mathbf{1}_{1 \times n} + \tilde{X}$
Eigenvalue expansion (raw data):
$$X_{d \times n} = \bar{X}\,\mathbf{1}_{1 \times n} + \sum_{j=1}^{d} \phi_j\,c_j$$
where each $\phi_j$ is $d \times 1$ and each $c_j = \phi_j^t \tilde{X}$ is $1 \times n$
• Entries of $\phi_j$ are loadings
• Entries of $c_j$ are scores
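The full-rank expansion reconstructs the raw data exactly. A sketch in Python/NumPy (synthetic data; names like `B` and `C` are illustrative):

```python
import numpy as np

# Eigenvalue expansion: loadings (eigenvector entries) and scores
# (entries of c_j = phi_j^t Xtil) rebuild the raw data exactly.
rng = np.random.default_rng(3)
d, n = 3, 10
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar

lam, B = np.linalg.eigh((Xtil @ Xtil.T) / (n - 1))  # columns of B: eigenvectors phi_j
C = B.T @ Xtil                                      # row j of C: scores c_j = phi_j^t Xtil

# X = Xbar 1^t + sum_j phi_j c_j  (all d terms => exact reconstruction)
assert np.allclose(X, Xbar + B @ C)
```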
PCA Data Represent'n (Cont.)

Can focus on individual data vectors:
$$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij}\,\phi_j$$
(part of above full matrix rep'n)
Terminology: the $c_{ij}$ are called "PCs" and are also called scores
PCA Data Represent'n (Cont.)

More terminology:
• Scores, $c_{ij}$, are coefficients in eigenvalue representation:
$$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij}\,\phi_j$$
• Loadings are entries of eigenvectors: $\phi_j = \big(\phi_{1j}, \ldots, \phi_{dj}\big)^t$
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct using only $k < d$ terms (assuming decreasing eigenvalues):
$$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij}\,\phi_j$$
• Gives: rank $k$ approximation of data
• Key to PCA dimension reduction
• And PCA for data compression (~ .jpeg)
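The reduced rank representation can be sketched as follows (synthetic data, illustrative names); note that the discarded SS equals $(n-1)$ times the dropped eigenvalues:

```python
import numpy as np

# Reduced rank representation: keep only the k leading eigen-directions.
rng = np.random.default_rng(4)
d, n, k = 5, 40, 2
X = rng.normal(size=(d, n))
Xbar = X.mean(axis=1, keepdims=True)
Xtil = X - Xbar

lam, B = np.linalg.eigh((Xtil @ Xtil.T) / (n - 1))
order = np.argsort(lam)[::-1]                # sort eigenvalues decreasingly
lam, B = lam[order], B[:, order]

Bk = B[:, :k]                                # k leading eigenvectors
Xk = Xbar + Bk @ (Bk.T @ Xtil)               # rank-k approximation of the data

# approximation error = SS in the discarded directions
assert np.isclose(np.sum((X - Xk) ** 2), (n - 1) * lam[k:].sum())
```

This truncation is exactly the mechanism behind PCA dimension reduction and compression: store $\bar{X}$, $k$ loading vectors, and $k \times n$ scores instead of the full $d \times n$ matrix.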
PCA Data Represent’n (Cont.)
Choice of $k$ in Reduced Rank Represent'n:
• Generally very slippery problem
• SCREE plot (Kruskal 1964): Find knee in power spectrum
PCA Data Represent’n (Cont.)
SCREE plot drawbacks:
•What is a knee?
•What if there are several?
•Knees depend on scaling (power? log?)
Personal suggestion:
•Find auxiliary cutoffs (inter-rater variation)
• Use the full range (à la scale space)
PCA Simulation

Idea: given
• Mean Vector $\mu$
• Eigenvectors $\phi_1, \ldots, \phi_k$
• Eigenvalues $\lambda_1, \ldots, \lambda_k$
Simulate data from corresponding Normal Distribution
Approach: Invert PCA Data Represent'n:
$$X_i = \mu + \sum_{j=1}^{k} \sqrt{\lambda_j}\,Z_{ij}\,\phi_j, \qquad \text{where } Z_{ij} \sim N(0,1)$$
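A sketch of this simulation in Python/NumPy; the particular $\mu$, eigenvectors, and eigenvalues below are chosen purely for illustration:

```python
import numpy as np

# Simulate from the PCA model: X_i = mu + sum_j sqrt(lambda_j) Z_ij phi_j.
rng = np.random.default_rng(5)
d, k, n = 6, 2, 5000
mu = np.arange(d, dtype=float)                     # mean vector (illustrative)
Phi = np.linalg.qr(rng.normal(size=(d, k)))[0]     # orthonormal eigenvectors, d x k
lam = np.array([4.0, 1.0])                         # eigenvalues (illustrative)

Z = rng.normal(size=(n, k))                        # Z_ij ~ N(0, 1)
Xsim = mu + (Z * np.sqrt(lam)) @ Phi.T             # n x d sample

# variance of the projection onto phi_j should be close to lambda_j
proj_var = ((Xsim - mu) @ Phi).var(axis=0, ddof=1)
assert np.allclose(proj_var, lam, atol=0.5)
```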
Alternate PCA Computation
Issue: for HDLSS data (recall $d \gg n$)
• $\hat{\Sigma}$ is $d \times d$, so may be quite large
• Thus slow to work with, and to compute
• What about a shortcut?
Approach: Singular Value Decomposition (of (centered, scaled) Data Matrix $\tilde{X}$)
Alternate PCA Computation
Singular Value Decomposition:
$$\tilde{X} = U S V^t$$
Where:
• $U$ is $d \times d$ unitary
• $V$ is $n \times n$ unitary
• $S$ is a diagonal matrix of singular values
Assume: decreasing singular values $s_1 \geq \cdots \geq s_k \geq 0$
Alternate PCA Computation
Singular Value Decomposition: $\tilde{X} = U S V^t$
Recall Relation to Eigen-analysis of $\hat{\Sigma}$:
$$\hat{\Sigma} = \frac{1}{n-1}\tilde{X}\tilde{X}^t = \frac{1}{n-1}(USV^t)(USV^t)^t = \frac{1}{n-1}\,U S^2 U^t$$
Thus have same eigenvector matrix $U$
And eigenvalues are squares of singular values (up to the factor $\frac{1}{n-1}$): $\lambda_i = s_i^2/(n-1)$, for $i = 1, \ldots, n$
Alternate PCA Computation
Singular Value Decomposition: $\tilde{X} = U S V^t$
Computational advantage (for rank $r$):
Use compact form, only need to find
$$U_{d \times r}\ \text{(e-vec's)}, \qquad S_{r \times r}\ \text{(s-val's)}, \qquad V_{r \times n}\ \text{(scores)}$$
Other components not useful
So can be much faster for $n \ll d$
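The HDLSS shortcut can be sketched in Python/NumPy (synthetic data; the small-$d$ direct check is only there to validate the shortcut):

```python
import numpy as np

# HDLSS sketch (d >> n): get eigenvalues from a compact SVD of the
# centered data matrix, without forming the d x d covariance matrix.
rng = np.random.default_rng(6)
d, n = 200, 8
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)  # U: d x n, s: n, Vt: n x n
lam_svd = s ** 2 / (n - 1)                           # eigenvalues of Sigma_hat

# validate against the slow direct route through the d x d covariance
# (only the n - 1 nonzero eigenvalues; centering drops one rank)
Sigma_hat = (Xtil @ Xtil.T) / (n - 1)
lam_direct = np.linalg.eigvalsh(Sigma_hat)[::-1][: n - 1]
assert np.allclose(lam_svd[: n - 1], lam_direct)
```

With `full_matrices=False`, only the $n$ columns of $U$ that can carry data are computed, which is the compact form referred to above.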
Alternate PCA Computation
Another Variation: Dual PCA
Motivation: Recall for demography data, useful to view the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
as both Rows as Data & Columns as Data
Alternate PCA Computation
Useful terminology (from optimization), for the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
Primal PCA problem: Columns as Data
Dual PCA problem: Rows as Data
Alternate PCA Computation
Dual PCA Computation: Same as above, but replace $X$ with $X^t$
So can almost replace $\hat{\Sigma} = \tilde{X}\tilde{X}^t$ with $\hat{\Sigma}_D = \tilde{X}^t\tilde{X}$ (ignoring factors of $\frac{1}{n-1}$ for now)
Then use SVD, $\tilde{X} = U S V^t$, to get:
$$\hat{\Sigma}_D = \tilde{X}^t\tilde{X} = (USV^t)^t(USV^t) = V S U^t U S V^t = V S^2 V^t$$
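The dual identity uses the same SVD factors as the primal computation, which is easy to verify numerically (a sketch with synthetic data):

```python
import numpy as np

# Verify the dual identity: Xtil^t Xtil = V S^2 V^t.
rng = np.random.default_rng(7)
d, n = 10, 6
X = rng.normal(size=(d, n))
Xtil = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)

Sigma_dual = Xtil.T @ Xtil                       # n x n dual "covariance" (unscaled)
assert np.allclose(Sigma_dual, Vt.T @ np.diag(s ** 2) @ Vt)
```

The dual problem is only $n \times n$, which is the same computational saving as the compact SVD when $n \ll d$.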
Alternate PCA Computation
Appears to be cool symmetry:
Primal ↔ Dual
Loadings $U$ ↔ Scores $V$
But, there is a problem with the means…
Alternate PCA Computation
Next time: Explore Loadings ($U$) & Scores ($V$) issue more deeply, with explicit look at notation…
Primal - Dual PCA

Note different "mean vectors" for the $d \times n$ data matrix
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
Primal Mean = Mean of Col. Vec's: $\bar{X}_P = \frac{1}{n}\sum_{j=1}^{n} X_j$ (a $d \times 1$ column vector)
Dual Mean = Mean of Row Vec's: $\bar{X}_D = \frac{1}{d}\sum_{i=1}^{d} X_i$ (a $1 \times n$ row vector)
Primal - Dual PCA

Primal PCA, based on SVD of Primal Data: $\tilde{X}_P = X - \bar{X}_P$
Dual PCA, based on SVD of Dual Data: $\tilde{X}_D = X^t - \bar{X}_D^t$
Very similar, except:
• Different centerings
• Different row – column interpretation
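The two centerings can be sketched side by side in Python/NumPy (synthetic data; variable names are illustrative):

```python
import numpy as np

# Primal mean averages column vectors; dual mean averages row vectors;
# the two centerings produce genuinely different matrices.
rng = np.random.default_rng(8)
d, n = 4, 7
X = rng.normal(loc=2.0, size=(d, n))

Xbar_P = X.mean(axis=1, keepdims=True)   # primal mean: d x 1 (mean of columns)
Xbar_D = X.mean(axis=0, keepdims=True)   # dual mean:   1 x n (mean of rows)

Xtil_P = X - Xbar_P                      # primal centered data, d x n
Xtil_D = X.T - Xbar_D.T                  # dual centered data, n x d

# the centerings differ, so primal and dual PCA are not simple transposes
assert not np.allclose(Xtil_P, Xtil_D.T)
```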
Primal - Dual PCA

Next Time: get factors of $(n-1)$ straight. Maybe best to dispense with that in defn of X_P and X_D…
Primal - Dual PCA

Toy Example 1: Random Curves, all in Primal Space:
• 2 × Constant Shift
• 2 × Linear
• 2 × Quadratic
• 2 × Cubic
(chosen to be orthonormal)
• Plus (small) i.i.d. Gaussian noise
• d = 40, n = 20
Primal - Dual PCA

Toy Example 1: Raw Data
Primal - Dual PCA

Toy Example 1: Raw Data
• Primal (Col.) curves similar to before
• Data mat'x asymmetric (but same curves)
• Dual (Row) curves much rougher (showing Gaussian randomness)
• How data were generated
• Color map useful? (same as mesh view)
• See richer structure than before
• Is it useful?
Primal - Dual PCA

Toy Example 1: Primal PCA, Column Curves as Data
Primal - Dual PCA

Toy Example 1: Primal PCA
• Expected to recover increasing poly's
• But didn't happen
• Although can see the poly's (order???)
• Mean has quad'ic (since only n = 20???)
• Scores (proj'ns) very random
• Power Spectrum shows 4 components (not affected by subtracting Primal Mean)
Primal - Dual PCA

Toy Example 1: Dual PCA, Row Curves as Data
Primal - Dual PCA

Toy Example 1: Dual PCA
• Curves all very wiggly (random noise)
• Mean much bigger, 54% of Total Var!
• Scores have strong smooth structure (reflecting ordered primal e.v.'s; recall primal e.v.'s ↔ dual scores)
• Power Spectrum shows 3 components (driven by subtracting the Dual Mean)
• Primal – Dual mean difference is critical