Principal Component Analysis
Adapted by Paul Anderson from a tutorial by Doug Raiford

Page 1

Principal Component Analysis

Adapted by Paul Anderson from Tutorial by Doug Raiford

Page 2

The Problem with Apples and Oranges

High dimensionality means we can't "see" the data. With only one, two, or three features we could represent the examples graphically, but with 4 or more we cannot.

      Weight  Diameter  Redness  Orangeness
Ex1    0.26     3.10      2.92      7.78
Ex2    0.35     2.51      1.91      5.34
Ex3    0.30     2.33      2.05     11.49
Ex4    0.21     3.67     10.82      1.79
Ex5    0.28     2.13      3.11      9.02
Ex6    0.28     3.83      8.80      2.04
Ex7    0.10     3.96      7.30      2.81
Ex8    0.32     3.40      1.16     12.01
Ex9    0.19     3.89      2.75      9.45
Ex10   0.22     2.46      1.71     10.98
Ex11   0.33     3.95      7.88      2.67
Ex12   0.43     2.99      1.03     10.16
Ex13   0.21     5.29     11.44      1.44
Ex14   0.30     3.35      9.99      1.51
Ex15   0.26     3.04      4.48      1.46
Ex16   0.27     4.38      6.48      1.55
Ex17   0.46     2.90      1.86      7.79
Ex18   0.29     2.92     11.88      1.66
Ex19   0.24     3.50      9.09      1.75
Ex20   0.40     3.24      2.00     11.16

Page 3

If We Could Compress Into 2 Dimensions

[Figure: scatter plot of the apple and orange feature vectors, with the axis of greatest variance drawn through the cloud]

Page 4

How?

In MATLAB:

evects = princomp(allFruit);
b1 = evects(:,1);
b2 = evects(:,2);
Z1 = allFruit*b1;
Z2 = allFruit*b2;
scatter(Z1, Z2);
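The same pipeline can be sketched in NumPy (a hedged translation: `allFruit` is stood in by random data, and `numpy.linalg.eigh` plays the role of `princomp`'s eigendecomposition; `np.cov` handles the centering that `princomp` does internally):

```python
import numpy as np

# Stand-in for the slide's allFruit matrix: 20 examples x 4 features
# (weight, diameter, redness, orangeness).
rng = np.random.default_rng(0)
allFruit = rng.random((20, 4))

# Covariance of the features, then its eigenvectors (the principal components).
sigma = np.cov(allFruit, rowvar=False)
evals, evects = np.linalg.eigh(sigma)    # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]          # reorder descending, like princomp
evects = evects[:, order]

# Project onto the first two principal components (the slide's Z1, Z2).
b1, b2 = evects[:, 0], evects[:, 1]
Z1 = allFruit @ b1
Z2 = allFruit @ b2
print(Z1.shape, Z2.shape)
```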

Page 5

Real World Example

A gene-expression dataset: 59 dimensions, 3500 genes. PCA is very useful in exploratory data analysis, and sometimes useful as a direct tool (MCU).

Page 6

But We’re Not Scared of the Details

Given:
- Data matrix M (feature vectors for all examples)

Generate:
- Covariance matrix for M (Σ)
- Eigenvectors (principal components) from the covariance matrix

M → Σ → Eigenvectors
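A minimal sketch of the M → Σ → Eigenvectors step (the 5×3 matrix M below is made up for illustration; the covariance uses the usual n−1 denominator):

```python
import numpy as np

# Hypothetical data matrix M: 5 examples (rows) x 3 features (columns).
M = np.array([[2.0, 0.5, 1.0],
              [1.5, 0.8, 0.9],
              [3.1, 0.2, 1.4],
              [2.2, 0.9, 1.1],
              [2.7, 0.4, 1.3]])

# Covariance matrix Sigma: center each feature, then (X^T X) / (n - 1).
centered = M - M.mean(axis=0)
Sigma = centered.T @ centered / (M.shape[0] - 1)

# The eigenvectors of Sigma are the principal components.
evals, evects = np.linalg.eigh(Sigma)
print(Sigma.shape, evects.shape)
```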

Page 7

Eigenvectors and Eigenvalues

Each Eigenvector is accompanied by an Eigenvalue

The Eigenvector with the greatest Eigenvalue points along the axis of greatest variance

Page 8

Eigenvectors and Eigenvalues

If we use only the first principal component there is very little degradation of the data, and we have reduced the dimensionality from 2 to 1.
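How little degradation? The fraction of total variance captured by the first component is λ1/(λ1 + λ2). A sketch using the 2-D covariance matrix that appears later in the deck:

```python
import numpy as np

# 2-D covariance matrix from the apples-and-oranges example.
Sigma = np.array([[8.3949, 7.5958],
                  [7.5958, 7.7130]])

evals = np.linalg.eigvalsh(Sigma)       # ascending: lambda2, lambda1
explained = evals[-1] / evals.sum()     # share of variance on the first PC
print(round(explained, 3))
```

About 97% of the variance survives the 2 → 1 reduction.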

Page 9

Project data onto new axes

Once we have the Eigenvectors we can project the data onto the new axes.

Eigenvectors are unit vectors, so a simple dot product produces the desired effect.

M → Σ → Eigenvectors → Project Data
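Because the eigenvectors are unit vectors, the general projection formula (u·v / v·v)·v collapses to the dot product u·v. A small sketch (the vector (0.72, 0.69) approximates the first eigenvector computed later in the deck):

```python
import numpy as np

v = np.array([0.72, 0.69])
v = v / np.linalg.norm(v)       # make it exactly unit length
x = np.array([3.0, 4.0])        # an arbitrary 2-D data point

coord = x @ v                   # coordinate along the new axis: just a dot product
proj_general = (x @ v) / (v @ v) * v   # full projection formula...
proj_unit = coord * v                  # ...equals the dot-product shortcut
print(np.allclose(proj_general, proj_unit))
```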

Page 10

Covariance Matrix

M → Σ → Eigenvectors → Project Data

From the 20-example, 4-feature data matrix shown earlier (Weight, Diameter, Redness, Orangeness), the covariance matrix is 4×4:

σ1,1 σ1,2 σ1,3 σ1,4
σ2,1 σ2,2 σ2,3 σ2,4
σ3,1 σ3,2 σ3,3 σ3,4
σ4,1 σ4,2 σ4,3 σ4,4

Page 11

Covariance Matrix

8.3949 7.5958

7.5958 7.7130

Page 12

Covariance Matrix

8.7951 0.3299

0.3299 0.9200

Page 13

Eigenvector

An Eigenvector of Σ is a vector that a linear transform, using Σ as the transformation matrix, maps to a parallel vector:

Σv = λv

M → Σ → Eigenvectors → Project Data

Page 14

Eigenvector

How to find them:
- Σ is an n×n matrix
- There will be n Eigenvectors
- Eigenvectors ≠ 0
- Eigenvalues ≠ 0

Σv = λv
Σv − λIv = 0
(Σ − λI)v = 0

Page 15

Eigenvector

A is invertible if and only if det(A) ≠ 0.

If (Σ − λI) were invertible, then:

(Σ − λI)v = 0
(Σ − λI)⁻¹(Σ − λI)v = (Σ − λI)⁻¹ 0
v = 0

But it is given that v ≠ 0, so (Σ − λI) must not be invertible.

Not invertible, so det(Σ − λI) = 0.

Page 16

Eigenvector

First, solve for the λ's by performing the following operations:

Σ − λI = [a b; c d] − λ[1 0; 0 1] = [a−λ b; c d−λ]

det(Σ − λI) = (a − λ)(d − λ) − bc = 0

P(λ) = λ² − (a + d)λ + (ad − bc) = 0

Solving P(λ) = 0 for λ gives 2 roots, λ1 and λ2.
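For a 2×2 matrix the characteristic polynomial is just a quadratic, so the λ's come straight from the quadratic formula. A sketch (the helper name is made up):

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Roots of P(lambda) = lambda^2 - (a + d) lambda + (ad - bc)."""
    trace = a + d
    det = a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)  # real for symmetric matrices
    return (trace + disc) / 2, (trace - disc) / 2

# The deck's practice matrix [[4, 2], [2, 4]]:
print(eigenvalues_2x2(4, 2, 2, 4))  # (6.0, 2.0)
```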

Page 17

Eigenvector

Now that the Eigenvalues have been acquired, we can solve for the Eigenvector (v below).

We know Σ, λ, and I, so (Σ − λI)v = 0 becomes a homogeneous system of equations (equal to 0) with the entries of v as the variables.

We already know that there is no unique solution:
- The only way there could be a unique solution is if the trivial solution were the only solution.
- If that were the case, (Σ − λI) would be invertible.

Page 18

Back to the example

Σ = cov(data) = [8.3949 7.5958; 7.5958 7.7130]

det(Σ − λI) = 0
(8.3949 − λ)(7.7130 − λ) − 7.5958·7.5958 = 0
λ² − 16.1079λ + 7.0536 = 0

roots: λ1 = 15.66, λ2 = 0.45

Page 19

Back to the example

Σ − λ1I = [8.3949 − 15.66, 7.5958; 7.5958, 7.7130 − 15.66] = [−7.26 7.60; 7.60 −7.94]

Solve (Σ − λ1I)v = 0:
−7.2625x + 7.5958y = 0
7.5958x − 7.9443y = 0

Both equations give the same direction, so Eigenvector1 ∝ (7.5958, 7.2625).

Normalized: Eigenvector1 = (0.72, 0.69)
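The worked numbers can be checked against NumPy's eigensolver:

```python
import numpy as np

Sigma = np.array([[8.3949, 7.5958],
                  [7.5958, 7.7130]])

evals, evects = np.linalg.eigh(Sigma)   # ascending: [~0.45, ~15.66]
v1 = evects[:, 1]                       # eigenvector of the largest eigenvalue
v1 = v1 if v1[0] > 0 else -v1           # eigh may flip the sign
print(np.round(evals, 2), np.round(v1, 2))
```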

Page 20

Eigenvectors (Summary)

Find the characteristic polynomial using the determinant.

Solve for the Eigenvalues (λ's), then solve for the Eigenvectors.

M → Σ → Eigenvectors → Project Data
P(λ) → λ's → Eigenvectors

Page 21

Axis of Greatest Variance?

Equation for an ellipse:

Ax² + Bxy + Cy² + Dx + Ey + F = 0

- D, E, and F have to do with translation
- A and C are related to the ellipse's spread along the X and Y axes, respectively
- B has to do with rotation

With the translation terms dropped:

Ax² + Bxy + Cy² = 0

Page 22

Axis of Greatest Variance

Mathematicians discovered that any ellipse can be exactly captured by a symmetric matrix:

[A B/2; B/2 C]

- A is related to spread along the x axis (variance of the data along x)
- C is related to spread along the y axis
- B/2 is related to rotation (covariance)

The covariance matrix is symmetric, and the Eigenvectors of that matrix point along the principal axes of the ellipse. This is the origin of the name "principal components analysis."

Page 23

Principal Axis Theorem

Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces

Page 24

Project Data Onto Principal Components

Eigenvectors are unit vectors, so the general projection formula

proj_v(u) = (u·v / v·v) v

reduces to a simple dot product. Projecting the data onto the first two components:

Z1 = M v1
Z2 = M v2

M → Σ → Eigenvectors → Project Data

Page 25

Review

In MATLAB:

evects = princomp(allFruit);
b1 = evects(:,1);
b2 = evects(:,2);
Z1 = allFruit*b1;
Z2 = allFruit*b2;
scatter(Z1, Z2);

Page 26

Practice

Covariance matrix:

4.3703 2.0668
2.0668 4.0295

Rounded for the practice problem:

4 2
2 4

Page 27

Practice

det(Σ − λI) = 0, with Σ = [4 2; 2 4]:

λ² − (a + d)λ + (ad − bc) = 0
λ² − (4 + 4)λ + (4·4 − 2·2) = 0
λ² − 8λ + 12 = 0

λ1 = 6, λ2 = 2

M → Σ → Eigenvectors → Project Data
P(λ) → λ's → Eigenvectors

Page 28

Practice

Σ − λ1I = [4 − 6, 2; 2, 4 − 6] = [−2 2; 2 −2]

Solve (Σ − λ1I)v = 0:
−2x + 2y = 0
2x − 2y = 0
so y = x.

Eigenvector1 ∝ (1, 1); normalized: Eigenvector1 = (0.7071, 0.7071)

M → Σ → Eigenvectors → Project Data
P(λ) → λ's → Eigenvectors

Page 29

Practice

Σ − λ2I = [4 − 2, 2; 2, 4 − 2] = [2 2; 2 2]

Solve (Σ − λ2I)v = 0:
2x + 2y = 0
so y = −x.

Eigenvector2 ∝ (1, −1); normalized: Eigenvector2 = (0.7071, −0.7071)

M → Σ → Eigenvectors → Project Data
P(λ) → λ's → Eigenvectors
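Both practice answers can be verified numerically:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0],
                  [2.0, 4.0]])

evals, evects = np.linalg.eigh(Sigma)   # ascending eigenvalues: [2., 6.]
v1 = evects[:, 1]                       # pairs with lambda1 = 6
v2 = evects[:, 0]                       # pairs with lambda2 = 2

# Up to sign, v1 should be (0.7071, 0.7071) and v2 should be (0.7071, -0.7071).
print(np.round(np.abs(v1), 4), np.round(np.abs(v2), 4))
```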

Page 30

Questions?

Page 31

Why Invertible if row reducible to I?

Write A = [a b; c d] and a candidate inverse [w x; y z]. Then A A⁻¹ = I means:

[a b; c d][w x; y z] = [1 0; 0 1]

[aw + by, ax + bz; cw + dy, cx + dz] = [1 0; 0 1]

So the columns of A⁻¹ solve the two systems

A[w; y] = [1; 0]   and   A[x; z] = [0; 1]

If A row reduces to I, each of these systems has a solution, so A⁻¹ exists.

Page 32

Implication of Zero Determinants

Page 33

Why Eigenvector Associated with Greatest λ Points Along Axis of Greatest Variance

The recoverable formulas from this slide relate the regression slope to the covariance:

SCP = Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
SSX = Σᵢ (xᵢ − x̄)²
SSY = Σᵢ (yᵢ − ȳ)²

Pearson correlation coefficient: R = SCP / √(SSX · SSY)

Cov(X, Y) = SCP / (n − 1)
Var(X) = SSX / (n − 1)

Regression slope: b = SCP / SSX = Cov(X, Y) / Var(X)

Applying Σ = [var(x) cov(x,y); cov(x,y) var(y)] to a vector [x; y] gives

[var(x)·x + cov(x,y)·y; cov(x,y)·x + var(y)·y]

and the slide argues from these slope relations that the Eigenvector with the greatest λ points along the axis of greatest variance.

Page 34

Rotation

Good search terms: "rotation of axes", "conic sections".

Note that in the sections above dealing with the ellipse, hyperbola, and parabola, the algebraic equations that appeared did not contain a term of the form xy. However, in our "Algebraic View of the Conic Sections," we stated that every conic section is of the form

Ax2 + Bxy + Cy2 + Dx + Ey + F = 0

where A, B, C, D, E, and F are constants. In essence, all of the equations that we have studied have had B=0. So the question arises: “What role, if any, does the xy term play in conic sections? If it were present, how would that change the geometric figure?”

First of all, the answer is NOT that the conic changes from one type to another. That is to say, if we introduce an xy term, the conic does NOT change from an ellipse to a hyperbola. If we start with the standard equation of an ellipse and insert an extra term, an xy term, we still have an ellipse.

So what does the xy term do? The xy term actually rotates the graph in the plane. For example, in the case of an ellipse, the major axis is no longer parallel to the x-axis or y-axis. Rather, depending on the constant in front of the xy term, we now have the major axis rotated. Let’s look at an example.


Page 35

Rotation

- A is related to elongation in the x direction
- C is related to elongation in the y direction
- B is related to rotation (B ≠ 0 if and only if there is rotation)
- D, E, and F are related to h and k (the x and y shifts, (x − h) and (y − k))
- D, E, and F are not affected by rotation; A and C are

Page 36

Standard equation

The standard equation of the ellipse is:

(x − h)²/a² + (y − k)²/b² = 1

Here a = 5 and b = 2. Hence the length of the major axis is 2a = 10, and the length of the minor axis is 2b = 4.

Page 37

New rotated coordinate system

Coordinate Rotation Formulas: if a rectangular xy-coordinate system is rotated through an angle θ to form an x′y′ coordinate system, then a point P(x, y) will have coordinates P(x′, y′) in the new system, where (x, y) and (x′, y′) are related by:

x = x′ cos θ − y′ sin θ and y = x′ sin θ + y′ cos θ

and

x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ
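The forward and inverse rotation formulas should undo each other; a quick round-trip check (θ = 30° and the point (3, 4) are arbitrary choices):

```python
import math

theta = math.radians(30)
x, y = 3.0, 4.0

# (x, y) -> (x', y') in the rotated system
xp = x * math.cos(theta) + y * math.sin(theta)
yp = -x * math.sin(theta) + y * math.cos(theta)

# (x', y') -> back to (x, y) in the original system
x_back = xp * math.cos(theta) - yp * math.sin(theta)
y_back = xp * math.sin(theta) + yp * math.cos(theta)
print(round(x_back, 6), round(y_back, 6))  # 3.0 4.0
```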

Page 38

Rotation

The values of h and k give horizontal and vertical (resp.) translation distances, and t gives rotation angle (measured in degrees). Notice how changes in these transformation values affect the coefficients, and how changes in the coefficients affect the transformations.

The lines shown in green in the graph are the following key lines for the conic sections: the major and minor axes for ellipses (crossing at the center of the ellipse), the axis of symmetry and perpendicular line through the vertex for a parabola (crossing at the vertex), and the two perpendicular axes of symmetry (crossing through the center point) for a hyperbola. In all cases, the two lines cross at the point (h,k), and are rotated from the position parallel to the coordinate axes by t degrees. In graphs of hyperbolas, the asymptotes of the hyperbola are shown as orange lines.

If B2-4AC<0, then the graph is an ellipse (if B=0 and A=C in this case, then the graph is a circle)

One other important formula determines the relationship between the coefficients and the angle of rotation: tan(2t)=B/(A-C). Note that rotation has no effect on the values of the coefficients D, E, and F, and that t=0 (no rotation) if and only if B=0. The values of the coordinates of the point (h,k) are best determined from the coefficients by first reversing the effect of the rotation (so that B=0), then completing the squares.
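The tan(2t) = B/(A−C) relation can be checked against the eigenvector angle of the conic's symmetric matrix [[A, B/2], [B/2, C]] (the coefficient values below are arbitrary):

```python
import math
import numpy as np

A, B, C = 3.0, 2.0, 1.0
M = np.array([[A, B / 2],
              [B / 2, C]])

evals, evects = np.linalg.eigh(M)
v = evects[:, 1]                     # eigenvector of the larger eigenvalue
t = math.atan2(v[1], v[0])           # angle of the principal axis

print(math.isclose(math.tan(2 * t), B / (A - C), rel_tol=1e-9))  # True
```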

Page 39

Principal Axis Theorem

Principal axis theorem holds for quadratic forms (conic sections) in higher dimensional spaces

The rotation angle of the principal axes can be written in covariance terms:

tan(2Θ) = B / (A − C) = 2·cov(x, y) / (var(x) − var(y))