
PCA and admixture models

CM226: Machine Learning for Bioinformatics, Fall 2016

Sriram Sankararaman

Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price


Announcements

• HW1 solutions posted.


Supervised versus Unsupervised Learning

Unsupervised learning: learning from unlabeled observations.

• Dimensionality Reduction. Last class.

• Other latent variable models. This class + review of PCA.


Outline

Dimensionality reduction

Linear Algebra background

PCA
  Practical issues
  Probabilistic PCA

Admixture models

Population structure and GWAS


Raw data can be complex, high-dimensional

• If we knew what to measure, we could find simple relationships.

• Signals have redundancy.

• Genotypes measured at ≈ 500K SNPs.

• Genotypes at neighboring SNPs are correlated.


Dimensionality reduction

Goal: Find a “more compact” representation of data. Why?

• Visualize and discover hidden patterns.

• Preprocessing for a supervised learning problem.

• Statistical: remove noise.

• Computational: reduce wasteful computation.


An example

• We measure parents’ and offspring’s heights.

• Two measurements.
• Points in R^2.

• How can we find a more “compact” representation?

• The two measurements are correlated, with some noise.

• Pick a direction and project.


Goal: Minimize reconstruction error

• Find the projection that minimizes the Euclidean distance between the original points and their projections.

• Principal Components Analysis solves this problem!


Principal Components Analysis

PCA: find lower dimensional representation of data

• Choose K.

• X is N ×M raw data.

• X ≈ ZW^T, where Z is the N × K reduced representation (PC scores).

• W is M × K (its columns are the principal components).


Outline

Dimensionality reduction

Linear Algebra background

PCA
  Practical issues
  Probabilistic PCA

Admixture models

Population structure and GWAS


Covariance matrix

C = (1/N) X^T X

• Generalizes variance to many features.

• C_{i,i}: variance of feature i.

• C_{i,j}: covariance of features i and j.

• Symmetric.

• Positive semi-definite (PSD), sometimes written C ⪰ 0.

(Positive semi-definite matrix) A matrix A ∈ R^{n×n} is positive semi-definite iff v^T A v ≥ 0 for all v ∈ R^n.

To see that C is PSD, note that for any v,

v^T C v ∝ v^T X^T X v = (Xv)^T (Xv) = Σ_{i=1}^{N} (Xv)_i^2 ≥ 0

• All covariance matrices (being symmetric and PSD) have an eigendecomposition.
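As a quick illustration, a numpy sketch of these properties (the shapes and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # N = 100 samples, M = 5 features
X = X - X.mean(axis=0)                 # center each feature (column)

C = X.T @ X / X.shape[0]               # M x M covariance matrix

assert np.allclose(C, C.T)             # symmetric
v = rng.normal(size=5)
assert v @ C @ v >= -1e-12             # v'Cv >= 0 for any v (PSD, up to float error)
```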


Eigenvector and eigenvalue

(Eigenvector and eigenvalue) A nonzero vector v is an eigenvector of A ∈ R^{n×n} if Av = λv for some scalar λ; λ is the eigenvalue associated with v.


Eigendecomposition of a covariance matrix

• C is symmetric ⇒ its eigenvectors {u_i}, i ∈ {1, . . . , M}, can be chosen to be orthonormal:

• u_i^T u_j = 0 for i ≠ j

• u_i^T u_i = 1

• We can order the eigenvectors so that the eigenvalues are decreasing: λ_1 ≥ λ_2 ≥ . . . ≥ λ_M.


Eigendecomposition of a covariance matrix

C u_i = λ_i u_i,  i ∈ {1, . . . , M}

Arrange U = [u_1 . . . u_M]. Then

CU = C[u_1 . . . u_M]
   = [Cu_1 . . . Cu_M]
   = [λ_1 u_1 . . . λ_M u_M]
   = [u_1 . . . u_M] diag(λ_1, . . . , λ_M)
   = UΛ


Eigendecomposition of a covariance matrix

CU = UΛ

U is an orthogonal matrix, so UU^T = I_M. Hence

C = CUU^T = (CU)U^T = UΛU^T


Eigendecomposition of a covariance matrix

C = UΛUT

• U is an M × M orthogonal matrix. Its columns are the eigenvectors, sorted by eigenvalue.

• Λ is a diagonal matrix of eigenvalues.
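A numpy sketch of this decomposition (note that np.linalg.eigh returns eigenvalues in ascending order, so we reverse them to get λ_1 ≥ . . . ≥ λ_M; the data here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)
C = X.T @ X / X.shape[0]

# eigh is for symmetric matrices; eigenvalues come back in ascending order
evals, U = np.linalg.eigh(C)
evals, U = evals[::-1], U[:, ::-1]                # sort so lambda_1 >= ... >= lambda_M

assert np.allclose(U @ np.diag(evals) @ U.T, C)   # C = U Lambda U^T
assert np.allclose(U.T @ U, np.eye(C.shape[0]))   # columns are orthonormal
```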


Eigendecomposition: Example

[Figure: eigendecomposition of a covariance matrix Ψ.]

Alternate characterization of eigenvectors

• Eigenvectors are orthonormal directions of maximum variance

• Eigenvalues are the variance in these directions.

• The first eigenvector is the direction of maximum variance, with variance λ_1.


Alternate characterization of eigenvectors

Given covariance matrix C ∈ R^{M×M}, solve

x* = argmax_x x^T C x   subject to ‖x‖_2 = 1

Solution: x* = u_1, the first eigenvector of C.

• Example of a constrained optimization problem

• Why do we need the constraint?
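A small numerical check of this claim: among random unit vectors, none beats the top eigenvector (the setup below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)
C = X.T @ X / len(X)

evals, U = np.linalg.eigh(C)
u1 = U[:, -1]                          # eigenvector of the largest eigenvalue
best = u1 @ C @ u1                     # equals lambda_1

# No random unit vector achieves a larger value of x'Cx
for _ in range(1000):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)             # enforce the unit-norm constraint
    assert v @ C @ v <= best + 1e-12
```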


Outline

Dimensionality reduction

Linear Algebra background

PCA
  Practical issues
  Probabilistic PCA

Admixture models

Population structure and GWAS


Back to PCA

Given N data points x_n ∈ R^M, n ∈ {1, . . . , N}, find a linear transformation to a lower dimensional space (K < M): W ∈ R^{M×K} and projections z_n ∈ R^K, so that we can reconstruct the original data from the lower dimensional projection:

x_n ≈ w_1 z_{n,1} + . . . + w_K z_{n,K}
    = [w_1 . . . w_K] (z_{n,1}, . . . , z_{n,K})^T
    = W z_n,   z_n ∈ R^K

• We assume the data is centered: Σ_n x_{n,m} = 0 for every feature m.

Compression: we go from storing N × M numbers to M × K + N × K.

How do we define the quality of the reconstruction?


PCA

• Find z_n ∈ R^K and W ∈ R^{M×K} to minimize the reconstruction error

J(W, Z) = (1/N) Σ_n ‖x_n − W z_n‖_2^2,   Z = [z_1, . . . , z_N]^T

• Require the columns of W to be orthonormal.

• The optimal solution is obtained by setting W = U_K, where U_K contains the K eigenvectors associated with the K largest eigenvalues of the covariance matrix C of X.

• The low-dimensional projection is z_n = W^T x_n (see the sketch below).
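Putting the pieces together, a minimal PCA-by-eigendecomposition sketch (the pca helper and the data here are illustrative, not a reference implementation):

```python
import numpy as np

def pca(X, K):
    """Return principal components W (M x K) and PC scores Z (N x K)."""
    X = X - X.mean(axis=0)                  # PCA assumes centered data
    C = X.T @ X / len(X)                    # M x M covariance matrix
    evals, U = np.linalg.eigh(C)            # ascending eigenvalues
    W = U[:, ::-1][:, :K]                   # top-K eigenvectors = W
    Z = X @ W                               # z_n = W^T x_n, stacked as rows
    return W, Z

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated features
W, Z = pca(X, K=2)
Xc = X - X.mean(axis=0)
err = np.mean(np.sum((Xc - Z @ W.T) ** 2, axis=1))  # reconstruction error J(W, Z)
print(f"mean squared reconstruction error: {err:.3f}")
```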


PCA: K = 1

J(w_1, z_1) = (1/N) Σ_n ‖x_n − w_1 z_{n,1}‖_2^2
            = (1/N) Σ_n (x_n − w_1 z_{n,1})^T (x_n − w_1 z_{n,1})
            = (1/N) Σ_n (x_n^T x_n − 2 w_1^T x_n z_{n,1} + z_{n,1}^2 w_1^T w_1)
            = const + (1/N) Σ_n (−2 w_1^T x_n z_{n,1} + z_{n,1}^2)

using w_1^T w_1 = 1. To minimize with respect to z_{n,1}, take the derivative and set it to zero:

∂J(w_1, z_1) / ∂z_{n,1} = 0  ⇒  z_{n,1} = w_1^T x_n


PCA: K = 1

Plugging back z_{n,1} = w_1^T x_n:

J(w_1) = const + (1/N) Σ_n (−2 w_1^T x_n z_{n,1} + z_{n,1}^2)
       = const + (1/N) Σ_n (−2 z_{n,1} z_{n,1} + z_{n,1}^2)
       = const − (1/N) Σ_n z_{n,1}^2

Now, because the data is centered,

E[z_1] = (1/N) Σ_n z_{n,1} = (1/N) Σ_n w_1^T x_n = w_1^T ((1/N) Σ_n x_n) = 0


PCA: K = 1

J(w_1) = const − (1/N) Σ_n z_{n,1}^2

Var[z_1] = E[z_1^2] − E[z_1]^2
         = (1/N) Σ_n z_{n,1}^2 − 0
         = (1/N) Σ_n z_{n,1}^2


PCA: K = 1

Putting these together,

J(w_1) = const − (1/N) Σ_n z_{n,1}^2,   Var[z_1] = (1/N) Σ_n z_{n,1}^2,

we have

J(w_1) = const − Var[z_1]

Two views of PCA: finding the direction that minimizes the reconstruction error ≡ finding the direction that maximizes the variance of the projected data:

argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]


PCA: K = 1

argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]

Var[z_1] = (1/N) Σ_n z_{n,1}^2
         = (1/N) Σ_n (w_1^T x_n)(w_1^T x_n)
         = (1/N) Σ_n w_1^T x_n x_n^T w_1
         = w_1^T ((1/N) Σ_n x_n x_n^T) w_1
         = w_1^T C w_1


PCA: K = 1

argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]

So we need to solve

argmax_{w_1} w_1^T C w_1   subject to ‖w_1‖_2 = 1

where the constraint comes from requiring W to be orthonormal. This objective is maximized when w_1 is the first eigenvector of C.
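A sanity check of the two views on synthetic data: among sampled unit directions, the top eigenvector u_1 simultaneously minimizes the K = 1 reconstruction error and maximizes the projected variance (setup illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
X -= X.mean(axis=0)
C = X.T @ X / len(X)
u1 = np.linalg.eigh(C)[1][:, -1]       # top eigenvector

def recon_err(w):                       # J(w) for K = 1, with z_n = w'x_n
    z = X @ w
    return np.mean(np.sum((X - np.outer(z, w)) ** 2, axis=1))

def proj_var(w):                        # Var[z_1] = w'Cw
    return w @ C @ w

for _ in range(1000):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert recon_err(w) >= recon_err(u1) - 1e-9   # u1 minimizes J
    assert proj_var(w) <= proj_var(u1) + 1e-9     # u1 maximizes variance
```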


PCA: K > 1

• We can repeat the argument for K > 1.

• Since we require the directions w_k to be orthonormal, we can repeat the argument, each time searching for the direction that maximizes the remaining variance and is orthogonal to the previously selected directions.


Computing eigendecompositions

• Numerical algorithms compute all eigenvalues and eigenvectors in O(M^3) time.

• Infeasible for genetic datasets.

• Computing the largest eigenvalue and eigenvector: power iteration, O(M^2) per iteration (see the sketch below).

• Since we are interested in covariance matrices, we can instead use algorithms that compute the singular-value decomposition (SVD) of X: O(MN^2). (Will discuss later.)
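A minimal power-iteration sketch (fixed iteration count for simplicity; a practical implementation would test for convergence):

```python
import numpy as np

def power_iteration(C, num_iters=1000):
    """Approximate the leading eigenvalue/eigenvector of a symmetric matrix C."""
    rng = np.random.default_rng(0)
    v = rng.normal(size=C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = C @ v                       # O(M^2) matrix-vector product
        v /= np.linalg.norm(v)          # renormalize to unit length
    return v @ C @ v, v                 # Rayleigh quotient = eigenvalue estimate

# quick check against a dense eigendecomposition
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
C = A @ A.T                             # symmetric PSD test matrix
lam, v = power_iteration(C)
print(lam, np.linalg.eigvalsh(C)[-1])   # the two values should agree closely
```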


Practical issues

Choosing K

• For visualization, K = 2 or K = 3.

• For other analyses, pick K so that most of the variance in the data is retained. The fraction of variance retained by the top K eigenvectors is

(Σ_{k=1}^{K} λ_k) / (Σ_{m=1}^{M} λ_m)
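In numpy, picking the smallest K that retains, say, 95% of the variance might look like the following (the 0.95 threshold and the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20)) @ rng.normal(size=(20, 20))
X -= X.mean(axis=0)

evals = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]   # descending eigenvalues
frac = np.cumsum(evals) / np.sum(evals)              # variance retained by top K
K = int(np.searchsorted(frac, 0.95)) + 1             # smallest K with frac >= 0.95
print(K, frac[K - 1])
```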


PCA: Example

[Figure: PCA example.]

PCA on HapMap

[Figure: PCA on HapMap data.]


PCA on Human Genome Diversity Project

[Figure: PCA on Human Genome Diversity Project data.]

PCA on European genetic data

[Figure: PCA on European genetic data. Source: Novembre et al., Nature 2008.]


Probabilistic interpretation of PCA

z_n ~ N(0, I_K), i.i.d.

p(x_n | z_n) = N(W z_n, σ^2 I_M)

Equivalently, x_n = W z_n + ε_n, where ε_n ~ N(0, σ^2 I_M) is independent of z_n.

The mean:

E[x_n | z_n] = W z_n

E[x_n] = E[E[x_n | z_n]] = E[W z_n] = W E[z_n] = 0

The covariance:

Cov[x_n] = E[x_n x_n^T] − E[x_n] E[x_n]^T
         = E[(W z_n + ε_n)(W z_n + ε_n)^T] − 0
         = E[W z_n z_n^T W^T] + E[W z_n ε_n^T] + E[ε_n z_n^T W^T] + E[ε_n ε_n^T]
         = W E[z_n z_n^T] W^T + W E[z_n] E[ε_n]^T + E[ε_n] E[z_n]^T W^T + σ^2 I_M
         = W I_K W^T + 0 + 0 + σ^2 I_M
         = W W^T + σ^2 I_M

where the cross terms vanish because z_n and ε_n are independent with zero means.
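A quick simulation of this generative model, checking the covariance identity empirically (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, sigma, N = 6, 2, 0.5, 200_000

W = rng.normal(size=(M, K))
Z = rng.normal(size=(N, K))                    # z_n ~ N(0, I_K)
E = sigma * rng.normal(size=(N, M))            # eps_n ~ N(0, sigma^2 I_M)
X = Z @ W.T + E                                # x_n = W z_n + eps_n

C_emp = X.T @ X / N                            # empirical covariance (model mean is 0)
C_theory = W @ W.T + sigma**2 * np.eye(M)      # W W^T + sigma^2 I_M
print(np.max(np.abs(C_emp - C_theory)))        # small for large N
```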


Probabilistic PCA

Log likelihood:

LL(W, σ^2) ≡ log P(D | W, σ^2)

Maximize over W subject to the constraint that the columns of W are orthonormal. The maximum likelihood estimator is

W_ML = U_K (Λ_K − σ^2 I_K)^{1/2}

where U_K = [u_1 . . . u_K] contains the top K eigenvectors, Λ_K = diag(λ_1, . . . , λ_K), and

σ^2_ML = (1/(M − K)) Σ_{j=K+1}^{M} λ_j


Probabilistic PCA

Computing the MLE:

• Compute eigenvalues and eigenvectors (closed form, as above).

• Hidden/latent variable problem: use EM.
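A sketch of the closed-form route, applying the formulas above to the sample covariance (the ppca_mle name and the test data are illustrative):

```python
import numpy as np

def ppca_mle(X, K):
    """Closed-form PPCA maximum likelihood estimates (W is unique up to rotation)."""
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)
    evals, U = np.linalg.eigh(C)
    evals, U = evals[::-1], U[:, ::-1]             # descending order
    M = C.shape[0]
    sigma2 = evals[K:].sum() / (M - K)             # average discarded eigenvalue
    W = U[:, :K] @ np.diag(np.sqrt(evals[:K] - sigma2))
    return W, sigma2

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
W, sigma2 = ppca_mle(X, K=3)
print(W.shape, sigma2)
```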


Other advantages of Probabilistic PCA

Can use model selection to infer K.

• Choose K to maximize the marginal likelihood P (D|K).

• Use cross-validation and pick the K that maximizes the likelihood on held-out data (see the sketch below).

• Other model selection criteria such as AIC or BIC (see lecture 6 on clustering).
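A cross-validation sketch for choosing K, scoring held-out data under the fitted Gaussian N(µ, W W^T + σ^2 I_M); it assumes the ppca_mle helper from the sketch above, and the train/test split is illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def heldout_ll(X_train, X_test, K):
    """Held-out log-likelihood under the PPCA model N(mu, W W^T + sigma^2 I)."""
    mu = X_train.mean(axis=0)
    W, sigma2 = ppca_mle(X_train, K)               # helper from the sketch above
    cov = W @ W.T + sigma2 * np.eye(X_train.shape[1])
    return multivariate_normal(mean=mu, cov=cov).logpdf(X_test).sum()

rng = np.random.default_rng(7)
X = rng.normal(size=(600, 8)) @ rng.normal(size=(8, 8))
train, test = X[:400], X[400:]
scores = {K: heldout_ll(train, test, K) for K in range(1, 8)}
print(max(scores, key=scores.get), scores)         # K with the best held-out fit
```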


Mini-Summary

• Dimensionality reduction: linear methods.
  • Exploratory analysis and visualization.
  • Downstream inference: the low-dimensional features can be used for other tasks.

• Principal Components Analysis finds a linear subspace that minimizes the reconstruction error or, equivalently, maximizes the variance.
  • An eigenvalue problem.
  • The probabilistic interpretation also leads to an EM algorithm.

• Why might PCA not be appropriate for genetic data?
