methods for sparse pca - stanford...



Page 1: Methods for Sparse PCA - Stanford Universitystatweb.stanford.edu/~tibs/sta306bfiles/sparsePCA.pdf · Introduction Three Methods from the Literature Relationships Between these Methods


Methods for Sparse PCA

May 4, 2012


Outline

Introduction
  Principal Components Analysis

Three Methods from the Literature
  Maximal Variance Approach
  Minimal Reconstruction Error Approach
  Rank-1 Matrix Approximation

Relationships Between these Methods
  Efficient Algorithm for Maximal Variance Approach
  Minimal Reconstruction Error as a Variance Criterion

Conclusions


Principal Components Analysis

Principal Components Analysis is a popular tool for exploratory data analysis and dimension reduction in applied statistics.


Principal Components Analysis: Example


Notation

Let $X$ be an $n \times p$ matrix with standardized columns, that is, for each column $j = 1, \dots, p$:

$$\sum_{i=1}^{n} X_{ij} = 0, \qquad \sum_{i=1}^{n} X_{ij}^2 = 1.$$
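A minimal sketch of this standardization in NumPy, for illustration only. The function name `standardize_columns` and the random test matrix are assumptions made here, not part of the slides; the sketch also assumes no column is constant, since a constant column would lead to division by zero.

```python
import numpy as np

def standardize_columns(X):
    """Center each column to sum to 0, then scale it to unit sum of squares."""
    Xc = X - X.mean(axis=0)                       # column sums become 0
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))    # column sums of squares become 1

# Hypothetical data: n = 50 observations, p = 4 variables.
rng = np.random.default_rng(0)
X = standardize_columns(rng.normal(size=(50, 4)))

col_sums = X.sum(axis=0)
col_sums_of_squares = (X ** 2).sum(axis=0)
```

With this scaling, the diagonal entries of X^T X are all equal to 1.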


Three Ways to Arrive at First Principal Component

1. Maximal variance

2. Minimal reconstruction error

3. Best rank-1 approximation


Maximal Variance Approach

The first PC, v, is the direction of maximal variance:

$$v = \arg\max_{v} \; v^T X^T X v \quad \text{subject to } \|v\|_2 = 1$$
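As an illustrative check of this criterion, the sketch below takes the top eigenvector of X^T X and compares its objective value with that of many random unit directions. The simulated data, the seed and the variable names other than `X` and `v` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered data with correlated columns.
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

A = X.T @ X
evals, evecs = np.linalg.eigh(A)    # eigenvalues in ascending order
v = evecs[:, -1]                    # eigenvector of the largest eigenvalue
best = v @ A @ v                    # objective value v^T X^T X v

# Objective value for 1000 random unit-norm directions.
W = rng.normal(size=(1000, 5))
W /= np.linalg.norm(W, axis=1, keepdims=True)
random_values = ((W @ A) * W).sum(axis=1)
```

No random direction should exceed `best`, which equals the largest eigenvalue of X^T X.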


Minimal Reconstruction Error Approach

The first PC, v, minimizes the reconstruction error:

$$(u, v) = \arg\min_{u, v} \; \|X - X v u^T\|_F^2 \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1$$


Best Rank-1 Approximation Approach

The first PC, v, follows from the best rank-1 approximation:

$$(u, v, d) = \arg\min_{u, v, d} \; \|X - d u v^T\|_F^2 \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1$$
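The sketch below illustrates numerically that the three descriptions agree on the first PC. It assumes simulated data and uses the SVD for the rank-1 fit. Note that `u` plays different roles in the two criteria: it has length n in the rank-1 approximation and length p in the reconstruction criterion, where this example simply takes u equal to v.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))        # hypothetical data
X = X - X.mean(axis=0)

# Best rank-1 approximation from the leading singular triple.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
u, d, v = U[:, 0], s[0], Vt[0]

# Maximal variance direction: top eigenvector of X^T X.
evals, evecs = np.linalg.eigh(X.T @ X)
v_var = evecs[:, -1]
same_direction = np.isclose(abs(v @ v_var), 1.0)    # equal up to sign

# Rank-1 approximation error and reconstruction error (with u = v).
err_rank1 = np.linalg.norm(X - d * np.outer(u, v), "fro") ** 2
err_recon = np.linalg.norm(X - X @ np.outer(v, v), "fro") ** 2
total = np.linalg.norm(X, "fro") ** 2
```

Both errors equal `total - d**2`, and `d**2` is the maximal value of v^T X^T X v, so the same v appears in all three formulations.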


Principal Components: Three Approaches


Sparse Principal Components Analysis

Suppose we want sparse principal components.

For example, with gene expression data, we may want to identify a sparse set of genes along which most of the variation in the data takes place.


Example

From a study involving 569 elderly persons

An example of a mid-sagittal brain slice, with the corpus callosum annotated with landmarks.


Example, continued

[Figure: panels labeled Walking Speed and Verbal Fluency, shown for Principal Components and Sparse Principal Components.]

Standard and sparse principal components from a study of corpus callosum variation. The shape variations corresponding to significant principal components (red curves) are overlaid on the mean CC shape (black curves).


Three Ways to Arrive at First Sparse Principal Component

1. Maximal variance ... subject to L1 penalty

2. Minimal reconstruction error ... subject to L1 penalty

3. Best rank-1 approximation ... subject to L1 penalty


Maximal Variance Approach

$$v = \arg\max_{v} \; v^T X^T X v \quad \text{subject to } \|v\|_2 = 1, \; \|v\|_1 \le c$$

Citation: “SCoTLASS” method of Jolliffe et al. (2003)
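A side note that the slide does not state explicitly: for a unit-norm vector v of length p, the L1 norm always lies between 1 (a single nonzero entry) and sqrt(p) (all entries equal in magnitude). The bound c is therefore only informative within that range. The small sketch below illustrates this with hypothetical vectors; all names are assumptions of this example.

```python
import numpy as np

p = 9

# Fully dense unit vector: all entries equal, L1 norm is sqrt(p).
dense = np.ones(p) / np.sqrt(p)
l1_dense = np.abs(dense).sum()

# Maximally sparse unit vector: one nonzero entry, L1 norm is 1.
sparse = np.zeros(p)
sparse[0] = 1.0
l1_sparse = np.abs(sparse).sum()

# Random unit vectors fall between the two extremes.
rng = np.random.default_rng(0)
V = rng.normal(size=(1000, p))
V /= np.linalg.norm(V, axis=1, keepdims=True)
l1_random = np.abs(V).sum(axis=1)
```

Choosing c close to 1 forces v toward a single nonzero loading, while any c of at least sqrt(p) leaves the ordinary first PC unchanged.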


Maximal Variance Approach

1. The criterion follows naturally from the maximal variance description of principal components.

2. However, we are maximizing a convex function subject to a penalty, so the problem is not convex.

Citation: Trendafilov and Jolliffe (2006)


Minimal Reconstruction Error Approach

$$(u, v) = \arg\min_{u, v} \; \|X - X v u^T\|_F^2 + \lambda_1 \|v\|_1 + \lambda_2 \|v\|_2 \quad \text{subject to } \|u\|_2 = 1$$

Citation: “SPCA” method of Zou, Hastie, and Tibshirani (2006)

Iterative algorithm to solve for u and v.


Rank-1 Matrix Approximation

$$(u, v, d) = \arg\min_{u, v, d} \; \|X - d u v^T\|_F^2 \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1, \; \|u\|_1 \le c_1, \; \|v\|_1 \le c_2$$

Citations: “Low rank matrix decomposition” of Shen and Huang (2008); “Penalized matrix decomposition” of Witten, Hastie, and Tibshirani (2008)

Fast iterative algorithm to solve for u and v using soft thresholding
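The slide does not spell out the iteration, so the following is only a sketch of what an alternating soft-thresholding scheme for this criterion could look like, in the spirit of the penalized matrix decomposition. The function names, the bisection search for the threshold, the SVD initialization and the fixed iteration count are all assumptions of this example. The sketch also assumes c1 and c2 are at least 1, since a unit vector cannot have a smaller L1 norm.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def unit_vector_with_l1_bound(a, c, n_bisect=50):
    """Soft-threshold and rescale `a` to unit L2 norm with L1 norm <= c.

    The threshold is found by bisection (an assumption of this sketch).
    """
    w = a / np.linalg.norm(a)
    if np.abs(w).sum() <= c:
        return w                       # constraint inactive, no thresholding
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        cand = s / np.linalg.norm(s)
        if np.abs(cand).sum() <= c:
            w, hi = cand, mid          # feasible: try a smaller threshold
        else:
            lo = mid                   # infeasible: threshold must grow
    return w

def sparse_rank1(X, c1, c2, n_iter=100):
    """Alternate the u and v updates for the L1-constrained rank-1 fit."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v = Vt[0]                          # start from the ordinary first PC
    for _ in range(n_iter):
        u = unit_vector_with_l1_bound(X @ v, c1)
        v = unit_vector_with_l1_bound(X.T @ u, c2)
    d = u @ X @ v
    return u, v, d

# Hypothetical usage: constrain v only (c1 = n makes the u constraint inactive).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))
u, v, d = sparse_rank1(X, c1=30.0, c2=2.0)
```

With both bounds large enough to be inactive, the updates reduce to power iteration and the result coincides with the leading singular triple of X.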


Rank-1 Approximation leads to Maximal Variance Approach

It is not hard to show that we can re-write the criterion for the rank-1 approximation in a way that looks more like a variance criterion:

$$\begin{aligned}
(u, v) &= \arg\min_{u, v} \; \|X - d u v^T\|_F^2 \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1, \; \|u\|_1 \le c_1, \; \|v\|_1 \le c_2 \\
&= \arg\max_{u, v} \; u^T X v \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1, \; \|u\|_1 \le c_1, \; \|v\|_1 \le c_2
\end{aligned}$$
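The step from the first line to the second can be sketched as follows. This is one way to fill in the "not hard to show" argument, not necessarily the derivation the authors had in mind. Expanding the Frobenius norm and using the unit-norm constraints on u and v:

```latex
\begin{align*}
\|X - d\,u v^T\|_F^2
  &= \|X\|_F^2 - 2d\,u^T X v + d^2 \|u\|_2^2 \|v\|_2^2 \\
  &= \|X\|_F^2 - 2d\,u^T X v + d^2 .
\end{align*}
% For fixed u and v this is a quadratic in d, minimized at d = u^T X v:
\begin{align*}
\min_{d} \|X - d\,u v^T\|_F^2 = \|X\|_F^2 - (u^T X v)^2 .
\end{align*}
```

Since the first term does not depend on u or v, minimizing the error amounts to maximizing (u^T X v)^2. Flipping the sign of u leaves all constraints satisfied, so this is the same as maximizing u^T X v.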


Suppose we apply the Rank-1 approximation to X.

$$(u, v, d) = \arg\min_{u, v, d} \; \|X - d u v^T\|_F^2 \quad \text{subject to } \|u\|_2 = \|v\|_2 = 1, \; \|v\|_1 \le c$$

Then the solution v solves the maximal variance criterion. So, rather than solving the maximal variance criterion by maximizing a convex function, we can use the quick iterative algorithm for the sparse rank-1 approximation.
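A short sketch of why this holds, under the assumption that the rank-1 criterion is first rewritten as maximizing u^T X v over the same constraints. With no L1 constraint on u, the maximization over u can be carried out in closed form for any fixed v with Xv nonzero:

```latex
\begin{align*}
\max_{\|u\|_2 = 1} u^T X v &= \|X v\|_2,
  \qquad \text{attained at } u = \frac{X v}{\|X v\|_2}, \\
\arg\max_{\|v\|_2 = 1,\ \|v\|_1 \le c} \|X v\|_2
  &= \arg\max_{\|v\|_2 = 1,\ \|v\|_1 \le c} v^T X^T X v .
\end{align*}
```

The first line is the Cauchy-Schwarz inequality; the second uses that squaring a nonnegative quantity does not change its maximizer. The right-hand side is the maximal variance criterion with an L1 constraint on v.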


Minimal Reconstruction Error as a Variance criterion

In a similar way, one can also show equivalence between the minimal reconstruction error and maximal variance criteria, if we add an L1 constraint on u to the former.


Conclusions

1. There is no unique definition of sparse PCA: 3+ methods have been proposed.

2. There exist previously unknown connections between these (seemingly different) methods; in fact, they are almost identical!

3. These connections have not only improved our understanding of each of the different methods, but have resulted in a new fast algorithm for a previously very difficult problem (Maximal Variance Criterion).


References

1. Jolliffe, Trendafilov, and Uddin (2003) 'A modified principal component technique based on the lasso', Journal of Computational and Graphical Statistics 12, 531-547.

2. Trendafilov and Jolliffe (2006) 'Projected gradient approach to the numerical solution of the SCoTLASS', Computational Statistics and Data Analysis 50, 242-253.

3. Zou, Hastie, and Tibshirani (2006) 'Sparse principal component analysis', Journal of Computational and Graphical Statistics 15, 262-286.

4. Shen and Huang (2008) 'Sparse principal component analysis via regularized low rank matrix approximation', Journal of Multivariate Analysis.

5. Witten, Hastie, and Tibshirani (2008) 'A penalized matrix decomposition, with applications to canonical correlation analysis and principal components', Submitted.
