
Page 1

Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters

Yoonkyung Lee*
Department of Statistics, The Ohio State University
*joint work with Andrew Landgraf

November 10, 2015
Department of Statistics and Probability, Michigan State University

Page 2

Dimensionality Reduction

From principal component analysis (PCA) to generalized PCA for non-Gaussian data

Hotelling, H. (1933), Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24(6), 417-441.

Pearson, K. (1901), On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine 2(11), 559-572.

Page 3

Principal Component Analysis (PCA)

PCA is concerned with explaining the variance-covariance structure of a set of correlated variables through a few linear combinations of these variables.

[Scatter plots of D.Humerus versus Humerus showing the data, a random projection, and the PC1 direction.]

Figure: Data on the mineral content measurements (g/cm) of three bones (humerus, radius, and ulna) on the dominant and nondominant sides for 25 older women

Page 4

Variance Maximization

- Given $p$ correlated variables $X = (X_1, \ldots, X_p)^\top$, consider a linear combination of the $X_j$'s,
  $$\sum_{j=1}^p a_j X_j = a^\top X$$
  for $a = (a_1, \ldots, a_p)^\top \in \mathbb{R}^p$ with $\|a\|_2 = 1$.

- The first principal component direction is defined as the vector $a$ that gives the largest sample variance of $a^\top X$ among all unit vectors $a$:
  $$\max_{a \in \mathbb{R}^p,\ \|a\|_2 = 1} a^\top S_n a$$
  where $S_n$ is the sample variance-covariance matrix of $X$.

Page 5

Principal Components

- Let $S_n = \sum_{j=1}^p \lambda_j v_j v_j^\top$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and corresponding eigenvectors $v_1, \ldots, v_p$. Then the first principal component direction is given by $v_1$.

- The derived variable $Z_1 = v_1^\top X$ is called the first principal component.

- Similarly, the second principal component direction is defined as the vector $a$ with the largest sample variance of $a^\top X$ among all normalized $a$ subject to $a^\top X$ being uncorrelated with $v_1^\top X$. It is given by $v_2$.

- In general, the $j$th principal component direction is defined successively from $j = 1$ to $p$ (see the sketch below).
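As a quick illustration (not from the slides), a minimal base-R sketch with simulated data and illustrative names: the first PC direction is the top eigenvector of $S_n$, and the sample variance of the first PC equals the largest eigenvalue.

```r
set.seed(1)
X   <- matrix(rnorm(25 * 6), 25, 6)    # n = 25 observations, p = 6 variables
Sn  <- cov(X)                          # sample variance-covariance matrix
eig <- eigen(Sn, symmetric = TRUE)     # eigenvalues come back in decreasing order
v1  <- eig$vectors[, 1]                # first principal component direction
Z1  <- X %*% v1                        # first principal component scores
all.equal(as.numeric(var(Z1)), eig$values[1])  # TRUE: var(Z1) attains max a'S_n a
```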

Page 6

Pearson’s Reconstruction Error Formulation
Pearson, K. (1901), On Lines and Planes of Closest Fit to Systems of Points in Space

- Given $x_1, \ldots, x_n \in \mathbb{R}^p$, consider the following data approximation:
  $$x_i \approx \mu + v v^\top (x_i - \mu)$$
  where $\mu \in \mathbb{R}^p$ and $v$ is a unit vector in $\mathbb{R}^p$, so that $v v^\top$ is a rank-one projection.

- What are the $\mu$ and $v \in \mathbb{R}^p$ (with $\|v\|_2 = 1$) that minimize the reconstruction error
  $$\sum_{i=1}^n \|x_i - \mu - v v^\top (x_i - \mu)\|^2 ?$$

- $\hat{\mu} = \bar{x}$ and $\hat{v} = v_1$ minimize the error.

Page 7

Minimization of Reconstruction Error

- More generally, consider a rank-$k$ ($k < p$) approximation:
  $$x_i \approx \mu + V V^\top (x_i - \mu)$$
  where $\mu \in \mathbb{R}^p$ and $V$ is a $p \times k$ matrix with orthonormal columns, so that $V V^\top$ is a rank-$k$ projection.

- Wish to minimize the reconstruction error:
  $$\sum_{i=1}^n \|x_i - \mu - V V^\top (x_i - \mu)\|^2 \quad \text{subject to } V^\top V = I_k$$

- $\hat{\mu} = \bar{x}$ and $\hat{V} = [v_1, \ldots, v_k]$ provide the best $k$-dimensional reconstruction of the data (sketched below).
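A minimal base-R sketch of the rank-$k$ reconstruction, again with simulated data and illustrative names: center at the sample mean and project onto the span of the top $k$ eigenvectors.

```r
set.seed(1)
X  <- matrix(rnorm(25 * 6), 25, 6)
k  <- 2
mu <- colMeans(X)                           # mu-hat: the sample mean
Xc <- sweep(X, 2, mu)                       # centered data
V  <- eigen(cov(X), symmetric = TRUE)$vectors[, 1:k]   # V-hat = [v1, ..., vk]
X_hat <- matrix(mu, nrow(X), ncol(X), byrow = TRUE) + Xc %*% V %*% t(V)
sum((X - X_hat)^2)                          # the minimized reconstruction error
```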

Page 8

PCA for Non-Gaussian Data?

- PCA finds a low-rank subspace by implicitly minimizing the reconstruction error under squared error loss, which is linked to the Gaussian distribution.

- Binary, count, or non-negative data abound in practice, e.g., images, term frequencies for documents, ratings for movies, click-through rates for online ads.

- How to generalize PCA to non-Gaussian data?

Page 9

Generalization of PCA
Collins et al. (2001), A generalization of principal components analysis to the exponential family

- Draws on ideas from the exponential family and generalized linear models.

- For Gaussian data, assume that $x_i \sim N_p(\theta_i, I_p)$ and that $\theta_i \in \mathbb{R}^p$ lies in a $k$-dimensional subspace: for a basis $\{b_\ell\}_{\ell=1}^k$,
  $$\theta_i = \sum_{\ell=1}^k a_{i\ell} b_\ell = B_{(p \times k)} a_i$$

- To find $\Theta = [\theta_{ij}]$, maximize the log-likelihood, or equivalently minimize the negative log-likelihood (or deviance):
  $$\sum_{i=1}^n \|x_i - \theta_i\|^2 = \|X - \Theta\|_F^2 = \|X - A B^\top\|_F^2$$

Page 10

Generalization of PCAI According to Eckart-Young theorem, the best rank-k

approximation of X (= Un×pDp×pV>p×p) is given by therank-k truncated singular value decomposition UkDk︸ ︷︷ ︸

A

V>k︸︷︷︸B>

.

I For exponential family data, factorize the matrix of naturalparameter values Θ as AB> with rank-k matrices An×k andBp×k (of orthogonal columns) by maximizing the loglikelihood.

I For binary data X = [xij ] with P = [pij ], “logistic PCA” looks

for a factorization of Θ =[log pij

1−pij

]= AB> that maximizes

`(X ; Θ) =∑i,j

{xij(a>i bj∗)− log(1 + exp(a>i bj∗))

}

subject to B>B = Ik .
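The criterion above as a base-R function (a sketch; any factorization $\Theta = A B^\top$ can be plugged in as the matrix Theta):

```r
# Bernoulli log-likelihood l(X; Theta) = sum_ij { x_ij * theta_ij - log(1 + exp(theta_ij)) }
bernoulli_loglik <- function(X, Theta) {
  sum(X * Theta - log(1 + exp(Theta)))
}
```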

Page 11

Drawbacks of the Matrix Factorization Formulation

- Involves estimation of both case-specific (row-specific) scores $A$ and variable-specific (column-specific) factors $B$: more an extension of SVD than of PCA.

- The number of parameters grows with the number of observations.

- Generalized PC scores for new data require additional optimization, whereas PC scores for standard PCA are simple linear combinations of the data.

Page 12

Alternative Interpretation of Standard PCA

- Assuming that the data are centered ($\mu = 0$), minimize
  $$\sum_{i=1}^n \|x_i - V V^\top x_i\|^2 = \|X - X V V^\top\|_F^2 \quad \text{subject to } V^\top V = I_k$$

- $X V V^\top$ can be viewed as a rank-$k$ projection of the matrix of natural parameters ("means" in this case) of the saturated model $\tilde{\Theta}$ (the best possible fit) for Gaussian data.

- Standard PCA finds the best rank-$k$ projection of $\tilde{\Theta}$ by minimizing the deviance under the Gaussian distribution.

Page 13

Natural Parameters of the Saturated Model

- For an exponential family distribution with natural parameter $\theta$ and pdf
  $$f(x \mid \theta) = \exp\left(\theta x - b(\theta) + c(x)\right),$$
  $E(X) = b'(\theta)$ and the canonical link function is the inverse of $b'$.

  Distribution         $\theta$           $b(\theta)$                canonical link
  $N(\mu, 1)$          $\mu$              $\theta^2/2$               identity
  Bernoulli($p$)       logit($p$)         $\log(1 + \exp(\theta))$   logit
  Poisson($\lambda$)   $\log(\lambda)$    $\exp(\theta)$             log

- Take $\tilde{\Theta} = [\text{canonical link}(x_{ij})]$ (see the snippet below).
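A small base-R illustration of why the saturated-model parameters are problematic for discrete data, which motivates the finite approximation on the next slide:

```r
qlogis(c(0, 1))   # Bernoulli: logit(0) = -Inf, logit(1) = Inf
log(c(0, 3))      # Poisson:   log(0)   = -Inf, log(3)   = 1.0986...
```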

Page 14

New Formulation of Logistic PCA

Landgraf and Lee (2015), Dimensionality Reduction for Binary Data through the Projection of Natural Parameters

- Given $x_{ij} \sim \text{Bernoulli}(p_{ij})$, the natural parameter ($\text{logit}\, p_{ij}$) of the saturated model is
  $$\tilde{\theta}_{ij} = \text{logit}(x_{ij}) = \infty \times (2 x_{ij} - 1).$$
  We will approximate $\tilde{\theta}_{ij} \approx m \times (2 x_{ij} - 1)$ for large $m > 0$.

- Project $\tilde{\Theta}$ onto a $k$-dimensional subspace by using the deviance $D(X; \Theta) = -2\{\ell(X; \Theta) - \ell(X; \tilde{\Theta})\}$ as a loss:
  $$\min_{V \in \mathbb{R}^{p \times k}} D(X; \underbrace{\tilde{\Theta} V V^\top}_{\hat{\Theta}}) = -2 \sum_{i,j} \left\{ x_{ij} \hat{\theta}_{ij} - \log(1 + \exp(\hat{\theta}_{ij})) \right\}$$
  (up to the constant $2\ell(X; \tilde{\Theta})$, which does not depend on $V$), subject to $V^\top V = I_k$ (sketched in code below).
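A base-R sketch of this objective, up to the constant $2\ell(X; \tilde{\Theta})$, with illustrative argument names:

```r
lpca_deviance <- function(X, V, m = 4) {
  theta_tilde <- m * (2 * X - 1)              # approximate saturated parameters
  theta_hat   <- theta_tilde %*% V %*% t(V)   # rank-k projection
  -2 * sum(X * theta_hat - log(1 + exp(theta_hat)))   # deviance, up to a constant
}
```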

Page 15

Logistic PCA vs Logistic SVD

- The previous logistic SVD (matrix factorization) gives an approximation of $\text{logit}\, P$:
  $$\hat{\Theta}_{\text{LSVD}} = A B^\top$$

- Alternatively, our logistic PCA gives
  $$\hat{\Theta}_{\text{LPCA}} = \underbrace{\tilde{\Theta} V}_{A} V^\top,$$
  which has far fewer parameters.

- Computing PC scores on new data requires only matrix multiplication for logistic PCA (see below), while logistic SVD requires fitting a $k$-dimensional logistic regression for each new observation.

- Logistic SVD, with the additional $A$, is prone to overfitting.
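A sketch of scoring new data, assuming a fitted loading matrix V ($p \times k$) and the same m used in the fit (names are illustrative):

```r
lpca_scores <- function(X_new, V, m = 4) {
  (m * (2 * X_new - 1)) %*% V    # project new saturated parameters onto V
}
```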

Page 16

Geometry of Logistic PCA

[Left panel: projections in the natural parameter space (axes θ1, θ2); right panel: projections in the probability space (axes X1, X2) with probability contours.]

Figure: Logistic PCA projection in the natural parameter space with m = 5 (left) and in the probability space (right) compared to the PCA projection

Page 17

New Formulation of Generalized PCA

Landgraf and Lee (2015), Generalized PCA: Projection of Saturated Model Parameters

- The idea can be applied to any exponential family distribution (e.g., Poisson, multinomial); a Poisson sketch follows below.

- Find the best rank-$k$ projection of the matrix of natural parameters from the saturated model $\tilde{\Theta}$ by minimizing the appropriate deviance for the data:
  $$\min_{V \in \mathbb{R}^{p \times k}} D(X; \tilde{\Theta} V V^\top) \quad \text{subject to } V^\top V = I_k$$

- If desired, main effects $\mu$ can be added to the approximation of $\Theta$:
  $$\hat{\Theta} = \mathbf{1} \mu^\top + (\tilde{\Theta} - \mathbf{1} \mu^\top) V V^\top$$
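The same recipe sketched for Poisson counts. Since $\log(0) = -\infty$, a small offset eps is used here purely for illustration; the paper's actual treatment of zeros may differ.

```r
poisson_gpca_deviance <- function(X, V, eps = 0.01) {
  theta_tilde <- log(X + eps)                 # saturated natural parameters (offset is an assumption)
  theta_hat   <- theta_tilde %*% V %*% t(V)   # rank-k projection
  -2 * sum(X * theta_hat - exp(theta_hat))    # Poisson deviance, up to a constant
}
```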

Page 18

First-Order Optimality Conditions

- For the orthonormality constraint $V^\top V = I_k$, consider the Lagrangian
  $$L(V, \mu, \Lambda) = D(X; \mathbf{1}\mu^\top + (\tilde{\Theta} - \mathbf{1}\mu^\top) V V^\top) + \text{tr}\left(\Lambda (V^\top V - I_k)\right),$$
  where $\Lambda$ is a $k \times k$ symmetric matrix of Lagrange multipliers.

- Setting the gradient of $L$ with respect to $V$, $\mu$, and $\Lambda$ equal to 0, we obtain the first-order optimality conditions for logistic PCA:
  $$\left[(X - \hat{P})^\top (\tilde{\Theta} - \mathbf{1}\mu^\top) + (\tilde{\Theta} - \mathbf{1}\mu^\top)^\top (X - \hat{P})\right] V = V \Lambda,$$
  $$(I_p - V V^\top)(X - \hat{P})^\top \mathbf{1} = 0, \quad \text{and} \quad V^\top V = I_k.$$

Page 19

MM Algorithm

- Majorize the objective function with a simpler objective at each iterate, and minimize the majorizing function (Hunter and Lange, 2004).

- From the quadratic approximation of the deviance at the step-$t$ solution $\Theta^{(t)}$, and the fact that $p(1 - p) \le 1/4$,
  $$D(X; \mathbf{1}\mu^\top + (\tilde{\Theta} - \mathbf{1}\mu^\top) V V^\top) \le \frac{1}{4} \|\mathbf{1}\mu^\top + (\tilde{\Theta} - \mathbf{1}\mu^\top) V V^\top - Z^{(t+1)}\|_F^2 + C,$$
  where $Z^{(t+1)} = \Theta^{(t)} + 4(X - \hat{P}^{(t)})$.

- Update $\Theta$ at step $t + 1$: averaging for $\mu^{(t+1)}$ given $V^{(t)}$, and an eigen-analysis of a $p \times p$ matrix for $V^{(t+1)}$ given $\mu^{(t+1)}$ (see the sketch below).
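A minimal MM sketch without main effects ($\mu = 0$), so each update reduces to the eigen-analysis noted above; the $p \times p$ matrix below comes from expanding the majorizer $\|\tilde{\Theta} V V^\top - Z\|_F^2$, and all names are illustrative.

```r
logistic_pca_mm <- function(X, k = 2, m = 4, iters = 100) {
  theta_tilde <- m * (2 * X - 1)               # approximate saturated parameters
  V <- svd(theta_tilde, nu = 0, nv = k)$v      # initialize with top-k right singular vectors
  for (t in seq_len(iters)) {
    theta_hat <- theta_tilde %*% V %*% t(V)    # current natural parameters Theta^(t)
    P_hat <- plogis(theta_hat)                 # current fitted probabilities
    Z <- theta_hat + 4 * (X - P_hat)           # working matrix Z^(t+1)
    # minimizing ||theta_tilde V V' - Z||_F^2 over V'V = I_k amounts to taking the
    # top-k eigenvectors of t(Theta~) Z + t(Z) Theta~ - t(Theta~) Theta~:
    M <- crossprod(theta_tilde, Z) + crossprod(Z, theta_tilde) - crossprod(theta_tilde)
    V <- eigen(M, symmetric = TRUE)$vectors[, 1:k, drop = FALSE]
  }
  V
}
```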

Page 20

Medical Diagnosis Data

- Part of electronic health record data on 12,000 adult patients admitted to the intensive care units (ICU) at the Ohio State University Medical Center from 2007 to 2010 (provided by S. Hyun).

- Patients are classified as having one or more diseases out of over 800 disease categories from the International Classification of Diseases (ICD-9).

- Interested in characterizing the co-morbidity as latent factors, which can be used to define patient profiles for prediction of other clinical outcomes.

- Analysis is based on a sample of 1,000 patients, which reduced the number of disease categories to 584.

Page 21

Patient-Diagnosis Matrix

Page 22

Deviance Explained by Components

[Panels: Cumulative and Marginal percent of deviance explained versus the number of principal components (0 to 30).]

Figure: Cumulative and marginal percent of deviance explained by principal components of LPCA, LSVD, and PCA

Page 23

Deviance Explained by Parameters

[Percent of deviance explained versus the number of free parameters (0k to 40k).]

Figure: Cumulative percent of deviance explained by principal components of LPCA, LSVD, and PCA versus the number of free parameters

Page 24

Predictive Deviance

[Panels: Cumulative and Marginal percent of predictive deviance versus the number of principal components (0 to 30).]

Figure: Cumulative and marginal percent of predictive deviance over test data (1,000 patients) by the principal components of LPCA and PCA

Page 25

Interpretation of Loadings

[Loading plot for Components 1 and 2; labeled diagnoses include: 01: Gram-neg septicemia NEC; 03: Hyperpotassemia; 08: Acute respiratory failure; 10: Acute kidney failure NOS; 16: Speech Disturbance NEC; 17: Asphyxiation/strangulat; 03: DMII wo cmp nt st uncntr; 03: Hyperlipidemia NEC/NOS; 04: Leukocytosis NOS; 07: Mal hy kid w cr kid I-IV; 07: Old myocardial infarct; 07: Crnry athrscl natve vssl; 07: Systolic hrt failure NOS; 09: Oral soft tissue dis NEC; 10: Chr kidney dis stage III; V: Gastrostomy status.]

Figure: The first component is characterized by common serious conditions that bring patients to the ICU, and the second component is dominated by diseases of the circulatory system (07's).

Page 26

Concluding Remarks

- We generalize PCA via projections of the natural parameters of the saturated model, using the generalized linear model framework.

- We have extended generalized PCA to handle differential case weights, missing data, and variable normalization.

- Further extensions are possible with constraints other than rank for desirable properties (e.g., sparsity) on the loadings, and with predictive formulations.

- The R package logisticPCA is available on CRAN, and generalizedPCA is currently under development (toy usage below).
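Toy usage of the CRAN package (a sketch: the call follows the package documentation as I recall it, and the component name PCs for the scores is an assumption):

```r
library(logisticPCA)
X <- matrix(rbinom(100 * 20, 1, 0.3), 100, 20)  # illustrative binary matrix
fit <- logisticPCA(X, k = 2, m = 4)             # rank-2 fit with m = 4
head(fit$PCs)                                   # low-dimensional scores (name assumed)
```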

Page 27

Acknowledgments

Andrew Landgraf @ Battelle Memorial Institute

Sookyung Hyun and Cheryl Newton @ College of Nursing, OSU

NSF grant DMS-1513566

Page 28

References

A. J. Landgraf and Y. Lee. Dimensionality reduction for binary data through the projection of natural parameters. Technical Report 890, Department of Statistics, The Ohio State University, 2015. Also available at arXiv:1510.06112.

A. J. Landgraf and Y. Lee. Generalized principal component analysis: Projection of saturated model parameters. Technical Report 892, Department of Statistics, The Ohio State University, 2015.