
Supervised Sparse and Functional Principal Component Analysis

Li et al. (2015)

December 1, 2015


Topics

• functional PCA



Functional PCA (FPCA)

[Figure: first four principal component curves plotted against day (0 to 365); y-axis: values. Panel titles: 1st PC: 89.161; 2nd PC: 8.436; 3rd PC: 1.8; 4th PC: 0.442]

• 35 cities (curves)

• temperature measured on each of 365 days

[Figure: temperature curves; x-axis: days (0 to 365); y-axis: temp (about −30 to 20)]
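As a rough illustration of FPCA on curves sampled on a common grid, the sketch below runs an SVD on demeaned data. The data are synthetic stand-ins with the same shape as the temperature example (35 curves, 365 days), not the real measurements, and the function name `fpca_svd` is hypothetical.

```python
import numpy as np

def fpca_svd(X, r):
    """Rank-r PCA of curves sampled on a common grid (rows = curves)."""
    mu = X.mean(axis=0)                      # pointwise mean curve
    Xc = X - mu                              # demeaned curves
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :r] * s[:r]                # n x r score matrix
    loadings = Vt[:r].T                      # p x r, orthonormal columns
    explained = 100 * s**2 / np.sum(s**2)    # percent of variance per component
    return mu, scores, loadings, explained[:r]

rng = np.random.default_rng(0)
days = np.arange(365)
# Synthetic annual curves: a city-specific level plus a seasonal wave plus noise.
level = rng.normal(5, 8, size=(35, 1))
amp = rng.normal(12, 3, size=(35, 1))
X = level - amp * np.cos(2 * np.pi * days / 365) + rng.normal(0, 1, size=(35, 365))

mu, scores, loadings, explained = fpca_svd(X, r=4)
print(np.round(explained, 2))
```

This plain SVD applies no smoothing; a functional treatment would usually smooth the curves or penalize the loadings.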



Consider another data set

• patient arrival rate data (hourly)

• 417 consecutive days

• Should we perform FPCA using all the data together? Could they be considered as replicates?

• What might we lose if we analyze them separately?

• What might we gain if we combine them?


The Main Problem: Low-rank model

• $X_i(s) = Z_i(s) + e_i(s)$: functional data for the $i$th sample

• rank-$r$ functional PCA model:

$$Z_i(s) = \mu(s) + \sum_{k=1}^{r} u_{ik} V_k(s) = \mu(s) + u_{(i)}^T V(s)$$

• $u_{(i)}$ ($r \times 1$): score vector for the $i$th sample

• $V(s)$: collection of $r$ loading functions

• $u_{(i)}^T V(s)$: low-rank approximation of the $i$th demeaned curve $X_i(s) - \mu(s)$
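To make the rank-r model concrete, the following sketch simulates curves from it on a grid and checks that the leading right singular vectors of the demeaned data recover the loading subspace. The mean function, loading functions, score scales and noise level are all made-up values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 200, 50, 2
s = np.linspace(0, 1, p)

# Hypothetical loading functions V_k(s), orthonormalized on the grid.
basis = np.column_stack([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s)])
V, _ = np.linalg.qr(basis)
mu = 1.0 + s                                           # hypothetical mean function
U = rng.normal(size=(n, r)) * np.array([5.0, 2.0])     # row i holds the scores u_(i)
Z = mu + U @ V.T                                       # rank-r signal Z_i(s)
X = Z + rng.normal(scale=0.1, size=(n, p))             # observed curves X_i(s)

# The demeaned signal has rank r by construction.
rank_signal = np.linalg.matrix_rank(Z - Z.mean(axis=0))

# Estimate the loading subspace from the noisy curves.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
V_hat = Vt[:r].T
overlap = np.linalg.norm(V_hat.T @ V) ** 2             # equals r when the subspaces coincide
print(rank_signal, overlap)
```

The overlap compares subspaces rather than individual columns, because each loading is only determined up to sign (and up to rotation when score variances are tied).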


SupSFPC model

• $X_i(s) = Z_i(s) + e_i(s)$: functional data for the $i$th sample

$$Z_i(s) = \mu(s) + \sum_{k=1}^{r} u_{ik} V_k(s) = \mu(s) + u_{(i)}^T V(s)$$

• $y_{(i)}$ ($q \times 1$): supervision set; extra information available for the $i$th sample; high-dimensional

$$u_{(i)} = \beta_0 + B^T y_{(i)} + f_{(i)}$$

• multivariate linear model

• $\beta_0$ ($r \times 1$); $B$ ($q \times r$); $f_{(i)} \sim \mathrm{MVN}(0, \Sigma_f)$

• $\beta_0 + B^T y_{(i)}$: variation in $u_{(i)}$ explained by $y_{(i)}$

• $f_{(i)}$: leftover variation
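The score model above is an ordinary multivariate linear regression of the scores on the supervision data. The sketch below simulates it with a sparse B and refits by least squares. It treats the scores as observed, which is an assumption made purely for illustration (in the actual model they are latent), and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, r = 2000, 5, 2
beta0 = np.array([1.0, -0.5])
B = np.zeros((q, r))
B[0, 0], B[1, 1] = 2.0, -1.5                  # sparse B: only two active features
Sigma_f = np.diag([1.0, 0.25])

Y = rng.normal(size=(n, q))                   # supervision data, row i is y_(i)^T
F = rng.multivariate_normal(np.zeros(r), Sigma_f, size=n)
U = beta0 + Y @ B + F                         # row i is u_(i)^T

# Multivariate least squares with an intercept column.
design = np.column_stack([np.ones(n), Y])
coef, *_ = np.linalg.lstsq(design, U, rcond=None)
beta0_hat, B_hat = coef[0], coef[1:]
print(np.round(B_hat, 2))
```

Plain least squares does not set the inactive rows of B exactly to zero; that is the role of a sparsity penalty on B.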


Combining Rank-r model with SupSFPC Model

• $X_i(s) = Z_i(s) + e_i(s)$

$$Z_i(s) = \mu(s) + u_{(i)}^T V(s)$$

• SupSFPC Model: $u_{(i)} = \beta_0 + B^T y_{(i)} + f_{(i)}$

• $X_i(s) = [\mu(s) + \beta_0^T V(s)] + y_{(i)}^T B V(s) + [f_{(i)}^T V(s) + e_i(s)]$

• $[\mu(s) + \beta_0^T V(s)]$: intercept

• $y_{(i)}^T B V(s)$: fixed term

• $[f_{(i)}^T V(s) + e_i(s)]$: random term

• Primary interest: $y_{(i)}^T B V(s) + f_{(i)}^T V(s)$
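The combined expression follows by substituting the score model into the rank-r model and grouping terms. Written out as a worked step, using only the symbols defined on these slides:

```latex
\begin{align*}
X_i(s) &= \mu(s) + u_{(i)}^T V(s) + e_i(s) \\
       &= \mu(s) + \bigl(\beta_0 + B^T y_{(i)} + f_{(i)}\bigr)^T V(s) + e_i(s) \\
       &= \bigl[\mu(s) + \beta_0^T V(s)\bigr] + y_{(i)}^T B\, V(s)
          + \bigl[f_{(i)}^T V(s) + e_i(s)\bigr]
\end{align*}
```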


Other Assumptions

• SupSFPC Model:

$$X_i(s) = [\mu(s) + \beta_0^T V(s)] + y_{(i)}^T B V(s) + [f_{(i)}^T V(s) + e_i(s)]$$

• Primary interest: $y_{(i)}^T B V(s) + f_{(i)}^T V(s)$

• $B$ and $V(s)$ are potentially sparse

• sparse $B$: selection of important features in $y$

• sparse $V(s)$: the support $\neq$ the entire domain $S$


What are Li et al. (2015) trying to do?

• estimate the variation within $X(s)$: $V(s)$

• by incorporating the information in $y$

• select important features in $y$ that are most likely to drive the low-rank structure of $X(s)$

• allow $V(s)$ to be sparse and smooth


Revisit the hospital rate data (no feature selection)

[Figure: estimated $V(s)$ and $y_{(i)}^T B V(s)$]


Application II (with feature selection)

• $X(t)$: 542 genes (measured every 7 minutes; 18 time points)

• $y$: ChIP-chip data (106 TFs)

• Goal 1: understanding the underlying expression patterns of cell cycle-related genes

• Goal 2: identifying transcription factors (TFs) that regulate cell cycles


Goal 1: understanding the underlying expression patterns


Goal 2: identifying transcription factors (TFs) that regulate cell cycles

• 32 out of 106 TFs are selected


Estimation Details

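$$X_i(s) = y_{(i)}^T B V(s) + [f_{(i)}^T V(s) + e_i(s)]$$

$$X = Y B V^T + F V^T + E$$

• $n$: sample size; $p$: number of time points; $r$: number of FPCs

• $X$ ($n \times p$); $V$ ($p \times r$); $E$ ($n \times p$); $F$ ($n \times r$)

• $x_{(i)} \sim \mathrm{MVN}(V B^T y_{(i)},\; V \Sigma_f V^T + \sigma_e^2 I_p)$

• Log-likelihood:

$$L(X) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det(V \Sigma_f V^T + \sigma_e^2 I_p) - \frac{1}{2}\mathrm{Tr}\left((X - YBV^T)(V \Sigma_f V^T + \sigma_e^2 I_p)^{-1}(X - YBV^T)^T\right)$$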
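The marginal log-likelihood can be evaluated directly from the matrix form. The sketch below is one straightforward way to do so for small p, assuming the rows of X are independent with mean Y B Vᵀ and covariance V Σ_f Vᵀ + σ_e² I_p. The function name `supsfpc_loglik` is hypothetical.

```python
import numpy as np

def supsfpc_loglik(X, Y, B, V, Sigma_f, sigma2_e):
    """Gaussian log-likelihood of X = Y B V^T + F V^T + E (rows independent)."""
    n, p = X.shape
    cov = V @ Sigma_f @ V.T + sigma2_e * np.eye(p)     # p x p row covariance
    resid = X - Y @ B @ V.T                            # n x p residual from the mean
    _, logdet = np.linalg.slogdet(cov)                 # stable log-determinant
    # Sum of quadratic forms resid_i^T cov^{-1} resid_i, without forming cov^{-1}.
    quad = np.sum(resid * np.linalg.solve(cov, resid.T).T)
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

rng = np.random.default_rng(0)
n, p, q, r = 6, 4, 3, 2
X, Y, B = rng.normal(size=(n, p)), rng.normal(size=(n, q)), rng.normal(size=(q, r))
V, _ = np.linalg.qr(rng.normal(size=(p, r)))
print(supsfpc_loglik(X, Y, B, V, np.diag([2.0, 1.0]), 0.5))
```

For large p one would normally exploit the low-rank-plus-identity structure of the covariance rather than solving a p × p system.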


Imposing sparse and smooth structure

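$$X = Y B V^T + F V^T + E$$

• Log-likelihood:

$$L(X) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det(V \Sigma_f V^T + \sigma_e^2 I_p) - \frac{1}{2}\mathrm{Tr}\left((X - YBV^T)(V \Sigma_f V^T + \sigma_e^2 I_p)^{-1}(X - YBV^T)^T\right)$$

• Penalized criterion: $\max_\theta \; L(X) - P_f(V) - P_s(V) - P_s(B)$

• $P_f(V) = \sum_{k=1}^{r} \alpha_k v_k^T \Omega v_k$: roughness penalty

• $P_s(V) = \sum_{k=1}^{r} \lambda_k \|v_k\|_1$, $P_s(B) = \sum_{k=1}^{r} \gamma_k \|b_k\|_1$: sparsity penalties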

• EM algorithm
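The penalties can be computed directly from their definitions. In the sketch below, Ω is built from second differences on the grid, which is an assumption made here for illustration (the slides do not specify Ω), and `soft_threshold` is the standard proximal operator associated with an L1 penalty. All function names are hypothetical.

```python
import numpy as np

def second_difference_penalty(p):
    """Omega = D^T D, where D takes second differences on a p-point grid."""
    D = np.diff(np.eye(p), n=2, axis=0)       # (p-2) x p second-difference operator
    return D.T @ D

def roughness_penalty(V, Omega, alpha):
    """P_f(V) = sum_k alpha_k * v_k^T Omega v_k over the columns v_k of V."""
    return sum(a * (v @ Omega @ v) for a, v in zip(alpha, V.T))

def l1_penalty(M, weights):
    """Column-wise weighted L1 norm, e.g. P_s(V) or P_s(B)."""
    return sum(w * np.abs(col).sum() for w, col in zip(weights, M.T))

def soft_threshold(x, t):
    """Shrink each entry toward zero by t; entries within t of zero become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

p = 6
Omega = second_difference_penalty(p)
line = np.arange(p, dtype=float).reshape(-1, 1)   # a straight line has no curvature
print(roughness_penalty(line, Omega, [1.0]))
print(soft_threshold(np.array([-3.0, 0.5, 2.0]), 1.0))
```

The soft-thresholding step shows how an L1 penalty yields exact zeros, which is what makes the estimated loadings and coefficients sparse.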


Identifiability

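$$X = Y B V^T + F V^T + E$$

• $Q$: any $r \times r$ orthogonal matrix

• $BV^T = BQQ^TV^T$ and $FV^T = FQQ^TV^T$, so $(B, F, V)$ and $(BQ, FQ, VQ)$ give the same model

• (1) $\int V_i(s) V_j(s)\,ds = 1$ if $i = j$ and $0$ otherwise

• (2) $\Sigma_f$ is diagonal with distinct positive eigenvalues

• (3) the diagonal entries of $\Sigma_f$ are strictly decreasing

• The challenge is to maximize $L(X)$ under these constraints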
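The rotation ambiguity is easy to confirm numerically. The sketch below uses arbitrary random matrices (purely illustrative values) to show that rotating B, F and V by the same orthogonal Q leaves the fitted matrix unchanged, while the rotated score covariance QᵀΣ_f Q is in general no longer diagonal, which is why condition (2) rules such rotations out.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q, r = 8, 6, 4, 2
Y = rng.normal(size=(n, q))
B = rng.normal(size=(q, r))
F = rng.normal(size=(n, r))
V, _ = np.linalg.qr(rng.normal(size=(p, r)))      # orthonormal loadings
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))      # random r x r orthogonal matrix

B2, F2, V2 = B @ Q, F @ Q, V @ Q                  # rotated parameters
fit = Y @ B @ V.T + F @ V.T
fit_rotated = Y @ B2 @ V2.T + F2 @ V2.T
same_fit = np.allclose(fit, fit_rotated)

Sigma_f = np.diag([2.0, 1.0])
Sigma_rot = Q.T @ Sigma_f @ Q                     # covariance of the rotated scores
off_diagonal = abs(Sigma_rot[0, 1])
print(same_fit, off_diagonal)
```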


Challenges in Computing

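$$L(X) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det(V \Sigma_f V^T + \sigma_e^2 I_p) - \frac{1}{2}\mathrm{Tr}\left((X - YBV^T)(V \Sigma_f V^T + \sigma_e^2 I_p)^{-1}(X - YBV^T)^T\right)$$

• the sparsity penalties are non-differentiable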

• non-convex feasible region determined by identifiability constraints

• V shared by the mean and the covariance terms


EM algorithm

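$$X = Y B V^T + F V^T + E$$

$$X = U V^T + E$$

$$U = Y B + F$$

• $L(X, U) = L(X \mid U) + L(U)$

• $L(X \mid U) \propto -np \log \sigma_e^2 - \sigma_e^{-2}\,\mathrm{Tr}\left[(X - UV^T)(X - UV^T)^T\right]$

• $L(U) \propto -n \log\det \Sigma_f - \mathrm{Tr}\left[(U - YB)\Sigma_f^{-1}(U - YB)^T\right]$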
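Because U is latent, an E-step needs the conditional distribution of U given X. Under the Gaussian model above this is a standard linear-Gaussian posterior. The sketch below computes its mean and covariance as an illustration of that step, not as the authors' implementation; it assumes centered data (no intercept), and the function name `e_step` is hypothetical.

```python
import numpy as np

def e_step(X, Y, B, V, Sigma_f, sigma2_e):
    """Posterior mean (n x r) and shared covariance (r x r) of U given X."""
    Sigma_f_inv = np.linalg.inv(Sigma_f)
    # Posterior precision: prior precision plus the information carried by X.
    precision = Sigma_f_inv + (V.T @ V) / sigma2_e
    cov_post = np.linalg.inv(precision)
    # Row i: (y_i^T B Sigma_f^{-1} + x_i^T V / sigma2_e) @ cov_post
    mean_post = ((Y @ B) @ Sigma_f_inv + (X @ V) / sigma2_e) @ cov_post
    return mean_post, cov_post

rng = np.random.default_rng(4)
n, p, q, r = 10, 7, 3, 2
X, Y, B = rng.normal(size=(n, p)), rng.normal(size=(n, q)), rng.normal(size=(q, r))
V, _ = np.linalg.qr(rng.normal(size=(p, r)))
Sigma_f, sigma2_e = np.diag([2.0, 0.5]), 0.3
mean_post, cov_post = e_step(X, Y, B, V, Sigma_f, sigma2_e)
print(mean_post.shape, cov_post.shape)
```

The posterior covariance is the same for every sample because it depends only on V, Σ_f and σ_e², not on the observed curve.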


EM algorithm

$$X = U V^T + E$$

$$U = Y B + F$$

• $L(X, U) = L(X \mid U) + L(U)$

• $L(X \mid U)$ depends on $\sigma_e^2$, $V$

• $L(U)$ depends on $B$, $\Sigma_f$ (with $Y$ observed)
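Since the complete-data log-likelihood splits into two parts, the parameter updates separate as well. The sketch below shows what the updates for B, Σ_f and σ_e² look like if the penalties and the identifiability constraints are ignored. That is a simplification assumed here for illustration, so it is not the penalized procedure of the paper. `mean_post` and `cov_post` stand for the conditional mean and covariance of U given X, and the function name is hypothetical.

```python
import numpy as np

def m_step_unpenalized(X, Y, V, mean_post, cov_post):
    """Closed-form updates that maximize the expected complete-data log-likelihood."""
    n, p = X.shape
    # L(U) part: regress the expected scores on Y.
    B = np.linalg.solve(Y.T @ Y, Y.T @ mean_post)
    R = mean_post - Y @ B
    Sigma_f = R.T @ R / n + cov_post
    # L(X|U) part: expected squared reconstruction error per entry.
    resid = X - mean_post @ V.T
    sigma2_e = (np.sum(resid**2) + n * np.trace(V @ cov_post @ V.T)) / (n * p)
    return B, Sigma_f, sigma2_e

rng = np.random.default_rng(6)
n, p, q, r = 20, 6, 3, 2
Y, B_true = rng.normal(size=(n, q)), rng.normal(size=(q, r))
V, _ = np.linalg.qr(rng.normal(size=(p, r)))
mean_post, cov_post = Y @ B_true, 0.1 * np.eye(r)
X = mean_post @ V.T
B_new, Sigma_f_new, sigma2_new = m_step_unpenalized(X, Y, V, mean_post, cov_post)
print(np.round(Sigma_f_new, 3), round(sigma2_new, 4))
```

The update for V is omitted because it has to respect the orthogonality constraint and the penalties.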


EM algorithm

$$X = U V^T + E$$

$$U = Y B + F$$


Estimation of V

• Challenge: the orthogonality constraints on $V$

• Optimize $V$ one column at a time

• a block coordinate descent algorithm

• eventually the orthogonality is maintained (simulation 85, yeast cell data, 87.7)
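When columns are updated one at a time, orthogonality is not enforced exactly, so it is worth checking afterwards. One simple diagnostic, sketched below under the assumption that V is stored as a p × r array, is the angle between each pair of columns, where 90 degrees means exactly orthogonal. The function name is hypothetical, and whether the figures quoted on the slide are angles of this kind is not stated there.

```python
import numpy as np

def column_angles_deg(V):
    """Angles in degrees between all pairs of columns of V (90 = orthogonal)."""
    Vn = V / np.linalg.norm(V, axis=0)              # normalize each column
    G = np.clip(Vn.T @ Vn, -1.0, 1.0)               # cosines, clipped for arccos
    iu = np.triu_indices(V.shape[1], k=1)           # each pair once
    return np.degrees(np.arccos(np.abs(G[iu])))

V_exact = np.eye(4)[:, :2]                          # exactly orthogonal columns
V_skewed = np.array([[1.0, 1.0], [0.0, 1.0]])       # columns 45 degrees apart
print(column_angles_deg(V_exact), column_angles_deg(V_skewed))
```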


Reference

Gen Li, Haipeng Shen, and Jianhua Z. Huang. Supervised sparse and functional principal component analysis. Journal of Computational and Graphical Statistics, (just-accepted):00, 2015.

