Bi-cross-validation for factor analysis (slides: mathtube.org/sites/default/files/slides/ubc2016-annotated.pdf)

Page 1

Bi-cross-validation for factor analysis

Art B. Owen, Stanford University

and

Jingshu Wang, Stanford University

To appear: Statistical Science (2016)

UBC Statistics Dept, January 19, 2016

Page 2

I presented these slides at a van Eeden lecture at the University of British Columbia on January 19, 2016. During the visit I had many wonderful conversations with their graduate students, who are engaged in a variety of interesting problems.

Page 3

Post-talk annotation: The original motivation for this work is the removal of unwanted variation from microarray experiments. There one has expression for genes i = 1, …, N measured on subjects j = 1, …, n, with n ≪ N. The model is

Y_ij = γ_i g_j + β_i^T x_j + u_i^T v_j + σ_i ε_ij

for a primary variable g_j of interest (e.g., age or treatment status), regression covariates x_j with coefficients β_i, and latent (unmeasured) quantities u_i, v_j ∈ R^k with unknown k. The goal is to learn which γ_i are non-zero. The latent variables induce correlations among test statistics. What is worse, if part of v_j is correlated with g_j then each gene's regression has a missing predictor correlated with g. Interestingly, having that problem N times is better than having it once, because there is a low rank UV^T matrix that one can estimate and remove (along with βX^T).
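
A tiny simulation can make the model concrete. This is our illustrative sketch, not code from the paper; all sizes and effect strengths below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, k, p = 1000, 50, 3, 2            # genes, subjects, latent rank, covariates (made-up)

g = rng.standard_normal(n)             # primary variable g_j, one value per subject
x = rng.standard_normal((p, n))        # measured covariates x_j
gamma = np.where(rng.random(N) < 0.05, rng.standard_normal(N), 0.0)  # sparse gene effects
beta = rng.standard_normal((N, p))     # covariate coefficients beta_i
U = rng.standard_normal((N, k))        # latent loadings u_i
V = rng.standard_normal((k, n)) + 0.5 * g   # latent factors v_j, partly correlated with g
sigma = rng.gamma(3.0, 1.0 / 3.0, size=N) ** 0.5   # gene-specific noise scales sigma_i

# Y_ij = gamma_i g_j + beta_i^T x_j + u_i^T v_j + sigma_i eps_ij
Y = np.outer(gamma, g) + beta @ x + U @ V + sigma[:, None] * rng.standard_normal((N, n))
```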

Page 4

Post-talk annotation: A terrific idea of Yunting Sun is to remove γg^T from Y ∈ R^{N×n} via a Householder rotation, leaving a matrix in R^{N×(n−1)} that has the latent variables, the regressors, and the noise, but is missing the primary variable. In that case it is possible to estimate the latent variables via a model with regression plus factor analysis. Then go back and estimate γ, assuming sparsity. What is so cool about Yunting's approach is that one would naturally try removing the factors first. That we are better off removing the wanted signal first is a critical insight.

See the LEAPP paper by Sun, Zhang and Owen (2012), Annals of Applied Statistics.

This talk is based on Wang and Owen (2016), Statistical Science, to appear, and it focusses on choosing k for factor analysis.

The method is now incorporated in a preprint by Wang, Zhao, Hastie & Owen entitled 'Confounder adjustment in multiple hypothesis testing', which also focusses on identifiability issues.

Page 5

Summary

• Factor analysis is over 100 years old: Spearman (1904)

• It is widely used (economics, finance, psychology, engineering · · · )

• It remains problematic to choose the number of factors k

• Best methods justified by simulation (e.g., parallel analysis)

• Recent random matrix theory (RMT) applies to homoscedastic case

Our contribution

1) We hold out some rows and some columns

2) Fit a k-factor model to held-in data

3) Predict held-out data to choose k

4) Justify via simulations

5) RMT informs the simulation design

6) Proposed method dominates recent proposals

Page 6

Factor analysis model

Y = X + Σ^{1/2}E ∈ R^{N×n}
X = LR^T, or UDV^T (SVD)
L ∈ R^{N×k}, R ∈ R^{n×k}
E ~ N(0, I_N ⊗ I_n)
Σ = diag(σ1², …, σN²)

For example: Y has expression of N genes for n subjects. One should not model n → ∞ with N fixed.

Page 7

Our criterion

Y = X + Σ^{1/2}E

There is a true X. Choosing k yields X̂ = X̂_k, and the best k minimizes

‖X̂_k − X‖_F² = Σ_{ij} (X̂_{k,ij} − X_{ij})².

True rank

The true rank of X is not necessarily the best k. The true number of factors may not even be a meaningful quantity. There could be more than min(n, N) real phenomena. Maybe k = ∞.
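
To see why the true rank need not be the best k, here is a small simulation of our own (the factor strengths are invented): the weakest factor is below the detection threshold, and the oracle loss ‖X̂_k − X‖_F² is typically minimized at k = 2 rather than at the true rank 3.

```python
import numpy as np

rng = np.random.default_rng(1)
N = n = 200                                   # gamma = N/n = 1
d = np.array([5.0, 4.0, 0.5])                 # factor strengths; 0.5 is below sqrt(gamma) = 1
U0 = np.linalg.qr(rng.standard_normal((N, 3)))[0]
V0 = np.linalg.qr(rng.standard_normal((n, 3)))[0]
X = np.sqrt(n) * (U0 * d) @ V0.T              # X = sqrt(n) U D V^T, true rank 3
Y = X + rng.standard_normal((N, n))           # homoscedastic noise for simplicity

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
for k in range(6):
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k truncated SVD of Y
    print(k, round(float(np.sum((Xk - X) ** 2))))   # usually smallest at k = 2, not 3
```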

Page 8

Principal components

Use y ∈ R^N for one observation, i.e., one column of Y.

Principal components: Var(y) = PΛP^T = Σ_j λ_j p_j p_j^T; truncate to k terms.

Factor analysis: Y = X + Σ^{1/2}E = LR^T + Σ^{1/2}E, so Var(y) = LL^T + Σ if the rows of R are N(0, I_k).

Sometimes the principal components solution is used for factor analysis, or as a starting point. But the sample Var(y) is not necessarily low rank plus diagonal.

Page 9

Choosing k in factor analysis

• try several and use the 'WOW' criterion*, Johnson and Wichern (1988)
• plot sorted eigenvalues of Var(y) and cut at an 'elbow' ··· the scree plot
• standardize so that Var(y) is a correlation matrix; keep any λ_j > 1, Kaiser (1960)
• hypothesis tests of H_{0,k}: rank = k vs H_{A,k}: rank > k
• parallel analysis, Horn (1965) and Buja & Eyuboglu (1992); more later

Estimating L and R, or just X = LR^T

This is tricky even for fixed k. We want to recover X, and the likelihood from Y = X + Σ^{1/2}E is unbounded (as σ_i → 0). Other approaches fit Var(y) ≈ rank k + diagonal.

*as in "Wow, I understand these factors."

Page 10

Classical asymptotics

Fix the dimension N of y. Let the number n of observations → ∞. Work out asymptotically valid hypothesis tests.

This is not convincing when N and n are comparable, or when N ≫ n.

Also: why approach an estimation problem via testing?

Page 11

Random matrix theory

Applies more directly to principal components. Survey by Paul & Aue (2014).

Suppose that Y ~ N(0, I_N ⊗ I_n). All true eigenvalues of Var(y) are 1; the biggest sample eigenvalue is biased up.

Marcenko-Pastur (1967)

Let n, N → ∞ with N/n → γ ∈ (0, 1), and take Var(y) = n^{−1}YY^T. Then

max eig Var → M ≡ (1 + √γ)²  and  min eig Var → m ≡ (1 − √γ)².

The limiting density is

f_γ(λ) = √((M − λ)(λ − m)) / (2πγλ),  m ≤ λ ≤ M.

For γ > 1 one gets a lump at 0 and the f_{1/γ} density.
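
A quick numerical check of the Marcenko-Pastur law (our sketch, not from the slides): for pure noise, the sample eigenvalues of n^{−1}YY^T fill the interval [m, M] with density f_γ.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 500, 2000                       # gamma = N/n = 0.25
gamma = N / n
Y = rng.standard_normal((N, n))
evals = np.linalg.eigvalsh(Y @ Y.T / n)

m, M = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
lam = np.linspace(m + 1e-6, M - 1e-6, 200)
f = np.sqrt((M - lam) * (lam - m)) / (2 * np.pi * gamma * lam)   # the MP density f_gamma

print(evals.min(), evals.max())        # close to m = 0.25 and M = 2.25
hist, edges = np.histogram(evals, bins=40, density=True)         # hist tracks f on [m, M]
```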

Page 12

Tracy-Widom

Marcenko-Pastur describes the 'bulk spectrum'. Tracy & Widom (1994, 1996) find the distribution of the largest eigenvalue for the (related) Wigner matrix. Johnstone (2001) finds the distribution of the largest eigenvalue of nVar(y). Again N/n → γ ∈ (0, 1].

The limit

(λ_1 − μ_{n,N}) / σ_{n,N} →_d W_1, the Tracy-Widom distribution, where

μ_{n,N} = (√(n−1) + √N)²  (center)
σ_{n,N} = (√(n−1) + √N)(1/√(n−1) + 1/√N)^{1/3}  (scale).

Then σ_{n,N}/μ_{n,N} = O(n^{−2/3}).
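
The centering and scaling constants are easy to compute; a small helper of ours makes the O(n^{−2/3}) concentration visible.

```python
import numpy as np

def tw_center_scale(n, N):
    """Johnstone's centering and scaling for the largest eigenvalue of n * Var(y)."""
    mu = (np.sqrt(n - 1) + np.sqrt(N)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(N)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(N)) ** (1 / 3)
    return mu, sigma

for n in [100, 1000, 10000]:
    mu, sigma = tw_center_scale(n, n)   # square case, gamma = 1
    print(n, sigma / mu)                # shrinks like n**(-2/3)
```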

Page 13

Spiked covariance

Var(y) = Σ_{j=1}^k d_j u_j u_j^T + σ² I_N,  u_j orthonormal,  d_1 ≥ d_2 ≥ ··· ≥ d_k > 0, k fixed.

Theorem, for σ² = 1 and N/n → γ

Var(y) ∈ R^{N×N} has eigenvalues ℓ_1 ≥ ℓ_2 ≥ ··· ≥ ℓ_k > 1 = ℓ_{k+1} = ··· = ℓ_{min(n,N)}. The sample covariance has eigenvalues ℓ̂_j. Then

ℓ̂_j →_{a.s.} (1 + √γ)²  if ℓ_j ≤ 1 + √γ,  and  ℓ̂_j →_{a.s.} ℓ_j (1 + γ/(ℓ_j − 1))  otherwise.

Consequence: d_j ≤ √γ is undetectable (via the SVD).

Don't need Gaussianity.
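
The limit is easy to verify numerically. In this sketch of ours, one spike is planted at strengths below, above, and well above the detection threshold √γ = 0.5.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 400, 1600                       # gamma = 0.25, detection threshold sqrt(gamma) = 0.5
gamma = N / n
u = np.zeros(N); u[0] = 1.0            # one spike: Var(y) = d u u^T + I_N
for d in [0.25, 1.0, 4.0]:
    Y = rng.standard_normal((N, n)) + np.sqrt(d) * np.outer(u, rng.standard_normal(n))
    lhat = np.linalg.eigvalsh(Y @ Y.T / n).max()
    ell = 1 + d                        # population top eigenvalue
    if ell <= 1 + np.sqrt(gamma):
        pred = (1 + np.sqrt(gamma)) ** 2
    else:
        pred = ell * (1 + gamma / (ell - 1))
    print(d, round(lhat, 3), round(pred, 3))   # sample top eigenvalue vs predicted limit
```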

Page 14

Parallel analysis

The biggest eigenvalue is biased up. Horn (1965) simulates Gaussian data with no factors and tracks the largest singular values. He takes k ≥ 1 if the observed d_1 attains a high quantile, then goes on to the second largest, and so on.

Permutations

Buja & Eyuboglu (1992) adapt this to non-Gaussian data. They permute all N of the variables, uniformly over the n! permutations of each.

Upshot

They're wise to the bias; Horn's article came out before Marcenko-Pastur. There are tricky consequences from using, for example, the 95th quantile: one huge factor might make it hard to detect some other real ones. Also ··· why 0.95?
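
For reference, here is a minimal permutation-based parallel analysis in the spirit of Buja & Eyuboglu. This is our sketch, not their code, and the sequential refinements of the original are omitted.

```python
import numpy as np

def parallel_analysis(Y, n_perm=19, quantile=0.95, rng=None):
    """Keep factor j while the j-th singular value of Y exceeds the chosen
    quantile of the j-th singular values of permuted copies of Y."""
    rng = rng or np.random.default_rng(0)
    sv = np.linalg.svd(Y, compute_uv=False)
    null = np.empty((n_perm, len(sv)))
    for b in range(n_perm):
        Yp = np.array([rng.permutation(row) for row in Y])  # permute the n values of each variable
        null[b] = np.linalg.svd(Yp, compute_uv=False)
    thresh = np.quantile(null, quantile, axis=0)
    k = 0
    while k < len(sv) and sv[k] > thresh[k]:
        k += 1
    return k
```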

Page 15: in factor analysis 1 Bi-cross-validation for factor analysismathtube.org/sites/default/files/slides/ubc2016-annotated.pdfRecent random matrix theory (RMT) applies to homoscedastic

Picking k in factor analysis 15

Bi-cross-validation

Y = X + E. Seek k so that Ŷ_k (truncated SVD) is close to the underlying X. It was derived for E with all variances equal; we will adapt it to Σ^{1/2}E.

Literature

Owen & Perry (2009), Annals of Applied Statistics; Perry (2009), PhD thesis.

Page 16

Bi-cross-validation

Y = [ A  B ]
    [ C  D ]

• Take Y ∈ R^{m×n}
• Hold out A ∈ R^{r×s}
• Fit the SVD to D ∈ R^{(m−r)×(n−s)}
• Cut at k terms: D̂^{(k)}
• Â = B(D̂^{(k)})^+ C  (a pseudo-inverse version of BD^{−1}C)
• Repeat for the (m/r) × (n/s) blocks
• Sum the squared errors ‖A − Â‖_F²

Randomize the order of the rows and columns.
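
A minimal homoscedastic version of this procedure, assuming a plain truncated SVD on the held-in block (our sketch):

```python
import numpy as np

def bcv_block_error(Y, k, r, s, rng):
    """Squared error of one randomly held-out r x s block under a rank-k fit."""
    N, n = Y.shape
    rows, cols = rng.permutation(N), rng.permutation(n)      # randomize row/column order
    A = Y[np.ix_(rows[:r], cols[:s])]                        # held-out block
    B = Y[np.ix_(rows[:r], cols[s:])]
    C = Y[np.ix_(rows[r:], cols[:s])]
    D = Y[np.ix_(rows[r:], cols[s:])]                        # held-in block
    if k == 0:
        return float(np.sum(A ** 2))
    U, sv, Vt = np.linalg.svd(D, full_matrices=False)
    Dk_pinv = (Vt[:k].T / sv[:k]) @ U[:, :k].T               # (D^(k))^+ via truncated SVD
    Ahat = B @ Dk_pinv @ C                                   # A-hat = B (D^(k))^+ C
    return float(np.sum((A - Ahat) ** 2))

# Usage sketch: sum these errors over repeats and pick the k with the smallest total.
```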

Page 17

Toy example

Y = [ 1  2 ]
    [ 3  6 ]

A − BD^+C = 1 − 2 × 6^{−1} × 3 = 0

Page 18

Self-consistency lemma

Owen & Perry (2009). Suppose that Y = [A B; C D] has rank k and so does D. Then

A = BD^+C = B(D^{(k)})^+C,

where D^+ is the Moore-Penrose generalized inverse. This justifies treating A − B(D^{(k)})^+C as a residual from rank k.

We could also replace B, C by B^{(k)}, C^{(k)} (but it doesn't seem to work as well).

Applies to the truncated SVD, and also to non-negative matrix factorization (unpublished: k-means).
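
The lemma is easy to check numerically (our sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, r, s = 30, 40, 3, 5, 6
Y = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # exact rank k
A, B = Y[:r, :s], Y[:r, s:]
C, D = Y[r:, :s], Y[r:, s:]
print(np.allclose(A, B @ np.linalg.pinv(D) @ C))                # True: A = B D^+ C
```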

Page 19

Perry's 2009 thesis

Recall the √γ spike detection threshold. If d_j u_j u_j^T is barely detectable then d̂_j û_j û_j^T is a bad estimate of d_j u_j u_j^T. It makes X̂ worse (better to use j − 1 terms).

The second threshold

Perry found a second threshold. Above that, including d̂_j û_j û_j^T helps. It was also found by Gavish & Donoho (2014):

d_j > (1 + γ)/2 + √( ((1 + γ)/2)² + 3γ ).
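
Both thresholds are one-liners; here is a small helper of ours that evaluates them.

```python
import numpy as np

def spike_thresholds(gamma):
    """Detection threshold and the Perry / Gavish-Donoho 'keep' threshold for d_j."""
    detect = np.sqrt(gamma)
    keep = (1 + gamma) / 2 + np.sqrt(((1 + gamma) / 2) ** 2 + 3 * gamma)
    return detect, keep

print(spike_thresholds(1.0))   # (1.0, 3.0): for square matrices, keep a factor only if d_j > 3
```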

Page 20

Optimal holdout size

Y = √n UDV^T + σE, with fixed D = diag(d_1, …, d_{k0}), N/n → γ ∈ (0, ∞), and U^TU = V^TV = I_{k0}. Hold out n0 of the n columns and N0 of the N rows. An oracle picks k*; BCV gives PE_k and we take k̂ = argmin_k PE_k.

Result (Perry, 2009)

Let

γ̄ = ((γ^{1/2} + γ^{−1/2})/2)²  and  ρ = (n − n0)/n × (N − N0)/N.

Then k* − argmin_k E(PE_k) → 0 if

√ρ = √2 / (√γ̄ + √(γ̄ + 3)).

For example: for n = N, hold out about 53% of the rows and columns. NB: the held-out aspect ratio does not have to match the data's.
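
The 53% figure follows directly from the formula. A quick evaluation (our sketch, assuming the holdout is split symmetrically over rows and columns so each axis keeps a fraction √ρ):

```python
import numpy as np

def holdout_fraction(gamma):
    """Fraction of rows (and of columns) to hold out, per the slide's formula."""
    gbar = ((np.sqrt(gamma) + 1 / np.sqrt(gamma)) / 2) ** 2
    sqrt_rho = np.sqrt(2) / (np.sqrt(gbar) + np.sqrt(gbar + 3))   # held-in fraction per axis
    return 1 - sqrt_rho

print(holdout_fraction(1.0))   # about 0.529: hold out ~53% of rows and columns when n = N
```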

Page 21: in factor analysis 1 Bi-cross-validation for factor analysismathtube.org/sites/default/files/slides/ubc2016-annotated.pdfRecent random matrix theory (RMT) applies to homoscedastic

Picking k in factor analysis 21

Unequal variances

Y = X + Σ^{1/2}E = LR^T + Σ^{1/2}E

[Y00 Y01; Y10 Y11] = [L0; L1][R0; R1]^T + diag(Σ0^{1/2}, Σ1^{1/2}) [E00 E01; E10 E11]
                   = [L0R0^T L0R1^T; L1R0^T L1R1^T] + [Σ0^{1/2}E00 Σ0^{1/2}E01; Σ1^{1/2}E10 Σ1^{1/2}E11]

Held-in data (lower right corner):

Y11 = X11 + Σ1^{1/2}E11 = L1R1^T + Σ1^{1/2}E11

Page 22

Hold-out estimation

From the held-in data

Y11 = L1R1^T + Σ1^{1/2}E11

we get L̂1, R̂1, and Σ̂1 (how to do that comes later). But we need X̂00 = L̂0R̂0^T.

Least squares to get L̂0:

Y01^T ≈ R1L0^T + E01^TΣ0^{1/2},  so  L̂0 = Y01R̂1(R̂1^TR̂1)^{−1}.

Weighted least squares to get R̂0:

Y10 ≈ L1R0^T + Σ1^{1/2}E10,  so  R̂0^T = (L̂1^TΣ̂1^{−1}L̂1)^{−1}L̂1^TΣ̂1^{−1}Y10.

Held-out estimate: X̂00 = L̂0R̂0^T. X̂00 is unique even though the factorization X11 = L1R1^T is not.
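
In code, the two regressions are a few lines. This sketch of ours assumes a diagonal Σ̂1 stored as a vector of variances; the function and argument names are hypothetical.

```python
import numpy as np

def predict_held_out(Y01, Y10, L1, R1, sigma2_1):
    """Predict X00 from held-in estimates (L1, R1, sigma2_1) via LS and WLS."""
    L0 = Y01 @ R1 @ np.linalg.inv(R1.T @ R1)        # least squares for L0
    W = L1 / sigma2_1[:, None]                      # Sigma1^{-1} L1 (diagonal Sigma1)
    R0t = np.linalg.solve(L1.T @ W, W.T @ Y10)      # (L1' S^-1 L1)^{-1} L1' S^-1 Y10
    return L0 @ R0t                                 # X00-hat = L0 R0^T
```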

Page 23

Solving the hold-in problem

Y = LR^T + Σ^{1/2}E,  L ∈ R^{N×k},  R ∈ R^{n×k}

Likelihood

log L(X, Σ) = −(Nn/2) log(2π) − (n/2) log det Σ + tr[−½ Σ^{−1}(Y − X)(Y − X)^T]

Max over Σ given X: Σ̂ = n^{−1} diag[(Y − X)(Y − X)^T].
Max over rank-k X given Σ: let Ỹ = Σ^{−1/2}Y; then X̂ = Σ^{1/2} × (rank-k truncated SVD of Ỹ).

Except that

It doesn't work. The likelihood is unbounded: it diverges as some σ_i ↓ 0.

Page 24

Making it work

Constrain σ_i ≥ ε > 0. Then you'll get back some σ_i = ε, so the choice of ε is hard.

Place a prior on the σ_i. Similar story: it is delicate to make the prior strong enough but not too strong. Also, it didn't perform well even using the true prior.

Place a regularizing penalty on Σ. Then we would need a cross-validation step in the middle of our cross-validation procedure.

Regularization by early stopping. Do m iterations and then stop. Extensive simulations show that m = 3 is nearly as good as an oracle's m. It is also faster than iterating to convergence.

This seemed to be the biggest problem we faced.
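
A sketch (ours) of the early-stopping fit: alternate the two maximization steps from the previous slide and stop after m rounds, with m = 3 as recommended.

```python
import numpy as np

def factor_fit_early_stop(Y, k, m=3):
    """Rank-k heteroscedastic factor fit, regularized by early stopping."""
    sigma2 = np.ones(Y.shape[0])                 # start homoscedastic
    X = np.zeros_like(Y)
    for _ in range(m):
        Yt = Y / np.sqrt(sigma2)[:, None]        # whiten rows: Sigma^{-1/2} Y
        U, s, Vt = np.linalg.svd(Yt, full_matrices=False)
        X = np.sqrt(sigma2)[:, None] * ((U[:, :k] * s[:k]) @ Vt[:k])  # unwhitened rank-k fit
        sigma2 = np.maximum(np.mean((Y - X) ** 2, axis=1), 1e-12)     # residual variances
    return X, sigma2
```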

Page 25

The algorithm

• Use Perry's recommended hold-out fraction
• Make the held-out matrix nearly square
• Predict the held-out values
• Repeat the cross-validation
• Retain the winning k (a sketch assembling these steps appears below)
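
Assembling the pieces sketched earlier (factor_fit_early_stop and predict_held_out, both our hypothetical helpers), the whole procedure might look like this; it is a sketch, not the authors' implementation.

```python
import numpy as np

def bcv_choose_k(Y, kmax=10, reps=20, frac=0.53, rng=None):
    """Choose k by heteroscedastic bi-cross-validation."""
    rng = rng or np.random.default_rng(0)
    N, n = Y.shape
    N0, n0 = int(frac * N), int(frac * n)                  # Perry-style hold-out sizes
    pe = np.zeros(kmax + 1)
    for _ in range(reps):
        rows, cols = rng.permutation(N), rng.permutation(n)
        A = Y[np.ix_(rows[:N0], cols[:n0])]                # held-out block Y00
        Y01 = Y[np.ix_(rows[:N0], cols[n0:])]
        Y10 = Y[np.ix_(rows[N0:], cols[:n0])]
        Y11 = Y[np.ix_(rows[N0:], cols[n0:])]              # held-in block
        for k in range(kmax + 1):
            if k == 0:
                pe[k] += np.sum(A ** 2)
                continue
            X11, s2 = factor_fit_early_stop(Y11, k)        # fit held-in data
            U, s, Vt = np.linalg.svd(X11, full_matrices=False)
            L1, R1 = U[:, :k] * s[:k], Vt[:k].T            # any factorization of X11 works
            pe[k] += np.sum((A - predict_held_out(Y01, Y10, L1, R1, s2)) ** 2)
    return int(np.argmin(pe))
```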

Page 26

Extensive simulations

We generated a battery of test cases of different sizes, aspect ratios, and mixtures of factor strengths.

Factor strengths

X = √n UDV^T

• Undetectable: d_j < √γ.
• Harmful: above the detection threshold but below the second threshold.
• Helpful: above both thresholds.
• Strong: the factor explains a fixed % of the variance of the data matrix; d_i ∝ √n.

Weak factors have fixed d_i as n → ∞; they explain O(1/n) of the total variance. Strong factors are prevalent, but sometimes less interesting since we may already know about them.

Page 27

Methods

We compare to recent proposals, several from econometrics.

1) Parallel analysis via permutations (Buja & Eyuboglu, 1992)

2) Eigenvalue difference method (Onatski, 2010):
k̂ = max{k ≤ k_max | λ_k² − λ_{k+1}² ≥ δ}

3) Eigenvalue ratio method (Ahn & Horenstein, 2013):
k̂ = argmax_{0 ≤ k ≤ k_max} λ_k²/λ_{k+1}²,  with λ_0² ≡ Σ_{i≥1} λ_i² / log(min(n, N))

4) Information criterion (Bai & Ng, 2002):
k̂ = argmin_k log(V(k)) + k((n + N)/(nN)) log(nN/(n + N)),  V(k) = ‖Y − Ŷ_k‖_F²/(nN)

5) Weak factor estimator of Nadakuditi & Edelman (2008)

Also: the true rank, and an oracle that always picks the best rank for each data set.

Page 28

Noise levels

Y = X + Σ^{1/2}E,  E ~ N(0, I_N ⊗ I_n),  Σ = diag(σ1², …, σN²)

Random inverse gamma variances: E(σ_i²) = 1 and Var(σ_i²) ∈ {0, 1, 10}. These give homoscedastic as well as mildly and strongly heteroscedastic noise.

Page 29

Factor scenarios

                  Scenario
                1  2  3  4  5  6
# Undetectable  1  1  1  1  1  1
# Harmful       1  1  1  3  3  6
# Useful        6  4  3  1  3  1
# Strong        0  2  3  3  1  0

The true rank is 8, with always one undetectable factor. We vary the number of harmful factors from 1 to 6, and vary the useful/strong mix among the non-harmful ones. Some algorithms have trouble with strong factors.

Page 30

10 sample sizes, 5 aspect ratios

   N      n     γ
   20   1000   1/50
   20    100   1/5
   50     50   1
  100     20   5
 1000     20   50
  100   5000   1/50
  200   1000   1/5
  500    500   1
 1000    200   5
 5000    100   50

Page 31

Random signal

Y = √n UDV^T + Σ^{1/2}E,  versus  Y = Σ^{1/2}(√n UDV^T + E).

The second version supports the use of RMT to judge factor size (undetectable, harmful, helpful, strong).

The signal X

The diagonal entries of D are spread through a range given by the factor strength. U* and V are uniform on their Stiefel manifolds. Solve Σ^{−1/2}U*DV^T = UDV^T for U, and deliver X = √n Σ^{1/2}UDV^T.

The point is to avoid having a large σ_i² be detectable from a large variance of row i of Y, which would make the problem too easy.

Page 32

Relative estimation error (REE)

REE(k) = ‖X̂(k) − X‖_F² / ‖X̂(k*) − X‖_F² − 1

Here k* is the rank that attains the best reconstruction of X. REE measures relative error compared to an oracle; the oracle uses early stopping but can try every rank and see which is closest to the truth.

Cases

6 factor scenarios × 3 noise scenarios × 10 sample sizes = 180 cases, each done 100 times.

Page 33

Worst case REE

Var(σ_i²)   PA     ED     ER      IC1    NE     BCV
0           1.99   1.41   49.61   1.13   0.12   0.29
1           2.89   2.42   25.02   3.11   2.45   0.37
10          3.66   2.28   15.62   4.46   2.10   0.62

For each of the 6 methods and 3 noise levels we find the worst REE. The average is over 100 trials; the maximum is over the 10 sample sizes and 6 factor strength scenarios. NE is best on problems with constant error variance. BCV is best overall.

Page 34

REE survival plots

[Figure: survival curves of REE (horizontal axis, 0 to 5) versus percentage of cases (vertical axis, 0% to 100%) for PA, ED, ER, IC1, NE, and BCV.]

Page 35

Small data sets only

[Figure: the same REE survival curves, restricted to the small data sets.]

Page 36

Large data sets only

[Figure: the same REE survival curves, restricted to the large data sets.]

Page 37

Rank distribution example

[Figure: distributions of the chosen rank (0 to 10) for True, PA, ED, ER, IC1, NE, BCV, and Oracle in six scenarios, labeled Type-1: 0/6/1/1, Type-2: 2/4/1/1, Type-3: 3/3/1/1, Type-4: 3/1/3/1, Type-5: 1/3/3/1, and Type-6: 0/1/6/1.]

N = 5000, n = 100, Var(σ_i²) = 1

Page 38

Meteorite data

A 64 × 128 = 8192 pixel map of a meteorite, with the concentration of 15 elements measured at each pixel. Factors ≈ chemical compounds. We center each of the 15 variables.

BCV chooses k = 4 while PA chooses k = 3.

Page 39

BCV vs k

[Figure: BCV prediction error (vertical axis, roughly 200 to 500) versus rank k = 0, 1, …, 9 for the meteorite data; the minimum is at k = 4.]

Page 40

Early stopping vs SVD

[Figure: image maps of the meteorite factors, comparing four early-stopping factors (ESA_F1 through ESA_F4) with five plain SVD factors (SVD_F1 through SVD_F5) and five factors from the scaled analysis (scale_F1 through scale_F5).]

Page 41

ESA vs SVD (k-means)

[Figure: (a) scatter plots of factor F2 versus F1 for ESA, SVD, and the scaled analysis; (b) k-means cluster maps ESA_C1 through ESA_C5; (c) k-means cluster maps SVD_C1 through SVD_C5.]

Page 42

Early stopping: why 3?

                         White Noise      Heteroscedastic Noise
                         Var(σ_i²) = 0    Var(σ_i²) = 1   Var(σ_i²) = 10
ErrX(m=3)/ErrX(m=mOpt)   1.01 ± 0.01      1.00 ± 0.01     1.00 ± 0.01
ErrX(m=3)/ErrX(m=1)      0.93 ± 0.09      0.90 ± 0.11     0.89 ± 0.12
ErrX(m=3)/ErrX(m=50)     0.87 ± 0.21      0.87 ± 0.21     0.87 ± 0.21
ErrX(m=3)/ErrX(SVD)      1.03 ± 0.06      0.81 ± 0.20     0.75 ± 0.22
ErrX(m=3)/ErrX(QMLE)     1.02 ± 0.05      0.95 ± 0.15     0.91 ± 0.19
ErrX(m=3)/ErrX(oSVD)     1.03 ± 0.06      1.03 ± 0.07     1.03 ± 0.08

Table 1: Averages and standard deviations over 6000 simulations. PCA has m = 1. QMLE ≐ EM. oSVD = SVD for an oracle given Σ. The oracle's k is used in both the numerator and the denominator.

"Five is right out." (M. Python, 1975)

Page 43

Thanks

• Co-author Jingshu Wang

• Data: Ray Browning

• NSF for funds

• UBC Statistics

• Ali Hauschildt & Peggy Ng

• Tanja Hoegg & Seagle Liu
