High-dimensional data: Exploratory data analysis

Source file: mavdwiel/SHDD/PCAClustShrinkage.pdf

Mark van de Wiel¹ [email protected]

Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University

¹ Contributions by Wessel van Wieringen


Post on 31-Aug-2020


Intro High-Dimensional Data


High-dimensional data: Definition

High-dimensional data: data for which the number of variables, p, exceeds the number of observations, n.

Examples

• Genomics data, e.g. measurements on all human genes, p = 25,000, for, say, n = 100 individuals.

• Imaging data (fMRI): thousands or millions of pixels (or voxels) for hundreds of individuals.

• Astronomy: terabytes of data for a limited number of galaxies, black holes, etc.


High-dimensional data: practical challenges

Inference: Which genes are differentially expressed¹ between cancer and normal tissue?

Visualization: How to visualize high-dimensional observations (samples) and discover subgroups?

Prediction: Probability of tumor recurrence given the genomic profile of the primary tumor (baseline).

Functional relationships: Which brain regions interact functionally?

¹ ‘Expression’ refers to a (relative) quantification of the gene in a cell/tissue/sample.


High-dimensional data: statistical challenges

Inference: Statistical models; multiple testing; shrinkage: borrowing information across features.

Visualization: Clustering, principal component analysis.

Prediction: Fitting ordinary regression models is not feasible: penalized regression; machine learning approaches.

Functional relationships: Construction of networks that describe such relationships.

Exploratory analysis I: Hierarchical clustering¹

¹ Slides: Wessel van Wieringen


Hierarchical clustering

Objective of cluster analysis

Cluster analysis seeks meaningful, data-determined groupings of samples, such that

• samples are more “similar” within than across groups,

• this similarity in gene expression profiles is assumed to imply some form of phenotypic similarity of the samples.

Cluster analysis is also known as: unsupervised learning, unsupervised classification, class discovery, and data segmentation.


Hierarchical clustering

Hierarchical clustering produces a nested sequence of clusters. It starts with all objects apart, and at each step two clusters are merged until only one is left.

The nested sequence can be represented by a dendrogram.

A dendrogram is a two-dimensional diagram, a tree. Each fusion of clusters is plotted at a height equal to the dissimilarity of the two clusters that are joined.
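The nested sequence and the fusion heights can be inspected numerically. The sketch below is only an illustration: the six two-gene samples are made up, and average linkage with Euclidean distance is an assumed choice (the slides have not fixed either at this point). It uses SciPy's `linkage` and `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: 6 samples (rows) measured on 2 genes (columns).
samples = np.array([
    [1.0, 1.0],
    [1.2, 1.1],
    [5.0, 5.0],
    [5.1, 4.8],
    [9.0, 1.0],
    [9.3, 1.2],
])

# Each row of `merges` describes one fusion:
# (index of cluster a, index of cluster b, fusion height, size of new cluster).
merges = linkage(samples, method="average", metric="euclidean")
heights = merges[:, 2]

# Cutting the tree into 3 groups picks one flat clustering out of the nesting.
labels = fcluster(merges, t=3, criterion="maxclust")

print(merges.shape)   # (5, 4): n - 1 fusions for n = 6 samples
print(heights)
print(labels)
```

The fusion heights in the third column are the values at which a dendrogram would draw each merge.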

Hierarchical clustering

Building a dendrogram (loosely):

[Figure, repeated at every step: scatter plot of samples 1 to 6 (x-axis: expression gene 1; y-axis: expression gene 2) next to the dendrogram under construction, with leaves ordered sample 2, sample 6, sample 5, sample 4, sample 3, sample 1.]

• Find the samples that have the most similar gene expression profiles.

• Samples 1 and 3 have the most similar gene expression profiles. Let these samples form a cluster.

• Repeat this exercise: look for samples or clusters that have the most similar gene expression profiles.

• New clusters may form: samples 2 and 6.

• Finally, all samples/clusters are merged into one big cluster.
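The merging steps above can be traced with a small from-scratch loop. This is a sketch, not the slides' computation: the coordinates are hypothetical (chosen so that samples 1 and 3 merge first and samples 2 and 6 second), and the distance between clusters is taken as the smallest member-to-member Euclidean distance, which is an assumption.

```python
import math

# Hypothetical coordinates (expression gene 1, expression gene 2) for samples 1-6.
points = {
    1: (1.0, 4.0), 2: (6.0, 1.0), 3: (1.5, 4.2),
    4: (3.0, 5.0), 5: (5.0, 2.5), 6: (6.8, 1.2),
}

def cluster_distance(a, b):
    """Smallest Euclidean distance between a member of a and a member of b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)

# Start with every sample in its own cluster.
clusters = [frozenset([i]) for i in points]
merge_log = []

while len(clusters) > 1:
    # Find the closest pair of current clusters.
    pairs = [
        (cluster_distance(a, b), a, b)
        for idx, a in enumerate(clusters)
        for b in clusters[idx + 1:]
    ]
    height, a, b = min(pairs, key=lambda item: item[0])
    # Replace the two clusters by their union and record the fusion height.
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    merge_log.append((sorted(a | b), round(height, 3)))

for members, height in merge_log:
    print(members, height)
```

Each entry of `merge_log` is one fusion, so its height is where the corresponding join would be drawn in the dendrogram.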

Hierarchical clustering

Heatmap

A dendrogram is often used in combination with a heatmap. A heatmap is a graphical representation of data in which the values taken by a variable in a two-dimensional map are represented as colors.

[Figure: heatmap of an expression matrix; color scale from -5 to 5.]

Hierarchical clustering

Heatmap and expression matrix

[Figure: heatmap (color scale from -5 to 5) shown next to the underlying expression matrix.]

          S_1    …    S_i    …    S_50
g_1      -1.3    …    0.1    …    -1.2
g_2      -0.1    …    2.4    …     0.3
…          …     …     …     …      …
g_j       0.4    …    1.5    …    -0.2
…          …     …     …     …      …
g_846    -0.9    …   -0.8    …     0.4

The entry in row g_j and column S_i is the expression of gene j in sample i.

Hierarchical clustering

Visualization of hierarchical clustering results: dendrogram and heatmap combined.

[Figure: heatmap with dendrogram; columns: samples, rows: genes.]

Hierarchical clustering

Hierarchical clustering of genes

• Genes that cluster together are believed to be functionally related (module / pathway / GO node).

• This may help to characterize unknown genes.

[Figure: heatmap with dendrograms; columns: samples, rows: genes.]

One may also cluster samples and genes simultaneously.

Hierarchical clustering

Distance

Central to cluster analysis is the notion of distance (or dissimilarity) between the objects being clustered.

Distance measures take on values between 0 and ∞:

• 0 reflects maximum similarity between two samples,

• ∞ means that two samples are not similar at all, and

• values in between indicate various degrees of resemblance.

Hierarchical clustering

Some distance measures (for continuous data)

Data: Y_{ij}; column vectors (samples): Y_{.j}

• Euclidean distance:

\[ d_E(Y_{.j}, Y_{.k}) = \sqrt{\sum_{i=1}^{p} (Y_{ij} - Y_{ik})^2} \]

• Manhattan distance:

\[ d_M(Y_{.j}, Y_{.k}) = \sum_{i=1}^{p} |Y_{ij} - Y_{ik}| \]

• (Pearson) correlation:

\[ d_C(Y_{.j}, Y_{.k}) = \frac{\sum_{i=1}^{p} (Y_{ij} - \bar{Y}_{.j})(Y_{ik} - \bar{Y}_{.k})}{\sqrt{\sum_{i=1}^{p} (Y_{ij} - \bar{Y}_{.j})^2 \, \sum_{i=1}^{p} (Y_{ik} - \bar{Y}_{.k})^2}} \]
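As a quick numerical illustration of the three measures, here is a minimal NumPy sketch. The two sample profiles are made up. Converting the correlation r into a dissimilarity as 1 - r is a common convention that the slide does not state, so treat that last step as an assumption.

```python
import numpy as np

def euclidean(y_j, y_k):
    """Square root of the summed squared differences over the p genes."""
    return float(np.sqrt(np.sum((y_j - y_k) ** 2)))

def manhattan(y_j, y_k):
    """Sum of absolute differences over the p genes."""
    return float(np.sum(np.abs(y_j - y_k)))

def pearson(y_j, y_k):
    """Pearson correlation between two sample profiles."""
    c_j, c_k = y_j - y_j.mean(), y_k - y_k.mean()
    return float(np.sum(c_j * c_k) / np.sqrt(np.sum(c_j ** 2) * np.sum(c_k ** 2)))

# Hypothetical expression profiles of two samples over p = 4 genes.
y_j = np.array([1.0, 2.0, 3.0, 4.0])
y_k = np.array([2.0, 4.0, 6.0, 8.0])

d_e = euclidean(y_j, y_k)   # sqrt(1 + 4 + 9 + 16) = sqrt(30)
d_m = manhattan(y_j, y_k)   # 1 + 2 + 3 + 4 = 10
r = pearson(y_j, y_k)       # profiles are proportional, so r = 1
d_c = 1.0 - r               # assumed conversion of correlation to a dissimilarity
```

The example shows why the choice matters: the two profiles are far apart in Euclidean and Manhattan terms but have zero correlation-based dissimilarity.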

Hierarchical clustering

Distance between clusters

Distance measures are defined between two samples.

In hierarchical clustering, the distance between groups of samples (clusters) also needs to be assessed.

Linkage tells us how to do that.

[Figure: two groups of points, Cluster A and Cluster B.]

Hierarchical clustering

[Figure: Cluster A and Cluster B, with the between-cluster distance drawn for each linkage.]

• Single linkage: minimum distance

• Average linkage: average distance

• Complete linkage: maximum distance
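The three linkages differ only in how they summarize the member-to-member distances between two clusters. A minimal sketch with two hypothetical clusters in a two-gene space, assuming Euclidean distance between samples:

```python
import numpy as np

# Hypothetical clusters; each row is one sample.
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[3.0, 0.0], [5.0, 0.0]])

# All pairwise Euclidean distances between members of A and members of B.
diffs = cluster_a[:, None, :] - cluster_b[None, :, :]
pairwise = np.sqrt((diffs ** 2).sum(axis=2))   # [[3, 5], [2, 4]]

single = pairwise.min()      # minimum distance: 2.0
average = pairwise.mean()    # average distance: 3.5
complete = pairwise.max()    # maximum distance: 5.0
```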

Hierarchical clustering

Effects of linkage

[Figure: dendrograms obtained with complete, single, and average linkage.]

Complete linkage yields a more compact clustering.

Exploratory analysis II: Principal Component Analysis (PCA)

Principal component analysis (PCA)

[Figure: data matrix with rows Features/Genes (e.g. p = 25,000) and columns Samples/Individuals (e.g. n = 8); the samples are split into Group 1 and Group 2.]

Now suppose we pretend not to see the groups ....

Principal component analysis (PCA)

[Figure: data matrix with rows Features/Genes (e.g. p = 25,000) and columns Samples/Individuals (e.g. n = 8), without group labels.]

Challenge: if the genomics data are relevant for the underlying grouping, we should be able to observe this after visualization.

Solution 1: Clustering (but this depends a lot on ad hoc choices...)

Solution 2: Principal Component Analysis (PCA)

Principal component analysis (PCA)

Two-gene world

[Figure: two scatter plots of the samples; x-axis: Gene 1, y-axis: Gene 2.]

But how to obtain a similar visualisation for gene dimension p = 25,000?

Principal component analysis (PCA)

Principal components (PC)

Y_{ij}: data

kth PC: Z_j^k, a linear combination of the data:

\[ Z_j^k = \sum_{i=1}^{p} w_{ki} Y_{ij} \]

First PC:

\[ \arg\max_{w_1} \mathrm{Var}\Big(\sum_{i=1}^{p} w_{1i} Y_{ij}\Big) \quad \text{s.t.} \quad \|w_1\|_2^2 = \sum_{i=1}^{p} (w_{1i})^2 = 1 \]

kth PC: as above, but with an additional orthogonality constraint:

\[ \arg\max_{w_k} \mathrm{Var}\Big(\sum_{i=1}^{p} w_{ki} Y_{ij}\Big) \quad \text{s.t.} \quad \text{I) } \|w_k\|_2^2 = \sum_{i=1}^{p} (w_{ki})^2 = 1, \quad \text{II) } w_k \cdot w_h = 0, \; h = 1, \dots, k-1 \]

Principal component analysis (PCA)

\[ \mathrm{Var}\Big(\sum_{i=1}^{p} w_i Y_{ij}\Big) = w^T \Sigma_{p \times p} w \]

\[ \max_w \big( w^T \Sigma_{p \times p} w \big), \quad \text{s.t. } w^T w = 1 \]

Introduce a Lagrange multiplier to deal with the constraint:

\[ L(w) = w^T \Sigma_{p \times p} w - \lambda (w^T w - 1) \]

\[ \frac{dL(w)}{dw} = 0 \;\Leftrightarrow\; 2 \Sigma_{p \times p} w - 2 \lambda w = 0 \;\Leftrightarrow\; \Sigma_{p \times p} w = \lambda w \]

Eigenvectors z = w are the solutions; z_max, corresponding to the maximum eigenvalue λ_max, renders the global maximum (simply substitute Σ_{p×p} w = λ w into the objective, which gives w^T Σ_{p×p} w = λ).
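The eigenvector result can be checked numerically. The sketch below uses simulated data (an assumption, not the slides' data) and NumPy's `eigh`: it verifies the stationarity condition Σw = λw for the top eigenvector, that the objective there equals λ_max, and that random unit vectors do not exceed it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 samples of p = 5 features with unequal variances.
data = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
sigma = np.cov(data, rowvar=False)            # p x p sample covariance

eigvals, eigvecs = np.linalg.eigh(sigma)      # eigenvalues in ascending order
lam_max = eigvals[-1]
w_max = eigvecs[:, -1]                        # unit-length eigenvector

# Stationarity condition from the Lagrangian: Sigma w = lambda w.
residual = np.linalg.norm(sigma @ w_max - lam_max * w_max)

# The objective w^T Sigma w evaluated at w_max equals lambda_max.
objective_at_w_max = w_max @ sigma @ w_max

# Compare against the objective at 1000 random unit vectors.
trials = rng.normal(size=(1000, 5))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = np.max(np.einsum("ij,jk,ik->i", trials, sigma, trials))
```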

Principal component analysis (PCA)

Efficient computation of PCs (1)

Required: eigenvalues of Σ and orthonormal eigenvectors z. Solution: singular value decomposition (SVD). First¹,

\[ X = Y^T, \quad X \text{ is } n \times p, \quad X^T X = Y Y^T = (n-1)\, \Sigma_{p \times p} \]

Then, the SVD is a factorisation of X into U: an orthonormal n×n matrix, D: a rectangular n×p diagonal matrix², and W: an orthonormal p×p matrix:

\[ X = U D W^T \;\Rightarrow\; X^T X = W D^2 W^T \]

¹ Assume wlog that each feature is centered across samples, i.e. each row of Y (each column of X) has mean 0.

² Matrix consisting of an n×n diagonal matrix and (p−n) zero columns of length n.

Principal component analysis (PCA)

Efficient computation of PCs (2)

\[ X = U D W^T \;\Rightarrow\; X^T X = W D^2 W^T \]

The latter is the eigenvalue decomposition of the symmetric p×p matrix X^T X. Problem: p is large. However:

\[ Y = X^T = (U D W^T)^T = W D^T U^T \;\Rightarrow\; Y^T Y = U (D^2) U^T \]

Y^T Y is of dimension n×n, and n is small. So the solution is:

1. The eigenvalue decomposition¹ of Y^T Y renders D and U.
2. YU = W D^T, so the kth PC is the column W_{.k} = [YU]_{.k} / D_{kk}, where k corresponds to the kth largest eigenvalue.

¹ Using standard algorithms for finding eigenvalues and eigenvectors.
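The two-step recipe can be sketched in NumPy as follows. The function name, the simulated data, and the sizes are illustrative assumptions; Y is taken to be p×n with every feature centered across samples, and the result is compared with a direct SVD of X = Y^T.

```python
import numpy as np

def loadings_via_small_matrix(Y, k):
    """Top-k loading vectors and singular values from the n x n matrix Y^T Y.

    Y is p x n (features by samples), each feature centered across samples.
    """
    evals, U = np.linalg.eigh(Y.T @ Y)        # n x n problem, eigenvalues ascending
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    d = np.sqrt(evals[order])                 # D_kk = square root of the eigenvalue
    W = (Y @ U[:, order]) / d                 # W_.k = [YU]_.k / D_kk
    return W, d

rng = np.random.default_rng(0)
p, n, k = 500, 8, 3                           # hypothetical sizes with p >> n
Y = rng.normal(size=(p, n))
Y = Y - Y.mean(axis=1, keepdims=True)         # center every feature (row of Y)

W, d = loadings_via_small_matrix(Y, k)

# Reference: direct SVD of X = Y^T; rows of Wt_ref are the loading vectors.
_, s_ref, Wt_ref = np.linalg.svd(Y.T, full_matrices=False)
```

Only the components with a nonzero singular value are meaningful: after centering, at most n - 1 of them are nonzero, which is why k is kept below n here.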


Principal component analysis (PCA)

Visualization

• kth principal component for individual j:

\[ PC_k(j) = \sum_{i=1}^{p} w_{ki} Y_{ij} \]

where Y_{ij}: data for individual j.

• Plot PC1(j) vs PC2(j) for all individuals j = 1, ..., n.

• In words: for each individual, plot the coordinates of those two orthogonal summaries of the p-dimensional data that explain most of the variation between individuals.

If a group label associates strongly with the p-dimensional data, one may expect to observe that the groups are separated by (a combination of) the two PCs.
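To make the visualization step concrete, the sketch below computes the (PC1, PC2) coordinates for simulated two-group data. Everything about the data is an assumption for illustration (sizes, the 300 shifted features, the shift of 4); the loading vectors are taken from NumPy's SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 1000, 8                       # hypothetical sizes: p features, n samples

# Simulated data (not the slides' data): Y is p x n, columns are samples.
Y = rng.normal(size=(p, n))
Y[:300, 4:] += 4.0                   # group 2 (last 4 samples) differs on 300 features

# Center every feature across samples.
Yc = Y - Y.mean(axis=1, keepdims=True)

# SVD of X = Yc^T (n x p); the rows of Wt are the loading vectors w_k.
_, _, Wt = np.linalg.svd(Yc.T, full_matrices=False)

# PC_k(j) = sum_i w_ki * Y_ij: one (PC1, PC2) coordinate pair per sample.
scores = (Wt[:2] @ Yc).T             # n x 2
pc1 = scores[:, 0]

for j, (a, b) in enumerate(scores, start=1):
    print(f"sample {j}: PC1 = {a:.2f}, PC2 = {b:.2f}")
```

Plotting the two columns of `scores` against each other gives the PC1 vs PC2 picture; with a group effect this strong, the two groups fall on opposite sides of PC1. The sign of a PC is arbitrary, so which group is positive can differ between runs or libraries.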

Principal component analysis (PCA)

Application: colon cancer

[Figure: PCA visualization of the samples. Black: healthy colon tissue; green: tumor colon tissue.]

Measurements: ~2000 microRNA¹ expressions.

¹ Small pieces of RNA, which can degrade the mRNA of genes.


Efficient parameter estimation in p models: shrinkage¹

¹ Many slides courtesy of Wessel van Wieringen

Data: Setting

[Figure: data matrix X^T with rows Features/Genes (e.g. p = 25,000) and columns Samples/Individuals (e.g. n = 8); the samples are split into Group 1 and Group 2.]

Model per gene: X_ij = β_0j + β_1j Z_i (i = sample, j = gene; Z_i = 0 when sample i is in group 1, and Z_i = 1 otherwise).

How to efficiently estimate β_1j?


James-Stein estimator


James-Stein estimator

JS example Let Xi = (Xi1, …, Xip)T be a p-variate normally distributed random variable:

The mean vector μ is estimated from a random sample of size n using the quadratic loss function:

Then, the least squares (LS) estimate of μ:


James-Stein estimator

Recall, in general, for any estimator of μ:

JS example (continued) The (total) mean squared error (MSE) of this estimator:

for independent Xj’s. This does not involve μ.

Hence, the MSE is a measure of the quality of the estimator.


James-Stein estimator

The James-Stein (JS) estimator is an estimator that outperforms the ML estimator, in the sense that it yields a smaller MSE.

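The JS estimator is of the form:

\[ \hat\theta_j(\lambda) = (1 - \lambda)\, \hat\theta_j + \lambda\, \hat\theta_{j,\mathrm{target}} \]

where

• θ̂_j is the original estimator,

• θ̂_{j,target} is a target estimator, and

• λ is the shrinkage parameter, which determines how much the two estimators are pooled.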

James-Stein estimator

The JS estimator is of the form:

\[ \hat\theta_j(\lambda) = (1 - \lambda)\, \hat\theta_j + \lambda\, \hat\theta_{j,\mathrm{target}} \]

e.g. θ̂_j: sample variance for a given gene, and θ̂_{j,target}: a pooled variance estimate across all genes.

λ       Estimate   Pooled estimate   Shrinkage estimate
0       2          1                 2
0.25    2          1                 1.75
0.5     2          1                 1.5
1       2          1                 1
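The table rows follow from a weighted average of the estimate and the pooled estimate. A minimal sketch (the function name `shrink` is hypothetical) that reproduces them:

```python
def shrink(estimate, target, lam):
    """Weighted average of an estimate and a target; lam must lie in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * estimate + lam * target

# Estimate 2 and pooled estimate 1, as in the table.
rows = [(lam, shrink(2.0, 1.0, lam)) for lam in (0.0, 0.25, 0.5, 1.0)]
for lam, value in rows:
    print(lam, value)
```

With λ = 0 the estimate is left untouched, and with λ = 1 it is replaced by the target entirely.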

James-Stein estimator

The MSE of the JS estimator can be expressed as

\[ \mathrm{MSE}(\hat\theta(\lambda)) = \sum_{j=1}^{p} \mathrm{MSE}(\hat\theta_j(\lambda)), \]

\[ \mathrm{MSE}(\hat\theta_j(\lambda)) = E\Big[ \big( (\hat\theta_j - \theta_j) - \lambda (\hat\theta_j - \hat\theta_{j,\mathrm{target}}) \big)^2 \Big] = \mathrm{MSE}(\hat\theta_j) + \lambda^2 E\big[ (\hat\theta_j - \hat\theta_{j,\mathrm{target}})^2 \big] - 2 \lambda \Big\{ E(\hat\theta_j^2) - E(\hat\theta_j \hat\theta_{j,\mathrm{target}}) - \theta_j E\big[ \hat\theta_j - \hat\theta_{j,\mathrm{target}} \big] \Big\} \]

This is a parabola in λ, whose parameters are determined by the first two moments of both estimators.
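A worked step that the slide leaves implicit, written with θ̂_j for the original estimator, θ̂_{j,target} for the target, and θ_j for the true value: since the MSE is a parabola in λ, setting its derivative to zero gives the minimizing shrinkage, and comparing with λ = 0 gives the range of λ that improves on no shrinkage. This is a sketch derived from the quadratic form only, not a formula quoted from the slides.

```latex
\frac{d}{d\lambda}\,\mathrm{MSE}(\hat\theta_j(\lambda)) = 0
\;\Longrightarrow\;
\lambda_j^{*} =
\frac{E\big[(\hat\theta_j - \theta_j)(\hat\theta_j - \hat\theta_{j,\mathrm{target}})\big]}
     {E\big[(\hat\theta_j - \hat\theta_{j,\mathrm{target}})^2\big]},
\qquad
\mathrm{MSE}(\hat\theta_j(\lambda)) < \mathrm{MSE}(\hat\theta_j)
\;\text{ for }\; 0 < \lambda < 2\lambda_j^{*}
\quad (\text{when } \lambda_j^{*} > 0).
```

The numerator is the cross term that multiplies −2λ in the MSE expression, and the denominator is the coefficient of λ².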

James-Stein estimator

[Figure: MSE = f(λ), a parabola in λ running from no shrinkage to full shrinkage; the optimal shrinkage sits at its minimum, and the interval of λ values leading to an MSE decrease is marked.]

James-Stein estimator

Simulation

• n samples

• p genes

• X_ij ~ N(μ_j, 1)

• μ_j ~ N(0, τ²)

Investigate the shrinkage effect under 3 different scenarios:

I: vary τ → p = 100, n = 40, τ = 0.1, 0.2, 0.4

II: vary n → p = 10·n, n = 10, 100, 200, τ = 0.1

III: vary p/n → p = 1000, n = 20, 50, 300, τ = 0.1
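Scenario I can be simulated along the following lines. This is a sketch under stated assumptions: the two estimators compared (the per-gene sample mean, and that mean shrunk toward the target 0) and the fixed weight λ = 1/(nτ² + 1), which uses the true τ, are choices made for this example and are not taken from the slides; the number of replicates and the seed are arbitrary.

```python
import numpy as np

def simulate_total_mse(p, n, tau, reps=200, seed=0):
    """Average total squared error of the sample mean and of a shrunken mean."""
    rng = np.random.default_rng(seed)
    lam = 1.0 / (n * tau**2 + 1.0)            # assumed shrinkage weight (uses true tau)
    mse_mean = 0.0
    mse_shrunk = 0.0
    for _ in range(reps):
        mu = rng.normal(0.0, tau, size=p)     # mu_j ~ N(0, tau^2)
        X = rng.normal(mu, 1.0, size=(n, p))  # X_ij ~ N(mu_j, 1); rows are samples
        xbar = X.mean(axis=0)                 # unshrunken estimate per gene
        shrunk = (1.0 - lam) * xbar           # shrink toward the target 0
        mse_mean += np.sum((xbar - mu) ** 2) / reps
        mse_shrunk += np.sum((shrunk - mu) ** 2) / reps
    return mse_mean, mse_shrunk

# Scenario I: p = 100, n = 40, varying tau.
results = {tau: simulate_total_mse(p=100, n=40, tau=tau) for tau in (0.1, 0.2, 0.4)}
for tau, (unshrunk, shrunk) in results.items():
    print(tau, round(unshrunk, 3), round(shrunk, 3))
```

Scenarios II and III can be run by calling `simulate_total_mse` with the corresponding values of p, n, and τ.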


James-Stein estimator

Simulation (continued) • Estimators: Now study MSE of JS-estimator in relation to .

James-Stein estimator

Simulation (continued): scenario I

Shrinkage yields more if genes are more “alike”.

Simulation (continued): scenario II

Shrinkage yields more if n is small.

Simulation (continued): scenario III

Shrinkage yields more with larger p/n ratios.

James-Stein estimator

Crucial question: how to determine λ in

\[ \hat\theta_j(\lambda) = (1 - \lambda)\, \hat\theta_j + \lambda\, \hat\theta_{j,\mathrm{target}} \; ? \]

Remember the simulation:

• n samples

• p genes

• X_ij ~ N(μ_j, 1)

• μ_j ~ N(0, τ²)

The latter can be regarded as a prior. We need to know this prior to estimate λ → empirical Bayes.


Empirical Bayes


Empirical Bayes

The JS estimator can be motivated from an empirical Bayes perspective.

Empirical Bayes methods are Bayesian methods with a twist. In an empirical Bayes setting, the parameters at the top level of a hierarchical model are set to their optimal values (as determined from the data), instead of being integrated out.

Roughly: the priors are “estimated” rather than assumed.


Empirical Bayes

JS example (continued)

Recall:

The μ_j are a sample from a prior distribution:

The Bayes estimator of the μ_j is their posterior mean, given the data:


Empirical Bayes JS example (continued) The posterior mean is given by:*

Of the same form as the JS estimator...

* Standard Bayesian calculations

Presenter notes: Equivalence: multiply numerator and denominator by sigma^2 and sigma_0^2 = tau^2. Theta_j is the target and the mean of the X's is the non-shrunken estimator.

Empirical Bayes

\[ \theta_j + \big[ 1 - (n\tau^2 + 1)^{-1} \big] (\bar{X}_j - \theta_j) = \bar{X}_j - (n\tau^2 + 1)^{-1} (\bar{X}_j - \theta_j) \qquad (*) \]

Rewrite θ_j = θ_target and θ̂ = X̄_j: (*) is of the James-Stein form with λ = (nτ² + 1)^{-1}.

If n or τ is large, there is little shrinkage towards the target.
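A small numerical sketch of this shrinkage weight, assuming unit sampling variance as in the simulation setting; the function names and the numbers plugged in are hypothetical.

```python
def shrinkage_weight(n, tau):
    """lambda = (n * tau^2 + 1)^(-1), assuming unit sampling variance."""
    return 1.0 / (n * tau**2 + 1.0)

def posterior_mean(xbar, theta, n, tau):
    """Shrink the sample mean xbar toward the prior mean theta."""
    lam = shrinkage_weight(n, tau)
    return xbar - lam * (xbar - theta)

# Hypothetical numbers: sample mean 2.0, prior mean 0.0, tau = 0.5.
few = posterior_mean(xbar=2.0, theta=0.0, n=4, tau=0.5)      # lambda = 1/2
many = posterior_mean(xbar=2.0, theta=0.0, n=396, tau=0.5)   # lambda = 1/100

# The other way of writing the same estimator: theta + (1 - lambda)(xbar - theta).
lam_few = shrinkage_weight(4, 0.5)
alt_few = 0.0 + (1.0 - lam_few) * (2.0 - 0.0)
```

With n = 4 the estimate is pulled halfway to the prior mean, while with n = 396 it barely moves, which is the "little shrinkage for large n" statement in numbers.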

Empirical Bayes

JS example (continued)

Remember:

Typically, use θ_j = θ. The prior mean θ_j plays the role of target.

How to estimate θ and τ?

Empirical Bayes

Marginal likelihood

Marginal likelihood: likelihood integrated w.r.t. all prior(s).

\[ p(\mathbf{X}; \alpha) = \int p(\mathbf{X} \mid \lambda)\, p_\alpha(\lambda)\, d\lambda \]

Parametric empirical Bayes: maximize p(X; α) w.r.t. parameters α.

Example

X_ij ~iid N(μ_j, 1), μ_j ~iid N(θ, τ²), so α = {θ, τ}.

\[ p(\mathbf{X}; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau)\, d\mu_j \]

Empirical Bayes

Example

X_ij ~iid N(μ_j, 1), μ_j ~iid N(θ, τ²)

\[ p(\mathbf{X}; \theta, \tau) = \prod_j p(\mathbf{X}_j; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau)\, d\mu_j \]

What is the unconditional density of X_j = (X_1j, …, X_nj): p(X_j) = p(X_j; θ, τ)?

• The integral reduces to a product of Gaussians (conjugacy).

Bayesian inference, conjugate priors, example

\[ P(\mathbf{X}) = \int P(\mathbf{X} \mid \mu)\, P(\mu)\, d\mu \qquad (*) \]

\[ = C \int \prod_{i=1}^{n} \exp\big( -(X_i - \mu)^2 / 2 \big)\, \exp\big( -(\mu - \theta)^2 / (2\tau^2) \big)\, d\mu \]

\[ = C \int \prod_{i=1}^{n} \exp\big( -(X_i - A)^2 / B \big)\, \exp\big( -(\mu - D)^2 / E \big)\, d\mu \]

where A and B do not depend on μ (but do depend on {θ, τ}). The second exponential is a Gaussian form for μ: it has to integrate to 1. The first exponential is also a Gaussian form: a product of Gaussians.

(*) Dropping index j.

Presenter notes: Apologies for the change of notation i -> j, X_ij -> Y_j.

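Empirical Bayes

\[ p(\mathbf{X}; \theta, \tau) = \prod_j p(\mathbf{X}_j; \theta, \tau) = \prod_j \int \prod_i N(x_{ij}; \mu_j, 1)\, N(\mu_j; \theta, \tau)\, d\mu_j \]

• p(X_j; θ, τ) reduces to a product of Gaussians.

• Outer product: product of a product of Gaussians.

• Hence, solving argmax_{θ,τ} p(X; θ, τ) reduces to maximum likelihood estimation, which is equivalent to moment estimation in a Gaussian setting:

\[ E[X_{ij}] = E_{\mu_j}\{ E[X_{ij} \mid \mu_j] \} = E\{\mu_j\} = \theta \overset{\mathrm{set}}{=} \bar{X} \]

\[ V[X_{ij}] = V_{\mu_j}\{ E[X_{ij} \mid \mu_j] \} + E_{\mu_j}\{ V[X_{ij} \mid \mu_j] \} = \tau^2 + 1 \overset{\mathrm{set}}{=} \frac{1}{pn - 1} \sum_{i,j} (X_{ij} - \bar{X})^2 \]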

Back to James-Stein estimator

Moreover, we derived:

\[ \lambda = (n\tau^2 + 1)^{-1} \]

We obtain an estimate of λ by substituting the empirical Bayes estimate:

\[ \hat\tau^2 = \frac{1}{pn - 1} \sum_{i,j} (X_{ij} - \bar{X})^2 - 1 \]
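Putting the pieces together, the plug-in procedure can be sketched on simulated data. The simulation settings are hypothetical, and flooring τ̂² at 0 is an added safeguard that is not on the slides (the moment estimate can come out negative in small samples).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, theta, tau = 1000, 20, 0.0, 0.3       # hypothetical simulation settings

mu = rng.normal(theta, tau, size=p)          # mu_j ~ N(theta, tau^2)
X = rng.normal(mu, 1.0, size=(n, p))         # X_ij ~ N(mu_j, 1); rows are samples

# Moment estimates: grand mean for theta, pooled variance minus 1 for tau^2.
theta_hat = X.mean()
tau2_hat = max(X.var(ddof=1) - 1.0, 0.0)     # floor at 0 is an assumption
lam_hat = 1.0 / (n * tau2_hat + 1.0)         # plug-in shrinkage weight

xbar = X.mean(axis=0)                        # unshrunken estimate per gene
shrunk = xbar - lam_hat * (xbar - theta_hat) # James-Stein form with estimated lambda

mse_unshrunk = np.mean((xbar - mu) ** 2)
mse_shrunk = np.mean((shrunk - mu) ** 2)
print(round(tau2_hat, 3), round(lam_hat, 3), mse_unshrunk, mse_shrunk)
```

`X.var(ddof=1)` over all entries divides by pn − 1 and centers at the grand mean, matching the pooled variance in the formula for τ̂².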


Beneficial effect of shrinkage

5 repeated studies. Estimates of parameter of interest +/- sd. Solid: no shrinkage; dashed: shrinkage. (a): n=5, (b): n=40.

Presenter notes: Continuous data, log-(ratio) scale

Beneficial effects of shrinkage (more to come…)

• Better testing in a multiple testing setting

• Shrinkage causes bias…

• …but under selection pressure (e.g. pick the 5 genes with largest parameter): bias generally smaller than for unshrunken estimate

• In a regression setting shrinking a nuisance parameter can render higher power for the parameter of interest

• To be continued….

Presenter notes: Continuous data, log-(ratio) scale