detecting subtle differential expression in t cells of...

48
Detecting subtle differential expression in T cells of Melanoma patients using Microarray Experiments Susan Holmes Statistics Department, Stanford University Collaboration with Peter P Lee from Stanford’s Hematology Department Work supported by the ACS and the NSF. Berkeley, October 23rd, 2003 Thanks to Bioconductor contributors, in particular Sandrine Dudoit, Jean Yee et al. for marrayNorm and Wolfgang Huber for vsn.

Upload: others

Post on 15-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Detecting subtle differential expression in Tcells of Melanoma patients using

Microarray Experiments

Susan Holmes

Statistics Department, Stanford University

Collaboration with Peter P Lee from Stanford’s Hematology Department

Work supported by the ACS and the NSF.

Berkeley, October 23rd, 2003

Thanks to Bioconductor contributors, in particular Sandrine Dudoit, Jean

Yee et al. for marrayNorm and Wolfgang Huber for vsn.

Page 2: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

T-cell Data

70 agilent arrays, 13,000 genes in 5 batches. Much batch to batch effect,

no replication.

• five patients with melanoma, five healthy subjects.

• Stimulated and unstimulated T-cells of 7 different types.

Page 3: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

T-cell sorting

Page 4: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

1 10 100 1000 1

1 10 100 1000 10000

10

100

1000

1

10

100

1000

10000

effector naive

memoryCD45

RA

CD27

Page 5: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

The percentages of naıve, memory, and effector CD8+ subsets in each

healthy donor or melanoma patient sample was determined by CD27 and

CD45RA expression. There was no statistically significant differences

(p > 0.05) in the relative distribution of the subsets either amongst

subjects within each group or between melanoma and healthy subjects,

confirming that melanoma patients do not have gross perturbations of

their CD8+ T cell subsets. CD8+ T cell subsets were then isolated from

PBMC samples by FACSorting based on their expression of CD27 and

CD45RA, RNA was extracted, amplified, and hybridized onto Agilent

microarrays.

Page 6: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Two channels

Mean intensities.

Channel 1: Green, reference mRNA mixture of Human cell types.

Channel 2: Red, T-cells, of 7 different types after sorting and

amplication.

Page 7: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Red

Page 8: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Self-self and Dye Flip Experiments

Array system noise determination and dye swap experiment

Self-self hybridizations. 500 ng aRNA (from sorted CD8+ cells) labeled

with Cy3 or Cy5 were competitively hybridized against each other on

microarrays.

.

Page 9: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients
Page 10: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients
Page 11: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

The scatter plot shows that virtually all the features lie close to the

diagonal line corresponding to the expected log ratio of 0. Overall the

signal from the two channels is similar.

To test the difference of dye incorporation in probes, we performed dye

swap experiments. The figure shows that among 12842 features, 2773

display good correlation with similar differential expression for both dyes,

less than 1% features are differentially expressed, and about 97% of the

genes were correlated to each other.

Page 12: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients
Page 13: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Combined microarray data from two microarrays representing the two

halves of a dye swap experiment. In one experiment, the microarray was

hybridized with cyanine-3 labeled T cell cDNA and cyanine-5 labeled

reference cDNA generated from 500 ng of the corresponding cRNA. In

the second experiment, the microarray was hybridized with cyanine-5

labeled T cell cDNA and cyanine-3 labeled reference cDNA. Together,

the combined plot represents the two polarities in a dye swap

experiement. The yellow points are genes that displayed good correlation

with similar differential expression for both polarities of the dye swap

experiment. Shown in blue are genes that are unchanged and in red are

genes that were found to be differentially expressed in one polarity but

unchanged in the other polarity. The pink points represent anti-correlated

genes in the two polarities.

Page 14: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Data Normalization and Variance Stabilizing

• Variance Stabilizing Transformation (vsn) was used on the batches one

at a time.

• Batch effect was taken out by removing the batch medians.

Page 15: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Discriminant Analysis between healthy andmelanoma patients

Correction for cell effects allowed us to pool data from the three cell

subsets, to maximize the statistical power of our dataset. In order to

pool the cell types, they had to be homogenized, so cell type differences

between the CD8+ cells were removed through a simple linear model on

the transformed data.

We used linear discriminant analysis Then the Westfall and Young

multiple testing procedure was applied to detect the difference between

patients and healthy donors. A subset of 50 genes with the smallest

p-values was selected by taking the 50 features with smallest adjusted

p-values. These were used as the input to a discriminant analyses with

cross-validation that chose the best subset for discriminating the arrays

into healthy and melanoma groups with low classification error.

Page 16: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Eleven features (10 genes) were found to consistently discriminate

between T cells from melanoma patients versus healthy controls with

adjusted p-values < 0.05.

Page 17: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Discriminating Features, Ranked in order of SignificanceAbrev. UniGene Name/Description Adj. p-value Rel.Expres.

PSPC1 Paraspeckle component 1 0.0002 ⇓

DDB2 Damage-specific DNA binding protein 2 0.0160 ⇑

VIL2 Villin 2 (ezrin) 0.0278 ⇑

LGALS1 Lectin, galactoside-bind.1 (galectin 1) 0.0278 ⇑

ITPKB Inositol 1,4,5-trisphosphate 3-kinase B 0.0340 ⇑

LGALS1 Lectin, galactoside-bind.1 (galectin 1) 0.0340 ⇑

—- Hs, Simi.RNA binding motif prot., X 0.0492 ⇓

NMT2 N-myristoyltransferase 2 0.0492 ⇓

—- LOC154084; Hs clone mRNA 0.0492 ⇑

NME2 Non-metastatic cells 2, protein in 0.0492 ⇑

FLJ22059Hypothetical protein FLJ22059 0.0492 ⇑

Table 1: Table of Significantly differentially expressed genes The relative

expression in the last column gives the cancer patients’ expression relative

to healthy, ⇑ means overexpressed compared to healthy. The adjusted

p-values were computed using a Westfall and Young step down procedure

[?].

Page 18: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Hierarchical cluster and gene combinationhierarchical cluster

A hierarchical clustering of the data was performed using the mva

package in R. The results are displayed here for the 11 retained features.

Page 19: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients
Page 20: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

HE

A3

1_

EF

FE

_2

HE

A5

5_

ME

M_

4

HE

A5

5_

NA

I_4

HE

A5

9_

EF

FE

_5

HE

A2

6_

NA

I_1

HE

A2

5_

EF

FE

_3

HE

A3

1_

NA

I_2

HE

A2

6_

ME

M_

1

HE

A3

1_

ME

M_

2

HE

A2

6_

EF

FE

_1

HE

A2

5_

NA

I_3

HE

A5

5_

EF

FE

_4

HE

A2

5_

ME

M_

3

HE

A5

9_

ME

M_

5

HE

A5

9_

NA

I_5

ME

L5

3_

EF

FE

_3

ME

L3

9_

ME

M_

2

ME

L3

9_

NA

I_2

ME

L6

7_

EF

FE

_4

ME

L3

6_

NA

I_1

ME

L5

3_

NA

I_3

ME

L6

7_

NA

I_4

ME

L5

1_

EF

FE

_5

ME

L3

6_

EF

FE

_1

ME

L5

1_

ME

M_

5

ME

L3

6_

ME

M_

1

ME

L5

1_

NA

I_5

ME

L3

9_

EF

FE

_2

ME

L5

3_

NA

I_3

ME

L6

7_

ME

M_

4

Paraspeckle component 1

H. similar to RNA bind

N−myristoyltransferase 2

Galectin 1

Galectin 1

FLJ22059

inositol 1,4,5−triph.

H.clone 24889

DDB2

protein (NM23B) non−metastatic.

Villin 2

Page 21: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients
Page 22: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

HE

A2

5_

EF

FE

_3

HE

A5

9_

NA

I_5

HE

A2

6_

ME

M_

1

HE

A3

1_

NA

I_2

HE

A3

1_

ME

M_

2

HE

A5

9_

EF

FE

_5

HE

A2

5_

NA

I_3

ME

L6

7_

ME

M_

4

HE

A3

1_

EF

FE

_2

ME

L5

3_

NA

I_3

HE

A5

9_

ME

M_

5

ME

L3

9_

NA

I_2

ME

L3

6_

EF

FE

_1

ME

L6

7_

EF

FE

_4

ME

L5

1_

ME

M_

5

ME

L3

9_

EF

FE

_2

ME

L5

1_

NA

I_5

HE

A5

5_

ME

M_

4

HE

A2

6_

EF

FE

_1

HE

A5

5_

EF

FE

_4

HE

A2

5_

ME

M_

3

HE

A5

5_

NA

I_4

ME

L5

1_

EF

FE

_5

HE

A2

6_

NA

I_1

ME

L6

7_

NA

I_4

ME

L3

6_

ME

M_

1

ME

L5

3_

NA

I_3

ME

L3

6_

NA

I_1

ME

L3

9_

ME

M_

2

ME

L5

3_

EF

FE

_3

H.mRNA for CASH beta

caspase 3, apopt.

H.TFAR19 mRNA

programmed cell death

H.prostaglandin end.

caspase 7, apopt.

H.Lice2 beta cysteine

H.B−cell leuk./lymph.

caspase 10, apopt.

Page 23: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

The white are the most expressed, the red the least, with yellow as the

intermediary. Note that the clustering separates the two groups neatly

between melanoma and healthy patients. (b) Hierarchical Clustering of

the best 9 apoptosis genes, this clustering, even though done using the

best possible classifier is unable to decompose the two groups distinctly.

The same analysis was carried out on a set of 86 apoptosis genes, of

which the 9 most discriminant were chosen, these 9 genes were then

analysed using the same hierarchical clustering as the 10 overall best

genes.

The figure shows that this smaller subset of genes classifies healthy

donors and melanoma patients into two groups (only two outliers were

found in the cluster, and they are just borderline cases). In order to

check our procedure, we have used a cross validation method that leaves

the known label of each array out in turn and builds a predictor from the

remaining arrays. We can then compare to the known classification. This

cross validation estimate of prediction error gave a 100% well classified

Page 24: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

score, encouraging us to think that these 10 genes are indeed good

predictors of T cells from melanoma versus healthy subjects. The

expression for each gene in this subset is significantly different between

healthy donors and melanoma patients (p < 0.05). This data

demonstrates that CD8+ T cells (except CD45RA-CD27-) in melanoma

patients are subtly different from CD8+ T cells from healthy donors.

As the clustering done only on known apoptosis genes shows, the

distinction between melanoma and healthy patients cannot be made

simply on the evidence of genes involved in the classical apoptosis

pathway.

Page 25: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Check and recalibration

Male subjects were in a 2:1 ratio between the melanoma and healthy

patients in this study, creating an expected fold difference that was

observed on several specific Y-chromosome genes. When we performed

the multiple testing procedure with the Y-genes re-inserted into the

dataset, the SMC (mouse) homolog Y gene appears tied with galectin-1,

with an adjusted p-value of 0.0342. This puts the level of the differential

expression between the two groups of patients at around two-fold for the

galectin-1 gene.

Page 26: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Gene confirmation with real time RT-PCR

To confirm the significance of the genes identified to be differentially

expressed using microarrays, we performed quantitative real-time PCR

(RQ-PCR) analysis using the original, unamplified RNA materials. CD8

subsets were not combined in these experiments. Due to limited

quantities of unamplified RNA, six of ten genes were selected for

RQ-PCR analysis: DDB2, FLJ10955, Galectin-1, Myristoyltransferase,

Hypothetical Protein 669. A housekeeping gene, GAPDH, was also

amplified for normalization of data. The threshold cycle (CT ) for each

sample is calculated by iCycler Real Time Detection System. After

normalized with GAPDH, the CT of the six genes was analyzed using a

Wilcoxon test. Significant differences were found in the CD8+ T cells

between melanoma patients and healthy donors, matching the microarray

data.The only exception was expression of gene FLJ10955, which did not

show significant differences between healthy and melanoma in any of the

three subsets.

Page 27: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Results from two sample t-tests, by cell typesUniGene UniGene p-

value

p-

value

p-

value

Abbrev. Name/description Naive Effector Memory

PSPC1 Paraspeckle component 1 0.53 0.0498 0.54

DDB2 Damage-specific DNA binding protein

2

0.28 0.27 0.027

LGALS1 Lectin, galactoside-binding,1(galectin

1)

0.048 0.041 0.021

—- Hs, Similar to RNA binding motif

protein, X chromosome, clone, mRNA

0.08 6.10−6 0.69

NMT2 N-myristoyltransferase 2 0.27 0.0003 0.45

FLJ22059 Hypothetical protein FLJ22.. 0.75 0.62 0.21

Table 2: Significance of PCR validation of most differentially expressed

genes. The p-values are computed by using a Wilcoxon two sample test

comparing the melanoma patients and the healthy subjects.

Page 28: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Histogram of M3[, 12]

M3[, 12]

Fre

quen

cy

−6 −4 −2 0 2 4

010

020

030

040

050

060

070

0

Page 29: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Parametric Bootstrap

Generate data under various null hypotheses that have many features

similar to the orginal data:

• Same error distribution (form and moments).

• Same components.

• Vary some underlying tuning parameters.

Page 30: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

VSN Noise Model: Additive and MultiplicativeComponents

Yk,red = ared + vk,red + βredγkeηk,red

ared and βred are the offset and the factor. γk is gene specific and the

other terms are the noise. additive and multiplicative, we make weak

assumptions of symmetry of noise around 0:∑k

vki = 0∑

k

ηk,red = 0∑

`

η`,green = 0

η`,green + η`,red = 0

Page 31: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Finding the right distribution for the residuals

Heavy tails, extreme case was from a different data set:

Page 32: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Histogram of M3[, 12]

M3[, 12]

Fre

quen

cy

−6 −4 −2 0 2 4

010

020

030

040

050

060

070

0

Page 33: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Non normal residuals

For this data, the recentering-rescaling was always better with mad and

median.

=⇒ Laplace knew already that the error distribution that has the location

parameter the median and the scale parameter the mad has density:

fY (y) =12φexp[−|y − θ|/φ], φ > 0

Page 34: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Laplace’s First Error Distribution

Useful representation:

Y =√X · Z, X ∼ Exp(1), Z ∼ N(0, 1)

A mixture of Normals whose scale parameters vary from an exponential.

The probe sequences are of different length and so have varying

hybridization precisions.

Page 35: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Histogram of log5

Log Ratio

Fre

quen

cy

−4 −2 0 2 4

020

4060

8010

0

Histogram of X2

Simulated Laplace

Fre

quen

cy

−4 −2 0 2 4

050

100

150

Page 36: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

●●●●●

●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●

●●●●●●

●●●●●●●

●●

−4 −2 0 2 4

−6

−4

−2

02

46

log5

X2

Page 37: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Assymetric Laplace Distribution

The marginals each have assymmetric Laplace distributions:

Yid= miX +X1/2σiiZi

where Z ∼ N (0, 1) and X ∼ Exp(1).

Page 38: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Assymetric Multivariate Laplace Distribution

Definition (Kotz et al):

A random vector Y in Rd has a multivariate asymmetric Laplace

distribution (AL) if its characteristic function is given by

ψ(t) =1

1 + 12t′Σt− im′t

, t ∈ Rd.

where m ∈ Rd , m 6= 0 and is a dd non-negative definite symmetric

matrix. This distribution is ALd(m,Σ).

Let Y ∼ ALd(m,Σ) and let Z ∼ Nd(0,Σ). Let X be an exponentially

distributed r.v. with mean 1, independent of X. Then, the following

representation is valid:

Yd= mX +X1/2Z.

Page 39: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

and we have

Cov(Y ) = Σ +mm′

Page 40: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Random Number Generation

An ALd(m,Σ) generator (Kotz et al., 2003)

1. Decompose Σ = CC ′ using Choleski decomposition.

2. Generate a standard exponential variate X.

3. Independently of X, generate multivariate normal Nd(0,Σ) variate N:

• Generate iid normal vectors in a matrix of dimensions n× p: W.

• N = C ×W .

4. Set Y = m ·X +√XN.

5. Return Y

Page 41: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Parametric Bootstrap

Very useful in situations where there are not enough replicates.

Pretend that the data come from a parametric family Fθ.

Replace in the generation of ‘new’ data, the unknown parameters by Fθ.

Page 42: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Application to Microarrays

Model, after renormalization, and eventual variance stabilization.

mjk = mkεk +√εkZkj, Zkj ∼ N (0,Σ)

Genes are k, k = 1....n. Estimate the parameters, covariance matrix,

medians, mads.

This model is added onto the variance stabilization model.

One may want to change the covariance matrix to adjust for the

hypotheses being tested.

Plug these into the Assymetric Multivariate Laplace generator.

We can compare the number of genes chosen with the parametric model

with a given covariance structure as the one we get with the data.

Page 43: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Calibration through Simulation Experiments

If we assume that the measured signal ykj increases, to sufficient

approximation, proportionally to the mRNA abundance ckj of gene k on

the j-th array, or on the j-th color channel:

ykj ≈ aj + bjbkckj +mkj. (1)

Page 44: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Simulation of Data with same parameters

For red channel, and k = 1...600 consistently expressed genes

rk = ak,green + bk,green × sinh(arcsinh(lk) + εk,green,j)

1/lk ∼ Γ(1, 1), ε ∼ N (0, c2)

All the parameters were estimated from the data c2 is estimated from

Var(h(Y )−mean(µkj)).

For consistently expressed genes:

> lks <- 1/rgamma(600,1,1)

> simuldata <- matrix(0,600,96)

> eps <- matrix(rnorm(600*96,0,0.25),600,96)

> areds <- matrix(rnorm(600*48,-0.5,0.26),600,48)

> agreens <- matrix(rnorm(600*48,-1.66,0.52),600,48)

Page 45: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

> breds <- matrix(rnorm(600*48,0.006,0.0038),600,48)

> bgreens <- matrix(rnorm(600*48,0.004,0.0022),600,48)

> simuldata[,refs] <- agreens+bgreens*sinh(asinh(lks)+eps[,refs])

> simuldata[,samps] <- areds+breds*sinh(asinh(mks)+eps[,samps])

For differentially expressed genes (of which we believe there are about

20), we use an additional Bernouilli random variable (at the strain level

and an extra uniform random variable to say how much the new

expressions differ).

Page 46: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

Conclusions

Beyond the bootstrap:Distributional Assumptions can be useful.

• Frechet bounds for the correlations of genes under these distributional

assumptions can be found.

• The correlation coefficient may have a much narrower range than we

believe.

• We can attack the Multiple Testing Problem in a Multivariate

Framework.

• We will probably need a different measure of association.

Page 47: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

References

[1] S. Dudoit, Y. H. Yang, T. P. Speed, and M. J. Callow. Statistical

methods for identifying differentially expressed genes in replicated

cDNA microarray experiments. Statistica Sinica, 12:111–139, 2002.

[2] W. Huber, A. von Heydebreck, H. Sultmann, A. Poustka, and M.

Vingron. Variance stablization applied to microarray data calibration

and to quantification of differential expression. Bioinformatics,

18:S96–S104, 2002.

[3] W. Huber, A. von Heydebreck, and M. Vingron.

Analysis of microarray gene expression data. Manuscript,

http://www.dkfz.de/abt0840/whuber, 2002.

[4] Kotz, S. Kozubowski, T., J. and Podgorski, K. An assymetric

multivariate Laplace Distribution, May, 2003.

Page 48: Detecting subtle differential expression in T cells of ...statweb.stanford.edu/~susan/talks/berkbiostat.pdf · Detecting subtle differential expression in T cells of Melanoma patients

[5] David M. Rocke and Blythe Durbin. A model for measurement

error for gene expression analysis. Journal of Computational Biology,

8:557–569, 2001.

[6] Tong Xu, Chen-Tsen Shu, Elizabeth Purdom, Demi Dang, Diane

Ilsley, Yaqian Guo, Susan P. Holmes and Peter P. Lee Subtle but

consistent differences in gene expression of CD8+ T cells in melanoma

patients and healthy donors submitted.