epigenetic analysis bios 691- 803 statistics for systems biology spring 2008


Epigenetic Analysis

BIOS 691- 803

Statistics for Systems Biology

Spring 2008


Kinds of Questions

• Where are the epigenetic modifications?

• How do they co-vary?

• How do epigenetic changes affect expression of genes?


Covariation of Epigenetic Measures

• Motivating questions
  – How are epigenetic modifications related?
  – What are the major determinants of epigenetic state?

• Statistical techniques
  – Covariance calculation
  – Principal component analysis
  – Linear models


Location and Covariance

• Question: do epigenetic modifiers act on specific targets, or on whole regions of DNA?

• Direct experimental evidence is contradictory

• Statistics may help:
  – Covariation patterns may provide evidence


CalcA in NCI-60

• Calcitonin A gene
• Two CpG clusters plus 3 isolated CpGs
• High correlation within clusters


CDH1 in NCI-60


Covariation in Methylation of 7 Genes

• Individual genes have multiple CpG sites

• Most of the variation is in the overall methylation level

Epigenomic Analysis

Figure: correlation map of 108 CpG sites in 6 genes across 5 ECOG pilot samples (red = 1, white = 0, blue < 0)
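A minimal sketch of the covariance calculation behind a CpG correlation map, in Python with NumPy. The data are simulated (two hypothetical clusters of four CpG sites each, 60 samples), not the ECOG or NCI-60 measurements; sites in the same cluster share an underlying methylation level, so they correlate strongly with each other and only weakly across clusters. The helper name `cpg_correlation_map` is ours.

```python
import numpy as np


def cpg_correlation_map(meth):
    """Pearson correlation between CpG sites.

    meth: array of shape (n_samples, n_sites), one methylation level
    per site per sample. Returns an (n_sites, n_sites) matrix.
    """
    # rowvar=False: columns are the variables (CpG sites), rows are samples
    return np.corrcoef(meth, rowvar=False)


# Simulated data: two clusters of 4 CpG sites each, 60 samples.
rng = np.random.default_rng(0)
n_samples = 60
level_a = rng.uniform(0.0, 1.0, n_samples)  # shared level for cluster A
level_b = rng.uniform(0.0, 1.0, n_samples)  # independent level for cluster B
meth = np.column_stack(
    [level_a + 0.05 * rng.standard_normal(n_samples) for _ in range(4)]
    + [level_b + 0.05 * rng.standard_normal(n_samples) for _ in range(4)]
)

corr = cpg_correlation_map(meth)
within_a = corr[:4, :4][np.triu_indices(4, k=1)].mean()  # off-diagonal, cluster A
between = corr[:4, 4:].mean()                             # cluster A vs cluster B
print(f"within cluster A: {within_a:.2f}, between clusters: {between:.2f}")
```

Plotting `corr` as a heat map gives the kind of block structure the figure shows: bright blocks on the diagonal for co-varying clusters.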


Methylation and Expression

• Single-gene (E-cadherin, CDH1) results suggest that overall methylation is correlated with expression


Methylation and Expression

• HELP assay gives genome-wide sampling of methylation sites at 15K genes

• If we select genes with S/N > 2 in both measures, the correlations of methylation with expression of the associated genes are bimodal
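A sketch of the filter-then-correlate step. It assumes S/N is a gene's SD across samples divided by its measurement-noise SD (for example, estimated from replicates); the slide does not give the exact definition, and `filtered_correlations` is a hypothetical helper. A histogram of the returned correlations is where bimodality would show up.

```python
import numpy as np


def _row_corr(x, y):
    """Pearson correlation of matching rows of x and y."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    # dot product of unit-length centred rows equals Pearson r
    xc /= np.linalg.norm(xc, axis=1, keepdims=True)
    yc /= np.linalg.norm(yc, axis=1, keepdims=True)
    return np.sum(xc * yc, axis=1)


def filtered_correlations(meth, expr, meth_noise_sd, expr_noise_sd, min_sn=2.0):
    """Methylation/expression correlation per gene, keeping only genes
    with S/N > min_sn in BOTH measures.

    meth, expr: arrays of shape (n_genes, n_samples), same gene order.
    meth_noise_sd, expr_noise_sd: per-gene noise SDs (assumed known).
    Returns (indices of kept genes, their correlations).
    """
    sn_meth = meth.std(axis=1, ddof=1) / meth_noise_sd  # assumed S/N definition
    sn_expr = expr.std(axis=1, ddof=1) / expr_noise_sd
    keep = np.flatnonzero((sn_meth > min_sn) & (sn_expr > min_sn))
    return keep, _row_corr(meth[keep], expr[keep])


# Toy example: gene 0 is perfectly anti-correlated; gene 1 has almost no
# expression variation, so it fails the S/N filter.
meth = np.array([[0.0, 1.0, 2.0, 3.0],
                 [0.0, 1.0, 2.0, 3.0]])
expr = np.array([[3.0, 2.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0, 1.01]])
noise_sd = np.array([0.1, 0.1])
keep, r = filtered_correlations(meth, expr, noise_sd, noise_sd)
print(keep, r)
```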



What Causes Methylation?

• The NCI-60 cell lines are derived from various tissues
• Each line's methylation reflects a tissue-characteristic profile plus the specific history of the cells
• Fit a linear model to each methylation site
  – 9 tissues for 60 observations
  – 60 − 9 = 51 error df

• Overall, 41% of the variance is attributable to tissue

• What causes the remainder of the methylation differences?
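A sketch of the per-site fit. With tissue as the only (categorical) predictor, the least-squares fit at each site is just each tissue's mean, so R² is the fraction of variance attributable to tissue and the error df is the number of observations minus the number of tissues. The data and the helper `tissue_variance_fraction` are hypothetical; the slide does not say whether its 41% is pooled over sites or averaged per site, so the sketch returns both.

```python
import numpy as np


def tissue_variance_fraction(meth, tissue):
    """Fit methylation ~ tissue (one categorical predictor) at every site.

    meth:   array (n_lines, n_sites) of methylation values.
    tissue: length-n_lines sequence of tissue labels.
    Returns (per-site R^2, pooled fraction of variance explained, error df).
    """
    meth = np.asarray(meth, dtype=float)
    tissue = np.asarray(tissue)
    labels = np.unique(tissue)
    fitted = np.empty_like(meth)
    for label in labels:
        rows = tissue == label
        # least-squares fit for a categorical predictor = the group mean
        fitted[rows] = meth[rows].mean(axis=0)
    sse = ((meth - fitted) ** 2).sum(axis=0)             # residual SS per site
    sst = ((meth - meth.mean(axis=0)) ** 2).sum(axis=0)  # total SS per site
    r2 = 1.0 - sse / sst
    pooled = 1.0 - sse.sum() / sst.sum()
    error_df = meth.shape[0] - len(labels)  # observations minus group means
    return r2, pooled, error_df


# Simulated example: 60 lines, 9 tissue labels, 20 methylation sites.
rng = np.random.default_rng(1)
tissue = np.arange(60) % 9
tissue_effect = rng.normal(size=(9, 20))  # hypothetical per-tissue means
meth = tissue_effect[tissue] + rng.normal(size=(60, 20))
r2, pooled, error_df = tissue_variance_fraction(meth, tissue)
print(f"pooled fraction explained by tissue: {pooled:.2f}, error df: {error_df}")
```

The residuals `meth - fitted` are what a follow-up PCA of cell-specific factors would work on.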


PCA for Cell-specific Factors

Figure: variances of the principal components (y-axis: Variances, 0.0 to 1.5)

• Residual variance has one strong PC

• The remaining PCs are ‘noise’

• The 1st PC is almost constant across sites
  – Reflects the overall level of methylation
  – Is this an artifact or is it real?
  – Significantly correlated with expression of DNMT1 & DNMT3A
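A sketch of the PCA step on simulated residuals (for example, what remains after removing tissue means). For illustration it assumes each cell line has one global methylation offset shared by all sites; in that situation the first PC dominates the variance and its loadings are nearly equal across sites, which is what "almost constant" describes. `residual_pca` is a hypothetical helper.

```python
import numpy as np


def residual_pca(resid):
    """PCA of a residual matrix (n_lines x n_sites) via SVD.

    Returns (variance of each PC, site loadings of the first PC).
    """
    centered = resid - resid.mean(axis=0)
    # rows of vt are the principal axes; s**2 / (n - 1) are the PC variances
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (resid.shape[0] - 1)
    return variances, vt[0]


# Simulated residuals: each cell line gets one global offset shared by
# all 50 sites, plus independent site-level noise.
rng = np.random.default_rng(2)
offset = rng.normal(size=(60, 1))
resid = offset + 0.3 * rng.normal(size=(60, 50))
variances, pc1 = residual_pca(resid)
print(variances[:3].round(2))
print(f"PC1 loadings range from {pc1.min():.3f} to {pc1.max():.3f}")
```

The sign of a PC is arbitrary, so the loadings may all come out negative; what matters is that they are nearly equal.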


Relations Between Epigenetic Measures - III

Stem Cells & Cancer


Issue: Cancer Stem Cells?

• Hypothesis: cancers arise from stem cells rather than differentiated epithelial cells

• How would you tell the difference between partially differentiated stem cells and de-differentiated epithelial cells?

• Proposal: compare characteristic epigenetic modifications of stem cells with cancers

• Epigenetic modifications are distinct
  – PRC2 (Polycomb Repressive Complex 2) in stem cells vs. DNA methylation in cancer


Statistical Methodology

• Test of association: 2 x 2 table

• Fisher exact test: p ~ 10⁻⁵

                  PRC2   Not PRC2
Methylated          34         43
Not methylated       3         97
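A self-contained sketch of the two-sided Fisher exact test using only the standard library (the helper name is ours; `scipy.stats.fisher_exact` is the usual library route). With the table margins fixed, the top-left count is hypergeometric, and the p-value sums the probabilities of every table no more likely than the observed one. The counts are those in the table as transcribed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # probability that the top-left cell equals x, given the margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # tiny relative tolerance so floating-point ties count as "as extreme"
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: methylated / not methylated; columns: PRC2 / not PRC2
a, b, c, d = 34, 43, 3, 97
p = fisher_exact_two_sided(a, b, c, d)
odds_ratio = (a * d) / (b * c)
print(f"odds ratio = {odds_ratio:.1f}, two-sided p = {p:.2e}")
```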


• Alternatives
  – t-test (predictor: PRC2)
  – Linear model (predictor: methylation, T − N)

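A sketch of the t-test alternative, assuming PRC2 status splits genes into two groups and the response is the methylation change T − N (read here as tumor minus normal, an assumption). It uses Welch's unequal-variance t-test, which is our choice; the slide does not say which t-test was meant. The helper `welch_t` and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite df.

    x: methylation change (T - N) at PRC2 genes
    y: methylation change (T - N) at non-PRC2 genes
    """
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df


t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df:.1f}")
```

Converting `t` and `df` to a p-value needs the t distribution's CDF, for example from a statistics library.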


PRC2 – Methylation Association


Are CIMPs Stem Cell Clones?

• Distinctive PRC2 sites appear to be preferentially methylated in CIMP (CpG island methylator phenotype) tumors


Correlations between epigenetic and expression measures – I

Copy Number and Expression


Copy Number and Expression

• Large sections of DNA containing many genes are often copied or deleted

• We think most control elements are copied or deleted along with the genes

• If there are more (or fewer) copies of a gene, then, ceteris paribus, there should be more (fewer) copies of its RNA


Integrative Studies of CGH & Gene Expression

• We expect to see a strong correlation between copy number and expression in the data

• Previous studies report weak effects
  – Average correlations from 0.04 to 0.27

• NCI-60 study: average correlation 0.16


Why Not?

• H1: there really isn’t much effect (biology)
  – Somehow the cells are compensating
  – In any case there shouldn’t be any effect on non-expressed genes

• H2: we may not be able to measure the effect that is there (technical error)
  – Probes may be insensitive or cross-hybridizing
  – Signal/noise too low even when probes are sensitive


Eliminating Uninformative Genes

• Genes which are silenced will not show effect of copy number variation– Mean signal a rough proxy– Remove genes with mean signal above 6.3

• Only genes with significant copy number variation (above measurement noise) will show effect– Select genes with SD of copy number > 0.5


Correlations of Selected Measures

Figure: black = all correlations; red = reliably measured correlations


Estimating True Correlations

• If measurement noise with SD ~ 0.3 degrades the expression measures, then the correlations of the measures will mostly be closer to 0 than the true correlations of the underlying variables

• Given a correlation and the measured standard deviations, what are the most likely true standard deviations and true correlation?


MLE of Noisy Correlations

• Noise can be estimated from replicates
• If N is large, we can estimate …
• SD of the originals can be estimated by ML
• Given s and e, the MLE of the correlation can be inferred

• For NCI-60, the median MLE correlation is ~ 0.65


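$$\hat{\rho} = \frac{r}{\sqrt{\left(1 - e_1^2/s_1^2\right)\left(1 - e_2^2/s_2^2\right)}}$$

where r is the observed correlation, s₁ and s₂ are the measured SDs, and e₁ and e₂ are the noise SDs.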
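A plug-in sketch of the correction: estimate each signal SD as √(s² − e²), then divide the observed correlation r by √((1 − e₁²/s₁²)(1 − e₂²/s₂²)). This is the classical correction for attenuation; we assume it matches the slide's ML estimate under a bivariate-normal model with known noise SDs, but the slide gives no derivation. The helper names and the numbers below are illustrative, not NCI-60 values.

```python
from math import sqrt


def signal_sd(s, e):
    """SD of the underlying variable, given observed SD s and noise SD e."""
    if not 0 <= e < s:
        raise ValueError("need 0 <= noise SD < observed SD")
    return sqrt(s * s - e * e)


def corrected_correlation(r, s1, e1, s2, e2):
    """Divide an observed correlation r by the attenuation factor.

    s1, s2: observed SDs of the two measures across samples.
    e1, e2: measurement-noise SDs (e.g. estimated from replicates).
    """
    rel1 = (signal_sd(s1, e1) / s1) ** 2  # = 1 - e1**2 / s1**2
    rel2 = (signal_sd(s2, e2) / s2) ** 2  # = 1 - e2**2 / s2**2
    rho = r / sqrt(rel1 * rel2)
    # with noisy inputs the ratio can exceed 1 in magnitude; clip to [-1, 1]
    return max(-1.0, min(1.0, rho))


print(signal_sd(0.5, 0.3))                              # about 0.4
print(corrected_correlation(0.32, 0.5, 0.3, 0.5, 0.3))  # about 0.5
```

When the noise SD is a large fraction of the observed SD the correction factor is small, so a modest observed correlation can correspond to a much larger true one.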


Correlations between epigenetic and expression measures – II

Chromatin and Expression


Do Epigenetic Marks Regulate Transcription?

• Several studies find only weak evidence from correlation analysis

• Same technical issue: S/N ratio

• Questions
  – Does methylation shut down most genes?
  – Which histone marks indicate active transcription?


Methylation and Expression

• HELP assay gives genome-wide sampling of methylation sites at 15K genes

• Select genes with S/N > 2 in both measures

• Correlations with gene expression values are bimodal



Interpretation of Methylation–Expression Correlations

• MLE of the negative mode ~ −0.8

• ~2/3 of genes fall under that hump

• Unclear whether the positive hump is real or an artifact of small sample size

• Possible explanations:
  – True induction by methylation
      • Methylation of an insulator
  – Irrelevant CpG site


Acetylation and Expression

• Histones are often acetylated during expression

• Histone 3 lysine 9 (H3K9) acetylation was measured

• Measures are corrupted by noise
  – Blue: S/N > 2.5
  – Red: S/N > 2
  – Black: S/N > 1.5


Biological Prediction

• H3K9 acetylation … gene expression

• Is this real?
  – Experimental test: find genes with high acetylation variance but little expression variance by microarray

• Results (7 genes) confirm the hypothesis

• Implies:
  – Expression arrays are not sensitive
