Epigenetic Analysis
BIOS 691-803
Statistics for Systems Biology
Spring 2008
Kinds of Questions
• Where are the epigenetic modifications?
• How do they co-vary?
• How do epigenetic changes affect expression of genes?
Covariation of Epigenetic Measures
• Motivating questions
– How are epigenetic modifications related?
– What are the major determinants of epigenetic state?
• Statistical techniques
– Covariance calculation
– Principal component analysis
– Linear models
Location and Covariance
• Question: do epigenetic modifiers act on specific targets or do they act on whole regions of DNA?
• Direct experimental evidence is contradictory
• Statistics may help:
– Covariation patterns may be evidence
CalcA in NCI60
• Calcitonin A gene
• Two CpG clusters plus 3 odd CpGs
• High correlation within clusters
CDH1 in NCI 60
Covariation in Methylation of 7 Genes
• Individual genes have multiple CpG sites
• Most variation: overall methylation
Epigenomic Analysis
Correlation map of 108 CpG sites in 6 genes across 5 ECOG pilot samples
Red = 1
White = 0
Blue < 0
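A correlation map like this one can be sketched in a few lines. This is a minimal illustration, not the code behind the slide: the site names and methylation values are made up, and Pearson correlation is an assumption, since the slide does not say which correlation was used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_map(sites):
    """sites: dict of site name -> methylation values across samples.
    Returns a dict keyed by (site, site) pairs."""
    names = list(sites)
    return {(a, b): pearson(sites[a], sites[b]) for a in names for b in names}

# Hypothetical methylation fractions for 3 CpG sites in 5 samples.
sites = {
    "cpg_1": [0.10, 0.40, 0.35, 0.80, 0.60],
    "cpg_2": [0.15, 0.45, 0.30, 0.85, 0.55],  # tracks cpg_1 (same cluster)
    "cpg_3": [0.70, 0.20, 0.50, 0.10, 0.40],  # moves against cpg_1
}
corr = correlation_map(sites)
```

In the color scheme of the map, the cpg_1/cpg_2 entry would plot near red and the cpg_1/cpg_3 entry would plot blue.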
Methylation and Expression
• Single gene (E-cadherin) results suggest overall methylation correlated with expression
Methylation and Expression
• HELP assay gives genome-wide sampling of methylation sites at 15K genes
• If we select genes with S/N > 2 in both measures, the correlations with the associated genes are bimodal
What Causes Methylation?
• NCI-60 derived from various tissues
• Tissue characteristic profile + specific history of cells
• Fit linear model to each methylation site
– 9 tissues for 60 observations, leaving 51 error df
• Overall 41% of variance attributable to tissue
• What causes the remainder of methylation differences?
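The per-site linear model can be sketched as a one-way fit of methylation on tissue, where the fraction of variance attributable to tissue is the model R². This is an illustration under stated assumptions: the function name, tissue labels, and values are hypothetical, and the slide does not say how the 41% figure was aggregated across sites.

```python
def tissue_variance_fraction(values, tissues):
    """Fit methylation ~ tissue for one site (one mean per tissue).
    Returns (fraction of variance explained by tissue, error df)."""
    n = len(values)
    grand = sum(values) / n
    groups = {}
    for v, t in zip(values, tissues):
        groups.setdefault(t, []).append(v)
    ss_total = sum((v - grand) ** 2 for v in values)
    # Residual sum of squares around each tissue's own mean.
    ss_resid = sum((v - sum(g) / len(g)) ** 2
                   for g in groups.values() for v in g)
    error_df = n - len(groups)
    return 1.0 - ss_resid / ss_total, error_df

# Hypothetical site measured in 6 cell lines from 3 tissues.
values = [0.2, 0.3, 0.7, 0.8, 0.4, 0.6]
tissues = ["lung", "lung", "colon", "colon", "breast", "breast"]
r2, df = tissue_variance_fraction(values, tissues)
```

With 60 cell lines and 9 tissue labels the same function returns 51 error df, matching the count on the slide. The residuals from this fit are what remains for the cell-specific analysis.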
[Figure: plot of variances; y-axis: Variances, 0.0 to 1.5]
PCA for Cell-specific Factors
• Residual variance has one strong PC
• Remainder are ‘noise’
• 1st PC is almost constant
– Reflects overall level of methylation
– Is this an artifact or is it real?
– Significantly correlated with expression of DNMT1 & DNMT3A
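One way to see what an "almost constant" first PC means is to extract the leading component of a residual matrix and inspect its loadings. The sketch below uses power iteration on the covariance matrix so that it needs no libraries. The data are made up, and the method is a stand-in, not necessarily how the slide's PCA was computed.

```python
import math

def first_pc(rows, iters=200):
    """Leading principal component of rows (samples x sites) by power
    iteration. Returns (loadings, fraction of total variance explained).
    Assumes the uniform start vector is not orthogonal to the leading PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                 for a in range(p))
    total = sum(cov[a][a] for a in range(p))
    return v, eigval / total

# Hypothetical residuals: 6 cell lines x 4 sites, dominated by a shared shift.
rows = [
    [-1.0, -0.9, -1.1, -1.0],
    [-0.5, -0.6, -0.4, -0.5],
    [0.1, 0.0, -0.1, 0.0],
    [0.5, 0.4, 0.6, 0.5],
    [1.0, 1.1, 0.9, 1.0],
    [-0.1, 0.0, 0.1, 0.0],
]
loadings, frac = first_pc(rows)
```

Here the loadings come out nearly equal across sites, so each cell line's score on this component is essentially its overall methylation level.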
Relations Between Epigenetic Measures - III
Stem Cells & Cancer
Issue: Cancer Stem Cells?
• Hypothesis: cancers arise from stem cells rather than differentiated epithelial cells
• How would you tell the difference between partially differentiated stem cells and de-differentiated epithelial cells?
• Proposal: compare characteristic epigenetic modifications of stem cells with cancers
• Epigenetic modifications are distinct
– PRC2 (stem cells) vs methylation (cancer)
Statistical Methodology
• Test of association in a 2 x 2 table
• Fisher exact test: p ~ 10^-5

                 PRC2   Not PRC2
Methylated         34         43
Not methylated      3         97
• Alternatives
– T-test (predictor: PRC2)
– Linear model (predictor: methylation: T – N)
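A Fisher exact test on the 2 x 2 counts above can be sketched directly from the hypergeometric distribution. The two-sided convention used here (sum every table no more probable than the observed one) is an assumption, since the slide does not say which variant was used, and the code is not claimed to reproduce the quoted p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Counts from the slide: rows = methylated / not, columns = PRC2 / not.
p_value = fisher_exact_two_sided(34, 43, 3, 97)
```

The t-test and linear-model alternatives use the continuous methylation difference instead of a methylated/not call, so they do not depend on where the cutoff is placed.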
PRC2 – Methylation Association
Are CIMP’s Stem Cell Clones?
• Distinctive PRC2 sites appear preferentially methylated in CIMP tumors
Correlations between epigenetic and expression measures – I
Copy Number and Expression
Copy Number and Expression
• Large sections of DNA containing many genes are often copied or deleted
• We think most control elements are copied or deleted also
• If more (or fewer) copies of a gene then ceteris paribus there should be more (fewer) copies of RNA
Integrative Studies of CGH & Gene Expression
• Expect to see strong correlation between copy number and expression in data
• Previous studies report weak effects
– Average correlations from 0.04 to 0.27
• NCI 60 study average correlation 0.16
Why Not?
• H1: there really isn’t much effect – biology
– Somehow the cells are compensating
– In any case there shouldn’t be any effect on non-expressed genes
• H2: we may not be able to measure the effect that is there – technical error
– Probes may be insensitive/cross-hybridizing
– Signal/noise too low even when probes are sensitive
Eliminating Uninformative Genes
• Genes which are silenced will not show effect of copy number variation
– Mean signal a rough proxy
– Remove genes with mean signal below 6.3
• Only genes with significant copy number variation (above measurement noise) will show effect
– Select genes with SD of copy number > 0.5
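The two filters can be sketched as below. This assumes the intent is to keep expressed genes, that is, genes whose mean signal is at least 6.3, since silenced genes have low signal. The gene names and values are hypothetical, and the measurement scales (log-scale signal, log-ratio copy number) are assumptions.

```python
from statistics import mean, stdev

MIN_MEAN_SIGNAL = 6.3   # expression threshold quoted on the slide
MIN_COPY_SD = 0.5       # copy-number SD threshold quoted on the slide

def informative_genes(expression, copy_number):
    """Keep genes that are expressed and whose copy number actually varies.
    Both arguments map gene name -> list of values across cell lines."""
    keep = []
    for gene, expr in expression.items():
        expressed = mean(expr) >= MIN_MEAN_SIGNAL
        varies = stdev(copy_number[gene]) > MIN_COPY_SD
        if expressed and varies:
            keep.append(gene)
    return keep

# Hypothetical values for three made-up genes across four cell lines.
expression = {
    "geneA": [8.0, 9.0, 7.5, 8.5],   # expressed
    "geneB": [5.0, 5.5, 4.8, 5.2],   # low signal, treated as silenced
    "geneC": [9.0, 9.1, 8.9, 9.0],   # expressed
}
copy_number = {
    "geneA": [-1.0, 0.0, 1.0, 0.5],  # varies
    "geneB": [-1.0, 1.0, 0.0, 0.8],  # varies
    "geneC": [0.0, 0.1, -0.1, 0.0],  # nearly constant
}
selected = informative_genes(expression, copy_number)
```

Only geneA passes both filters: geneB fails the signal threshold and geneC fails the copy-number SD threshold.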
Correlations of Selected Measures
Black: All correlations
Red: Reliably measured correlations
Estimating True Correlations
• If measurement noise of SD ~ 0.3 degrades expression measures, then correlations of measures will be mostly closer to 0 than true correlations of variables
• Given a correlation and measured standard deviations, what are most likely true standard deviations and true correlation?
MLE of Noisy Correlations
• Noise can be estimated from replicates
• If N large can estimate
• SD of originals can be estimated by ML
• Given s and e, the MLE of correlation can be inferred
• For NCI 60 median MLE correlation ~ 0.65
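$$\hat{r} = \frac{r}{\sqrt{\left(1 - e_1^2/s_1^2\right)\left(1 - e_2^2/s_2^2\right)}}$$
where r is the correlation of the measures, and s_i and e_i are the measured SD and noise SD of measure i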
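The correction from measured to true correlation can be sketched with the classical attenuation formula, in which each measure's reliability is 1 - e²/s². This is a stand-in under stated assumptions: the slide's MLE may differ in detail, and the function name and example values are hypothetical.

```python
import math

def corrected_correlation(r, s1, e1, s2, e2):
    """Attenuation-corrected correlation.
    r: correlation of the measures; s1, s2: measured SDs;
    e1, e2: noise SDs (for example, estimated from replicates)."""
    rel1 = 1.0 - (e1 / s1) ** 2   # share of variance in measure 1 that is signal
    rel2 = 1.0 - (e2 / s2) ** 2   # same for measure 2
    if rel1 <= 0 or rel2 <= 0:
        raise ValueError("noise SD must be smaller than measured SD")
    # Clip to the valid range, since noisy inputs can overshoot.
    return max(-1.0, min(1.0, r / math.sqrt(rel1 * rel2)))

# Hypothetical gene: expression SD 0.4 with noise SD 0.3,
# copy number SD 0.6 with noise SD 0.2, measured correlation 0.30.
r_true = corrected_correlation(0.30, s1=0.4, e1=0.3, s2=0.6, e2=0.2)
```

The correction grows quickly as the noise SD approaches the measured SD, which is why selecting reliably measured genes first matters.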
Correlations between epigenetic and expression measures – II
Chromatin and Expression
Do Epigenetic Marks Regulate Transcription?
• Several studies finding only weak evidence by correlation analysis
• Same technical issue: S/N ratio
• Questions
– Does methylation shut down most genes?
– Which histone marks indicate active transcription?
Methylation and Expression
• HELP assay gives genome-wide sampling of methylation sites at 15K genes
• Select genes with S/N > 2 in both measures
• Correlations with gene expression values are bi-modal
Interpretation of Meth-Expr Corrs
• MLE of negative mode ~ -0.8
• ~ 2/3 of genes under that hump
• Unclear whether positive hump is real or an artifact of small sample size
• Possible explanations:
– True induction by methylation
• Methylation of insulator
– Irrelevant CpG site
Acetylation and Expression
• Histones often acetylated during expression
• Histone 3 lysine 9 (H3K9) acetylation measured
• Measures corrupted by noise
– Blue: S/N > 2.5
– Red: S/N > 2
– Black: S/N > 1.5
Biological Prediction
• H3K9 acetylation gene expression
• Is this real?
– Experimental test: find genes with high acetylation variance, and little expression variance by microarray
• Results (7 genes)
• Confirm hypothesis
• Implies:
– Expression arrays are not sensitive