Lecture 20: Enhancer-Promoter Interaction Prediction
cs.ucf.edu/~xiaoman/spring/lecture 21...
Enhancer-Promoter Interaction Prediction
Transcriptional Regulation
TF-DNA binding
TF interactions
Chromatin structure
Posttranslational modification
Promoter
Enhancers – key mediators of context-specific gene regulation
[Figure: enhancer located 1 Mb from Shh]
A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly, Lettice et al. HMG 2003
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3292289/
Enhancers interact with enhancers and promoters
Correlated “activity”
Tissues
Enhancer Network
∼100K P300-bound regions as candidate enhancers
correlated activity across 72 cell types based on DNase hypersensitivity
widespread correlated activity between enhancers, which decreases with increasing inter-enhancer genomic distance
correlated enhancers tend to share common transcription factor (TF) binding motifs and tend to be spatially proximal.
Enhancers form clusters
https://www.ncbi.nlm.nih.gov/pubmed/23700312
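The enhancer network idea above can be sketched in a few lines: treat each enhancer as a 0/1 activity vector across cell types, connect two enhancers when their activity is strongly correlated, and read clusters off as connected components. This is only a toy sketch, not the published method. The function names, the four toy enhancers, and the 0.7 cutoff are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant vector: correlation undefined, treat as 0
    return cov / sqrt(vx * vy)


def enhancer_clusters(activity, min_corr=0.7):
    """Group enhancers into connected components of the correlation graph.

    activity: dict mapping enhancer id -> list of 0/1 activity per cell type.
    min_corr: hypothetical cutoff for drawing an edge between two enhancers.
    """
    parent = {e: e for e in activity}  # union-find forest

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for a, b in combinations(activity, 2):
        if pearson(activity[a], activity[b]) >= min_corr:
            parent[find(a)] = find(b)

    clusters = {}
    for e in activity:
        clusters.setdefault(find(e), set()).add(e)
    return sorted(sorted(c) for c in clusters.values())


# Toy activity profiles over six cell types (made-up values).
toy_activity = {
    "e1": [1, 1, 0, 0, 1, 0],
    "e2": [1, 1, 0, 0, 1, 0],
    "e3": [0, 0, 1, 1, 0, 1],
    "e4": [0, 0, 1, 1, 0, 0],
}
clusters = enhancer_clusters(toy_activity, min_corr=0.7)
print(clusters)
```

With real data one would also record the genomic distance of each pair, since the slide notes that correlation decreases with inter-enhancer distance.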
Enhancers form disjoint clusters
Using interacting enhancer-promoter pairs in nine cell lines and nine primary cell types, we observed that two enhancers are likely to regulate either the same genes or completely different genes.
Enhancer features from ChromHMM
[Figure: ChromHMM schematic over a stretch of DNA containing an enhancer, a gene start, and a transcribed region, divided into 200 base pair intervals]
Observed: binarized chromatin marks (H3K4me1, H3K4me3, H3K27ac, H3K36me3), called based on a Poisson distribution
Unobserved: most likely hidden state for each interval (states 1 to 6)
Each state has high-probability chromatin marks (values shown in the figure: 0.7 and 0.8 for H3K4me1, 0.8 for H3K27ac, 0.9 for H3K4me3, 0.9 for H3K36me3)
Emission distribution is a product of independent Bernoulli random variables
All probabilities are learned from the data
Binarization leads to explicit modeling of mark combinations and interpretable parameters
Ernst and Kellis, Nat Biotech 2010; Ernst and Kellis, Nature Methods 2012
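The emission model on this slide (a product of independent Bernoulli random variables) can be made concrete with a small example. For one 200 bp interval, the probability of the observed present/absent mark pattern under a state is the product over marks of p if the mark is present and 1 - p if it is absent. The state names and probabilities below are illustrative assumptions, not learned ChromHMM parameters. The sketch covers only the emission term, without the transition probabilities a full HMM would use.

```python
def emission_prob(observed, mark_probs):
    """P(observed binary marks | state) as a product of independent Bernoullis.

    observed:   dict mark -> 0/1 call for one 200 bp interval.
    mark_probs: dict mark -> probability that the mark is present in this state.
    """
    prob = 1.0
    for mark, p in mark_probs.items():
        prob *= p if observed.get(mark, 0) else (1.0 - p)
    return prob


# Hypothetical emission parameters for three made-up states.
states = {
    "promoter_like": {"H3K4me3": 0.9, "H3K4me1": 0.1, "H3K36me3": 0.1},
    "enhancer_like": {"H3K4me3": 0.1, "H3K4me1": 0.8, "H3K36me3": 0.1},
    "transcribed_like": {"H3K4me3": 0.1, "H3K4me1": 0.1, "H3K36me3": 0.9},
}

# One interval in which only H3K4me1 was called present.
interval = {"H3K4me1": 1, "H3K4me3": 0, "H3K36me3": 0}

scores = {name: emission_prob(interval, probs) for name, probs in states.items()}
best_state = max(scores, key=scores.get)
print(scores, best_state)
```

Because each parameter is simply the probability of seeing one mark in one state, the learned table can be read directly, which is the interpretability point made on the slide.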
Detailed features of ChromHMM enhancers
9 marks x 9 human cell types = 81 chromatin mark tracks
• Learned jointly across cell types (virtual concatenation)
• State definitions are common
• State locations are dynamic
Ernst et al., Nature 2011
Brad Bernstein ENCODE Group
H3K4me1
H3K4me2
H3K4me3
H3K27ac
H3K9ac
H3K27me3
H4K20me1
H3K36me3
CTCF
+WCE
+RNA
HUVEC Umbilical vein endothelial
NHEK Keratinocytes
GM12878 Lymphoblastoid
K562 Myelogenous leukemia
HepG2 Liver carcinoma
NHLF Normal human lung fibroblast
HMEC Mammary epithelial cell
HSMM Skeletal muscle myoblasts
H1 Embryonic
ChromHMM Models across Many Roadmap/ENCODE Cell and Tissue Types
127 cell/tissue types: H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3
98 cell/tissue types: H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3, H3K27ac
127 cell/tissue types: H3K4me1, H3K4me3, H3K27me3, H3K9me3, H3K36me3, H3K27ac, H3K9ac, H4K20me1, H3K79me2, H3K4me2, H2A.Z, DNase
How will you predict enhancer-promoter interactions?
Experimental Methods
Closest genes as enhancer targets
Widely used and first choice until now
FANTOM5 CAGE allows direct comparison between the transcriptional activity of an enhancer and that of putative target gene TSSs across a diverse set of human cells. Based on pairwise expression correlation, nearly half (40%) of the inferred TSS-associated enhancers were linked with the nearest TSS, and 64% of enhancers had at least one correlated TSS within 500 kilobases.
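The closest-gene rule is simple enough to write down as a baseline. The sketch below assigns an enhancer midpoint to the nearest TSS on the same chromosome using a binary search over sorted TSS positions. The gene names and coordinates are made up, and `nearest_tss` is a hypothetical helper, not part of any published tool.

```python
from bisect import bisect_left


def nearest_tss(enhancer_mid, tss_list):
    """Return the (position, gene) entry closest to the enhancer midpoint.

    tss_list: non-empty list of (position, gene) tuples on one chromosome,
              sorted by position.
    """
    if not tss_list:
        raise ValueError("tss_list must not be empty")
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, enhancer_mid)
    # Only the TSS just before and just after the midpoint can be nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - enhancer_mid))


# Toy annotation (hypothetical coordinates).
tss = [(1_000, "geneA"), (50_000, "geneB"), (120_000, "geneC")]

print(nearest_tss(30_000, tss))   # 20 kb from geneB, 29 kb from geneA
print(nearest_tss(20_000, tss))   # 19 kb from geneA, 30 kb from geneB
print(nearest_tss(500_000, tss))  # beyond the last TSS
```

Since only about 40% of the correlated enhancers on this slide were linked with the nearest TSS, this rule is best used as a baseline against which other predictors are compared.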
Correlated activity between enhancers and promoters
H3K4me1, RNA polymerase II
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4041622/
DNase-seq
https://academic.oup.com/nar/article/41/14/6828/1074856
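A correlation-based predictor of this kind can be sketched as follows: for every enhancer and promoter within a distance window, correlate their signal (for example DNase-seq) across cell types and keep the pairs above a cutoff. The 500 kb window echoes the figure quoted earlier in the lecture. The 0.7 cutoff, the function names, and all signal values are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant signal: treat as uncorrelated
    return cov / sqrt(vx * vy)


def link_by_correlation(enhancers, promoters, max_dist=500_000, min_corr=0.7):
    """Return (enhancer, gene, r) for nearby pairs with correlated signal.

    enhancers / promoters: dict name -> (position, per-cell-type signal list).
    """
    links = []
    for enh, (enh_pos, enh_sig) in enhancers.items():
        for gene, (gene_pos, gene_sig) in promoters.items():
            if abs(enh_pos - gene_pos) > max_dist:
                continue  # outside the distance window
            r = pearson(enh_sig, gene_sig)
            if r >= min_corr:
                links.append((enh, gene, round(r, 3)))
    return sorted(links, key=lambda link: -link[2])


# Toy signals over four cell types (made-up values).
enhancers = {"enh1": (100_000, [5.0, 1.0, 4.0, 0.5])}
promoters = {
    "geneA": (150_000, [10.0, 2.0, 8.0, 1.0]),  # same pattern, nearby
    "geneB": (300_000, [1.0, 9.0, 2.0, 8.0]),   # opposite pattern
    "geneC": (900_000, [10.0, 2.0, 8.0, 1.0]),  # same pattern, too far away
}
links = link_by_correlation(enhancers, promoters)
print(links)
```

Note that one enhancer can be linked to several promoters under this scheme, which is one way it differs from the closest-gene rule.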
What else do you think could work here?
A more sophisticated approach
Features used: distance between an enhancer and a gene, conserved synteny, distance between P300 and the potential target genes in PPI network, GO similarity between P300 and potential target genes
Method: random forests
PETModule: A more practical approach
Features used: distance between an enhancer and a gene, conserved synteny, correlation of DNase-seq signal across tier 1 and tier 2 ENCODE cell lines, functional similarity between motifs in an enhancer and those in a promoter
Method: random forests
https://www.nature.com/articles/srep30043
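Both approaches above feed pair-level features into a random forest. To show the mechanics without depending on any library, here is a deliberately tiny forest: each tree is a one-split decision stump trained on a bootstrap sample with a random subset of features, and the forest score is the fraction of trees voting "interacting". This is a teaching sketch, not the PETModule implementation. A real analysis would use a full random forest library and real labels. The feature values, labels, and function names are all made up.

```python
import random


def fit_stump(rows, labels, features):
    """Pick the single-feature threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            # Majority label on each side of the threshold.
            left_label = int(2 * sum(left) >= len(left))
            right_label = int(2 * sum(right) >= len(right)) if right else 0
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Fraction of trees that vote for the positive (interacting) class."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Columns: distance (kb), DNase-seq correlation, conserved synteny (0/1),
# motif functional similarity. All values are invented for illustration.
X = [
    [20.0, 0.90, 1, 0.80], [45.0, 0.85, 1, 0.70],
    [60.0, 0.80, 1, 0.75], [80.0, 0.75, 1, 0.65],
    [300.0, 0.20, 0, 0.30], [450.0, 0.10, 0, 0.20],
    [600.0, 0.30, 0, 0.25], [800.0, 0.05, 0, 0.10],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting

forest = fit_forest(X, y)
close_pair = [10.0, 0.95, 1, 0.90]
far_pair = [900.0, 0.0, 0, 0.05]
print(predict_proba(forest, close_pair), predict_proba(forest, far_pair))
```

The two sources of randomness (bootstrap rows and per-tree feature subsets) are what distinguish a random forest from a single tree; real forests grow deeper trees and re-draw the feature subset at every split.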
Functional similarity in PETModule
The IM-PET method
https://www.pnas.org/content/111/21/E2191
Other approaches: TargetFinder, Ripple, Prestige, …
Features used: distance between an enhancer and a gene, conserved synteny, correlation, functional similarity, protein-protein interaction, different histone modification marks, CTCF, RAD21, …
Histone modifications (H4K20me1, H3K79me2, H3K36me3, H3K27me3, H3K9ac, H3K4me3, H3K4me2, and H3K27ac), CTCF, Pol2, Rad21, SMC3, and DNase-seq data
Features of EPIP
Feature    GM12878     HELA        HMEC        HUVEC       IMR90       K562        KBM7  NHEK
CTCF       GSM733752   GSM733785   GSM733724   GSM733716   GSM935404   GSM733719   -     GSM733636
DNase-seq  GSM816665   GSM816633   GSM816669   GSM816646   GSM816665   GSM816655   -     GSM816635
H3K27ac    GSM733771   GSM733684   GSM733660   GSM733683   GSM469966   GSM733656   -     GSM733674
H3K27me3   GSM733758   GSM733696   GSM733722   GSM733688   GSM469968   GSM733658   -     GSM733701
H3K36me3   GSM733679   GSM733711   GSM733707   GSM733757   GSM521890   GSM733714   -     GSM733726
H3K4me1    GSM733769   GSM798322   GSM733654   GSM733683   GSM521895   GSM733651   -     GSM733686
H3K4me2    GSM733769   GSM733734   GSM733654   GSM733683   GSM521899   GSM733651   -     GSM733686
H3K4me3    GSM733708   GSM733682   GSM733712   GSM733673   GSM469970   GSM733680   -     GSM733720
H3K79me2   GSM733736   GSM733669   GSM1003557  GSM1003555  GSM521909   H3K79me2    -     GSM1003527
H3K9ac     GSM733677   GSM733756   GSM733713   GSM733735   GSM469973   GSM733778   -     GSM733665
H4K20me1   GSM733642   GSM733689   GSM733647   GSM733640   GSM521915   GSM733675   -     GSM733728
Pol2       GSM803355   GSM733759   -           GSM733749   GSM935513   GSM733643   -     GSM733671
Rad21      GSM935332   -           -           -           GSM935624   GSM803447   -     -
SMC3       GSM935376   -           -           -           GSM2422871  GSM935310   -     -
EPIP: the ensemble procedure
EPIP on external data
How will you predict enhancer-promoter interactions?