Integrative fly analysis: specific aims
• Aim 1: Comprehensive data collection
  – Data QC / data standards / consistent pipelines
• Aim 2: Integrative annotation
  – Systematically annotate functional elements based on combined experimental information
• Aim 3: Clusters of activity
  – Find genes / enhancers / chromatin regions / domains of coordinated activity across conditions
• Aim 4: Predictive models of gene expression
  – How do motifs -> binding -> chromatin -> expr/splicing, where ‘->’ = ‘predicts’
• Aim 5: Regulatory and functional networks
  – Regulatory network inference
  – Functional network validation
• Aim 6: Comparative / evolutionary analysis
  – Using conservation to assess: function / coverage
1. Supervised learning for enhancer annotation
• Logistic regression classifier recovers known CRMs
• Combinations of features in each class outperform individual members of that class
• Combinations of features across classes even stronger
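The point of the bullets above (combined features beat any single feature) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are a synthetic toy grid, the two feature scores are hypothetical stand-ins for, say, a binding score and a chromatin-mark score, and the plain gradient-descent fit is not the pipeline used in the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Predicted probability that region x is an enhancer (CRM)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, n_iter=3000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(X, y, w, b):
    """Fraction of regions classified correctly at a 0.5 probability cutoff."""
    hits = sum((predict(w, b, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

# Synthetic toy data (NOT from the talk): two hypothetical feature scores,
# each in 0..3; a region is labelled a CRM only when the scores sum to >= 4,
# so neither score alone determines the label.
grid = [(a, c) for a in range(4) for c in range(4)]
labels = [1 if a + c >= 4 else 0 for a, c in grid]

X_single = [[a] for a, _ in grid]     # first feature only
X_both = [[a, c] for a, c in grid]    # both features combined

acc_single = accuracy(X_single, labels, *train_logistic(X_single, labels))
acc_both = accuracy(X_both, labels, *train_logistic(X_both, labels))
print(f"single feature: {acc_single:.2f}, combined features: {acc_both:.2f}")
```

On real data the comparison would be made on held-out regions rather than on the training set; the toy grid only shows why a combination can separate classes that no individual feature can.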
2. Functions of 20 distinct chromatin states in fly
[Figure panels: DV enhancers, AP enhancers, General TFs, Insulators, Replication, Motifs, Chromatin marks]
3. Clusters of activity (e.g. CBP binding vs. TFs)
• Confirmed by distinct enrichments for
  – Chromatin mark combinations
  – Regulatory motifs
  – GO functional categories
  – Developmental anatomical terms
[Figure labels: Component parameters; Trx, Trx, Polycomb; Early regulators (kr, cad, hb)]
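One common way to test whether a cluster shows a "distinct enrichment" for an annotation such as a GO category is a one-sided hypergeometric test. The sketch below is an assumption about how such a check could be done, not the method stated on the slide; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, cluster_size, overlap):
    """One-sided p-value P(X >= overlap) for drawing `cluster_size` genes
    without replacement from `population` genes, `annotated` of which
    carry the category of interest."""
    upper = min(annotated, cluster_size)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(population, cluster_size)

# Hypothetical counts: 1000 genes in total, 50 carry the GO term,
# and a cluster of 40 genes contains 8 of them.
p_value = hypergeom_enrichment_p(population=1000, annotated=50,
                                 cluster_size=40, overlap=8)
print(f"enrichment p-value: {p_value:.3g}")
```

When many categories are tested per cluster, the resulting p-values would need a multiple-testing correction before being called significant.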
[Figure: heatmap of numeric values, 20 columns per row; row and column labels were not preserved in the transcript]
3. Clusters of TFs vs. chromatin states
Polycomb states enriched for enhancers
AP-state 60-fold enriched in enhancers
Ubiquitous genes enriched for multiple states
Trx in enhancer states
BEAF/Chro in TSS for ubiquitous genes
Strong Su(Hw) in Negative outside promoter states
4. Motif combinations for TF binding prediction
• Many motifs enriched in binding of corresponding TF (diagonal)
• However, extensive cross-enrichment suggests substantial cross-talk in binding across factors
[Figure: heatmap of fold enrichment (color scale 2^-4 to 2^4); axes: motif enrichment vs. transcription factor binding]
• Indeed, predictive power for binding increases with motif combinations
• Both synergistic and antagonistic effects
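The motif-by-factor enrichment described above can be sketched as a small table of fold enrichments: how often a motif occurs in regions bound by a factor, relative to how often it occurs in all regions. This is a minimal illustration under assumptions: the region records, the names `tfA` and `tfB`, and the choice of all regions as background are hypothetical and not taken from the talk.

```python
def fold_enrichment(regions, motifs, tfs):
    """Return {(motif, tf): fold}, where fold is the motif frequency in
    regions bound by tf divided by its frequency over all regions.
    Assumes every motif occurs and every TF binds at least once."""
    n = len(regions)
    table = {}
    for motif in motifs:
        background = sum(motif in r["motifs"] for r in regions) / n
        for tf in tfs:
            bound = [r for r in regions if tf in r["bound"]]
            foreground = sum(motif in r["motifs"] for r in bound) / len(bound)
            table[(motif, tf)] = foreground / background
    return table

# Hypothetical toy regions: which motifs each region contains and
# which factors were observed to bind it.
regions = [
    {"motifs": {"tfA"},        "bound": {"tfA"}},
    {"motifs": {"tfA", "tfB"}, "bound": {"tfA"}},
    {"motifs": {"tfA"},        "bound": {"tfA", "tfB"}},
    {"motifs": {"tfB"},        "bound": {"tfB"}},
    {"motifs": {"tfB"},        "bound": {"tfB"}},
    {"motifs": set(),          "bound": set()},
    {"motifs": set(),          "bound": set()},
    {"motifs": {"tfA"},        "bound": set()},
]

fold = fold_enrichment(regions, motifs=["tfA", "tfB"], tfs=["tfA", "tfB"])
for (motif, tf), value in sorted(fold.items()):
    print(f"motif {motif} in {tf}-bound regions: {value:.2f}x")
```

Entries where the motif and the factor match correspond to the diagonal of the heatmap; off-diagonal entries that depart from 1 are the cross-enrichment the slide refers to.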
5. Data integration for stage-specific regulators
[Figure scale: fold enrichment or overexpression]
• abd-A motif is enriched in new H3K27me3 regions at L2
  – Coincides with a drop in the expression of abd-A
  – Model: sites gain H3K27me3 as abd-A binding is lost
• Additional intriguing stories found, to be explored
H3K27me3
6. Evolutionary signatures for diverse functions
• Protein-coding genes
  – Codon Substitution Frequencies
  – Reading Frame Conservation
• RNA structures
  – Compensatory changes
  – Silent G-U substitutions
• microRNAs
  – Shape of conservation profile
  – Structural features: loops, pairs
  – Relationship with 3’UTR motifs
• Regulatory motifs
  – Mutations preserve consensus
  – Increased Branch Length Score
  – Genome-wide conservation
Stark et al, Nature 2007; Clark et al, Nature 2007
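As an illustration of the RNA-structure signature listed above, the sketch below classifies each base pair of a given secondary structure by what happened to it in a second species: unchanged, compensatory (both sides changed and still pair), changed on one side but still paired (which always involves a G-U wobble pair), or disrupted. It is a simplified toy, not the scoring used in the cited papers; the sequences, the structure, and the category names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def parse_pairs(dot_bracket):
    """Return (i, j) index pairs from a dot-bracket structure string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

def classify_pairs(ref, other, dot_bracket):
    """Count how each structural pair of `ref` fares in aligned `other`."""
    counts = {"unchanged": 0, "compensatory": 0,
              "one_sided_still_paired": 0, "disrupted": 0}
    for i, j in parse_pairs(dot_bracket):
        r, o = (ref[i], ref[j]), (other[i], other[j])
        if o not in PAIRS:
            counts["disrupted"] += 1
        elif o == r:
            counts["unchanged"] += 1
        elif o[0] != r[0] and o[1] != r[1]:
            counts["compensatory"] += 1            # e.g. G-C -> C-G
        else:
            counts["one_sided_still_paired"] += 1  # e.g. A-U -> G-U
    return counts

# Hypothetical gap-free alignment of a small hairpin in two species.
structure = "((((....))))"
ref_seq   = "GGACAAAAGUCC"
other_seq = "GCGCAAAAAUGC"

pair_counts = classify_pairs(ref_seq, other_seq, structure)
print(pair_counts)
```

A real analysis would work on multi-species alignments with gaps and weigh these counts against the substitutions expected for unpaired sequence.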
Assessing fraction of conserved bases ‘explained’
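[Figure: % of conserved bases in fly explained (y-axis ticks at 40% and 80%), shown cumulatively and per element, as annotation classes are added: +CNV, +CDS, +Pol2, +TF, +Marks, +ORC, +3’UTR, +new 3’UTR, +new CDS, +new 5’UTR]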
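The "cumulative" versus "per element" accounting can be sketched as follows: each annotation class explains some fraction of the conserved bases on its own, and adding classes in order gives the running fraction explained by their union. This is a minimal sketch with made-up coordinates; the class names, positions, and function name are hypothetical and the resulting numbers are not those of the figure.

```python
def explained_fractions(conserved, annotation_classes):
    """For each (name, positions) class, in order, return a tuple
    (name, per_element_fraction, cumulative_fraction) of conserved bases."""
    covered = set()
    rows = []
    for name, positions in annotation_classes:
        own = conserved & positions   # conserved bases this class overlaps
        covered |= own                # union with everything added so far
        rows.append((name,
                     len(own) / len(conserved),
                     len(covered) / len(conserved)))
    return rows

# Hypothetical toy genome: 100 conserved positions and three overlapping
# annotation classes given as sets of positions.
conserved = set(range(100))
annotation_classes = [
    ("CDS",   set(range(0, 40))),
    ("TF",    set(range(30, 60))),
    ("3'UTR", set(range(55, 70))),
]

rows = explained_fractions(conserved, annotation_classes)
for name, per_element, cumulative in rows:
    print(f"+{name}: per element {per_element:.0%}, cumulative {cumulative:.0%}")
```

Because classes overlap, the cumulative value grows by less than the per-element value of each added class, which is why the two series in such a plot diverge.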
The challenge ahead
Anterior-Posterior
Dorsal-Ventral
Annotations & images for all expression patterns
Expression domain primitives reveal underlying logic
Binding sites of every developmental regulator
GAF, check
Su(Hw), check
BEAF-32, variant
Mod(mdg4), novel
CP190, novel
CTCF, check
Sequence motifs for every regulator
Understand regulatory logic specifying development
Fly AWG team
Sue Celniker, Brenton Graveley, Steve Brenner, Michael Brent
Gary Karpen, Sarah Elgin, Mitzi Kuroda, Vince Pirrotta
Peter Park, Peter Kharchenko, Michael Tolstorukov, Eric Bishop
Kevin White, Casey Brown, Nicolas Negre, Nick Bild, Bob Grossman
Eric Lai, Nicolas Robine
David MacAlpine, Matthew Eaton
Steve Henikoff
Peter Bickel, Ben Brown
Lincoln Stein Group
Suzanna Lewis, Gos Micklem, Nicole Washington, EO Stinson, Marc Perry, Peter Ruzanov
AWG
Fly modEncode
MIT CompBio Group
Chris Bristow, Pouya Kheradpour, Mike Lin, Rachel Sealfon, Rogerio Candeias
compbio.mit.edu