
Upload: dennis-hardy

Post on 16-Jan-2016


Page 1: Integrative fly analysis: specific aims Aim 1: Comprehensive data collection – Data QC / data standards / – consistent pipelines Aim 2: Integrative annotation


Integrative fly analysis: specific aims

• Aim 1: Comprehensive data collection
  – Data QC / data standards / consistent pipelines

• Aim 2: Integrative annotation
  – Systematically annotate functional elements based on combined experimental information

• Aim 3: Clusters of activity
  – Find genes / enhancers / chromatin regions / domains of coordinated activity across conditions

• Aim 4: Predictive models of gene expression
  – Test the chain motifs -> binding -> chromatin -> expression/splicing, where ‘->’ means ‘predicts’

• Aim 5: Regulatory and functional networks
  – Regulatory network inference
  – Functional network validation

• Aim 6: Comparative / evolutionary analysis
  – Using conservation to assess function / coverage

Page 2

1. Supervised learning for enhancer annotation


• A logistic regression classifier recovers known cis-regulatory modules (CRMs)

• Combinations of features within each class outperform the individual members of that class

• Combinations of features across classes perform even better
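The slide does not give the classifier's features or training details. As a hedged illustration of why combining features can beat any single one, here is a minimal logistic regression in pure Python, fit by batch gradient descent on a tiny made-up dataset. The two features (TF bound, enhancer-associated mark present) and the labels are hypothetical: a region counts as a CRM only when both features are present, so no single feature can classify every region correctly, while the combined model can.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w.x + bias; the bias is stored as the last weight."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # d feature weights followed by one bias term
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi  # d(log-loss)/d(score)
            for j in range(d):
                grad[j] += err * xi[j] / n
            grad[d] += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of examples whose predicted probability falls on the correct side of 0.5."""
    preds = [int(sigmoid(score(w, xi)) > 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical features per region: [TF bound, enhancer-associated mark present].
    # Toy labels: a region is a CRM only when both features are present.
    X = [[1, 1], [1, 0], [0, 1], [0, 0]]
    y = [1, 0, 0, 0]
    both = accuracy(train_logistic(X, y), X, y)
    single = [accuracy(train_logistic([[r[j]] for r in X], y), [[r[j]] for r in X], y)
              for j in range(2)]
    print(f"combined: {both:.2f}, single features: {single}")
```

In practice a library implementation with regularization and held-out evaluation would replace this hand-rolled loop; the sketch only shows the mechanics.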

Page 3

2. Functions of 20 distinct chromatin states in fly

[Figure panels: chromatin marks; enrichments for DV (dorsal-ventral) enhancers, AP (anterior-posterior) enhancers, general TFs, insulators, replication, motifs]
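The slide does not say how the 20 states were learned. Purely to illustrate what a chromatin state is (a recurring combination of marks), here is a sketch that assigns a genomic bin to the state whose per-mark presence probabilities best explain the bin's binarized marks, assuming marks are independent given the state (Bernoulli emissions). The three marks, the four state names and all probabilities are hypothetical; a full approach such as a multivariate HMM would also learn these parameters from data and model transitions between neighboring bins.

```python
import math
from typing import Dict, List

# Hypothetical marks and per-state probabilities that each mark is present.
MARKS = ["H3K4me3", "H3K27ac", "H3K27me3"]
STATE_EMISSIONS: Dict[str, List[float]] = {
    "promoter-like": [0.90, 0.60, 0.05],
    "enhancer-like": [0.10, 0.80, 0.05],
    "polycomb-like": [0.05, 0.05, 0.90],
    "quiescent":     [0.02, 0.02, 0.05],
}


def log_likelihood(bits: List[int], probs: List[float]) -> float:
    """log P(presence/absence pattern | state), marks assumed independent."""
    return sum(math.log(p if b else 1.0 - p) for b, p in zip(bits, probs))


def assign_state(bits: List[int]) -> str:
    """Pick the state whose emission probabilities best explain one bin's marks."""
    if len(bits) != len(MARKS):
        raise ValueError(f"expected {len(MARKS)} marks, got {len(bits)}")
    return max(STATE_EMISSIONS, key=lambda s: log_likelihood(bits, STATE_EMISSIONS[s]))


if __name__ == "__main__":
    bins = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
    print([assign_state(b) for b in bins])
```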

Page 4

3. Clusters of activity (e.g. CBP binding vs. TFs)

• Clusters confirmed by distinct enrichments for
  – Chromatin mark combinations
  – Regulatory motifs
  – GO functional categories
  – Developmental anatomical terms

[Figure: component parameters; cluster labels Trx, Trx, Polycomb, early regulators (Kr, cad, hb)]
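The slide does not name its clustering method ("component parameters" hints at a mixture model). As a plainly swapped-in stand-in, here is k-means (Lloyd's algorithm) on per-region activity profiles, e.g. a vector of binding signal across conditions. The profiles are invented and k is fixed by hand; initialization takes the first k points so the example is reproducible, whereas real use would compare several random starts.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two activity profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns one cluster label per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization for reproducibility: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical signal for six regions across four conditions/stages.
    profiles = [
        (5, 5, 0, 0), (0, 0, 5, 5),
        (6, 5, 1, 0), (1, 0, 6, 5),
        (5, 6, 0, 1), (0, 1, 5, 6),
    ]
    print(kmeans(profiles, k=2))
```

The resulting clusters would then be checked, as the slide describes, against independent annotations such as marks, motifs and GO terms.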

Page 5


[Figure: enrichment heatmap of TFs vs. chromatin states (20 columns)]

3. Clusters of TFs vs. chromatin states

• Polycomb states enriched for enhancers

• AP-state 60-fold enriched in enhancers

• Ubiquitous genes enriched for multiple states

• Trx in enhancer states

• BEAF/Chro in TSS for ubiquitous genes

• Strong Su(Hw) in Negative outside promoter states
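A fold enrichment such as "60-fold enriched in enhancers" is commonly computed as observed over expected coverage: the fraction of a state covered by the annotation, divided by the fraction of the whole genome it covers. The slide does not spell out its exact definition, so this is a minimal sketch under that assumption, with base-pair counts as input and invented numbers; computing one such ratio per TF and state pair gives a matrix of the kind summarized above.

```python
import math


def fold_enrichment(overlap_bp: int, state_bp: int,
                    annotation_bp: int, genome_bp: int) -> float:
    """Observed / expected coverage of an annotation within one chromatin state.

    observed = overlap_bp / state_bp       (fraction of the state covered)
    expected = annotation_bp / genome_bp   (fraction of the genome covered)
    """
    if min(state_bp, annotation_bp, genome_bp) <= 0:
        raise ValueError("state, annotation and genome sizes must be positive")
    return (overlap_bp / state_bp) / (annotation_bp / genome_bp)


if __name__ == "__main__":
    # Invented numbers: enhancers cover 1% of a 1 Mb toy genome but half of one state.
    fe = fold_enrichment(overlap_bp=25_000, state_bp=50_000,
                         annotation_bp=10_000, genome_bp=1_000_000)
    print(f"{fe:.1f}-fold (log2 = {math.log2(fe):.2f})")
```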

Page 6

4. Motif combinations for TF binding prediction


• Many motifs are enriched in the bound regions of their corresponding TF (diagonal)

• However, extensive cross-enrichment suggests cross-talk among the binding of different factors

[Figure: motif enrichment (rows) vs. transcription factor binding (columns); color scale: fold enrichment, 2^-4 to 2^4]

• Indeed, predictive power for binding increases with motif combinations

• Both synergistic and antagonistic effects are observed
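One simple way to call a motif pair synergistic or antagonistic is to compare the binding rate of regions carrying both motifs with the rate a multiplicative, independent-effects model predicts from each motif alone. The score below is a hypothetical log2 interaction ratio for illustration, not necessarily the statistic behind the slide, and the rates are invented.

```python
import math


def interaction_score(p_none: float, p_a: float, p_b: float, p_ab: float) -> float:
    """log2(observed / expected) TF binding rate for regions carrying both motifs.

    Arguments are fractions of regions bound by the TF among regions with
    neither motif, motif A only, motif B only, and both motifs. The expected
    rate assumes the two motifs act independently and multiplicatively:
    p_none * (p_a / p_none) * (p_b / p_none) = p_a * p_b / p_none.
    """
    if min(p_none, p_a, p_b, p_ab) <= 0:
        raise ValueError("binding rates must be positive")
    expected = p_a * p_b / p_none
    return math.log2(p_ab / expected)


if __name__ == "__main__":
    # Invented rates: each motif alone doubles binding over background (5% -> 10%).
    for p_ab in (0.40, 0.20, 0.05):
        s = interaction_score(0.05, 0.10, 0.10, p_ab)
        # The +/-0.5 cut-offs are arbitrary, chosen only for this illustration.
        kind = "synergistic" if s > 0.5 else "antagonistic" if s < -0.5 else "consistent with independence"
        print(p_ab, round(s, 2), kind)
```

A positive score means the pair is bound more often than independence predicts, a negative score less often. Note that the multiplicative expectation can exceed 1 when both motifs are individually strong, so a logistic (log-odds) interaction term is a common alternative.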

Page 7

5. Data integration for stage-specific regulators


[Figure: fold enrichment or overexpression; H3K27me3]

• The abd-A motif is enriched in regions that newly gain H3K27me3 at L2 (second larval instar)
  – Coincides with a drop in abd-A expression
  – Model: sites gain H3K27me3 as abd-A binding is lost

• Additional intriguing stories found, to be explored
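Whether a motif is enriched in a region set, as with the abd-A motif in regions that newly gain H3K27me3, is often assessed with a one-sided hypergeometric test: given N candidate regions of which K carry the motif, how surprising is it to see at least k motif-carrying regions among the n newly marked ones? The slide does not say which test was used; this is a standard-library sketch with invented counts.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N regions, K with motif, n drawn).

    Exact integer arithmetic via math.comb (Python 3.8+), which returns 0
    when asked to choose more items than are available.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)


def fold_over_expected(k: int, N: int, K: int, n: int) -> float:
    """Motif rate in the region set divided by the background motif rate."""
    return (k / n) / (K / N)


if __name__ == "__main__":
    # Invented counts: 10,000 candidate regions, 500 with the motif;
    # 200 regions newly gain the mark, 40 of which carry the motif.
    print(fold_over_expected(40, 10_000, 500, 200), hypergeom_sf(40, 10_000, 500, 200))
```

When many motifs are tested against the same region set, the resulting p-values would need a multiple-testing correction.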

Page 8

6. Evolutionary signatures for diverse functions

• Protein-coding genes
  – Codon Substitution Frequencies
  – Reading Frame Conservation

• RNA structures
  – Compensatory changes
  – Silent G-U substitutions

• microRNAs
  – Shape of conservation profile
  – Structural features: loops, pairs
  – Relationship with 3’ UTR motifs

• Regulatory motifs
  – Mutations preserve consensus
  – Increased Branch Length Score
  – Genome-wide conservation

Stark et al., Nature 2007; Clark et al., Nature 2007
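To make "Reading Frame Conservation" concrete, here is a simplified, hypothetical variant: over the columns where both aligned sequences have a base, count how often the informant sequence sits in the same codon frame as the reference. Indels whose length is a multiple of 3 preserve the frame; other indels shift it. The published metric may differ in details (windowing, multiple informant species), which this sketch ignores; it also assumes both sequences start in frame at the first column.

```python
def reading_frame_conservation(ref: str, other: str) -> float:
    """Fraction of columns aligned in both sequences where `other` is in the
    same codon frame as `ref` (gap character: '-').

    Assumes both sequences start in frame at alignment column 0.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = other_pos = 0  # nucleotides consumed so far in each sequence
    in_frame = aligned = 0
    for r, o in zip(ref, other):
        if r != "-" and o != "-":
            aligned += 1
            # Same frame if the consumed lengths are congruent mod 3.
            if (ref_pos - other_pos) % 3 == 0:
                in_frame += 1
        if r != "-":
            ref_pos += 1
        if o != "-":
            other_pos += 1
    return in_frame / aligned if aligned else 0.0


if __name__ == "__main__":
    ref = "ATGAAACCCGGG"
    print(reading_frame_conservation(ref, "ATG---CCCGGG"))  # 3-nt deletion keeps the frame
    print(reading_frame_conservation(ref, "ATG-AACCCGGG"))  # 1-nt deletion shifts it
```

High values across informant species are what one would expect for a real protein-coding gene; frequent frame shifts suggest the region is not under protein-coding constraint.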

Page 9

Assessing fraction of conserved bases ‘explained’

[Figure: % of conserved bases (fly), per element and cumulative, for +CNV, +CDS, +Pol2, +TF, +Marks, +ORC, +3’UTR, +new 3’UTR, +new CDS, +new 5’UTR]
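The cumulative curve can be thought of as repeatedly taking the union of annotation intervals and asking what fraction of conserved bases it covers. A minimal interval-arithmetic sketch, assuming half-open [start, end) intervals on a single chromosome; the class names and coordinates in the example are invented.

```python
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bases(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two sorted, non-overlapping interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def cumulative_explained(conserved: Iterable[Interval],
                         classes: Sequence[Tuple[str, Iterable[Interval]]]) -> List[Tuple[str, float]]:
    """Fraction of conserved bases covered after adding each annotation class in order."""
    cons = merge(conserved)
    n_cons = sum(end - start for start, end in cons)
    if n_cons == 0:
        raise ValueError("no conserved bases given")
    covered: List[Interval] = []
    result = []
    for name, intervals in classes:
        covered = merge(covered + list(intervals))
        result.append((name, overlap_bases(cons, covered) / n_cons))
    return result


if __name__ == "__main__":
    conserved = [(0, 100)]
    classes = [("CDS", [(0, 40)]), ("TF", [(30, 60)]), ("3'UTR", [(90, 120)])]
    for name, frac in cumulative_explained(conserved, classes):
        print(f"+{name}: {frac:.0%}")
```

The per-element view corresponds to calling `overlap_bases` on each class alone; because classes overlap (here CDS and TF share bases 30 to 40), per-element fractions sum to more than the cumulative one.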

Page 10

The challenge ahead

[Figure axes: Anterior-Posterior, Dorsal-Ventral]

Annotations & images for all expression patterns

Expression domain primitives reveal underlying logic

Binding sites of every developmental regulator

GAF, check

Su(Hw), check

BEAF-32, variant

Mod(mdg4), novel

CP190, novel

CTCF, check

Sequence motifs for every regulator

Understand regulatory logic specifying development

Page 11


Fly AWG team

Sue Celniker, Brenton Graveley, Steve Brenner, Michael Brent

Gary Karpen, Sarah Elgin, Mitzi Kuroda, Vince Pirrotta

Peter Park, Peter Kharchenko, Michael Tolstorukov, Eric Bishop

Kevin White, Casey Brown, Nicolas Negre, Nick Bild, Bob Grossman

Eric Lai, Nicolas Robine

David MacAlpine, Matthew Eaton

Steve Henikoff

Peter Bickel, Ben Brown

Lincoln Stein Group

Suzanna Lewis, Gos Micklem, Nicole Washington, EO Stinson, Marc Perry, Peter Ruzanov

AWG

Fly modENCODE

MIT CompBio Group

Chris Bristow, Pouya Kheradpour, Mike Lin, Rachel Sealfon, Rogerio Candeias

compbio.mit.edu