Deciphering the regulatory code in the genome
DESCRIPTION
There are messages hidden within our genome, regulating when and how long a gene is switched on. The presentation describes a method, STREAM, targeted at deciphering this regulatory code.
TRANSCRIPT
![Page 1: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/1.jpg)
Deciphering the regulatory code in the genome
PhD completion seminar Denis C. Bauer
Institute for Molecular Bioscience The University of Queensland, Australia
by linh.ngân By yankodesign
![Page 2: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/2.jpg)
Research Aim
Develop a method that translates the regulatory message in the DNA of when and how strong a gene is expressed.
AAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAGAAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAG
Thermodynamic model
Express gene with 70% capacity when it
is hot, Thanks!
![Page 3: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/3.jpg)
Why is understanding transcriptional regulation important?
• Insight into the biology of gene pathways.
• Search for regulatory regions with specific function.
• “Re-programming” of genes has therapeutic potential.
transcription
gene promoter
A
Design and insert a new regulatory element
Broken regulatory element
DNA
![Page 4: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/4.jpg)
What do we need to know to build a model that can translate the regulatory message?
![Page 5: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/5.jpg)
Background : Enhancer
• Genes can have independent “switches” (Enhancer) beyond the core promoter, which can start the transcription of the target gene under different conditions.
enhancer regions
transcription
gene promoter
![Page 6: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/6.jpg)
• Transcription is regulated by the binding of activator and repressor TFs to an enhancer region.
Background: Enhancer
enhancer
transcription 8 Activators
2 Repressors
binding site map
TF Concentration
Active
![Page 7: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/7.jpg)
Background: Repression
• Transcriptional regulation is also dependent on the interplay between activators and repressors, i.e. where they bind relative to each other.
enhancer
binding site map
Repressor range
![Page 8: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/8.jpg)
On which system would we test the model’s abilities?
![Page 9: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/9.jpg)
1 http://insects.eugenes.org/ 2 Small et al. 3 http://bioinform.genetika.ru
Drosophila melanogaster 1
Embryo stained for eve 2
Function representation 3
Background: Even-skipped gene (eve)
![Page 10: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/10.jpg)
Background: Regulation of eve
lacZ
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
Late1 3+7 2 P late2 4+6 1 5
MSE MSE MSE MSE MSE eve
![Page 11: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/11.jpg)
Hypothesis
Binding site map and TF concentrations predict gene activation.
(Genome architecture, RNA, methylation, …)
![Page 12: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/12.jpg)
Research Goals
• Optimize Thermodynamic models efficiently.
• Analyze robustness of these models.
Cooperphoto/CORBIS
• Explore the regulation of a
particular gene.
• Examine how the regulatory program evolves.
• Extend current thermodynamic model.
![Page 13: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/13.jpg)
Model definition
Buena Vista Pictures
Free parameters
• TF parameters: binding affinity $\theta^{K}_{t}$, effectiveness $\theta^{E}_{t}$
• General parameters: max. transcription rate $\theta^{R_0}$, energy barrier $\theta^{G_0}$
Site occupancy (Hill function)
$$p(s, t) = \frac{\theta^{K}_{t} \cdot K(s, t) \cdot [t]}{1 + \theta^{K}_{t} \cdot K(s, t) \cdot [t]}$$
Total activation
$$W(S, T) = \sum_{s \in S_A} \Big[ \underbrace{\theta^{E}_{t_s}\, p(s, t_s)}_{\text{activator contribution}} \; \underbrace{\prod_{s' \in S_R} \big( 1 - \theta^{E}_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s') \big)}_{\text{quenching of the activator}} \Big]$$
Transcription rate (Arrhenius function)
$$R(S, T) = \begin{cases} \theta^{R_0} \exp\big( W(S, T) - \theta^{G_0} \big) & \text{if } W < \theta^{G_0} \\ \theta^{R_0} & \text{otherwise} \end{cases}$$
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
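The three model components named on this slide (site occupancy, total activation, transcription rate) can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the function and argument names are hypothetical, and the quenching term d(s, s') is passed in as a plain callable because the slide does not define how it depends on distance.

```python
import math


def occupancy(affinity, site_strength, concentration):
    """Saturating occupancy p(s, t) of one binding site.

    affinity      -- per-TF free parameter (binding affinity)
    site_strength -- K(s, t), strength of the site for this TF
    concentration -- [t], TF concentration
    """
    x = affinity * site_strength * concentration
    return x / (1.0 + x)


def total_activation(activators, repressors, quench):
    """W(S, T): summed activator contributions, each reduced by nearby repressors.

    activators, repressors -- lists of (effectiveness, occupancy), one per site
    quench(i, j)           -- d(s, s') in [0, 1] for activator i and repressor j
                              (hypothetical callable; its form is an assumption)
    """
    total = 0.0
    for i, (e_act, p_act) in enumerate(activators):
        unquenched = 1.0
        for j, (e_rep, p_rep) in enumerate(repressors):
            unquenched *= 1.0 - e_rep * p_rep * quench(i, j)
        total += e_act * p_act * unquenched
    return total


def transcription_rate(w, max_rate, barrier):
    """R(S, T): exponential in W below the energy barrier, capped at max_rate."""
    if w < barrier:
        return max_rate * math.exp(w - barrier)
    return max_rate
```

With no repressors the total activation is just the sum of effectiveness times occupancy; a fully occupied, fully effective repressor within range (d = 1) removes an activator's contribution entirely.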
![Page 14: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/14.jpg)
Training the model
• Inputs: TF concentration ([TF1], [TF2], [TF3], [TF4]) and TF binding.
• The thermodynamic model produces a predicted expression; compare it to the target.
• Adjust model parameters to improve fit.
[Figure: predicted and target expression profiles]
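Comparing predicted and target expression requires a fit measure. The deck reports RMS error and CC; the sketch below assumes CC means the Pearson correlation coefficient, which is not stated on the slide, and the function names are illustrative.

```python
import math


def rms_error(predicted, target):
    """Root-mean-square difference between two equally long profiles."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


def correlation(predicted, target):
    """Pearson correlation coefficient (assumed meaning of 'CC').

    Undefined (division by zero) if either profile is constant.
    """
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)
```

RMS error is sensitive to the absolute scale of the prediction, while the correlation only rewards the right shape, which is one reason to report both.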
![Page 15: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/15.jpg)
Optimization methods
• Two optimization paradigms
– Simulated Annealing
  • LAM schedule (Reinitz et al. 2003)
  • Geometric cooling
– Gradient descent
  • Three GD variants approximating the objective function, which was not continuously differentiable.
• Judged on accuracy achieved in the given time
– Drosophila MSE2 data with 400 data points and 7 TFs (16 free parameters).
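As an illustration of the simpler of the two annealing variants, here is a minimal simulated-annealing loop with geometric cooling (the temperature is multiplied by a constant factor after every step). It is a generic sketch, not the optimizer used in the study: the move proposal, the default settings and all names are assumptions, and the LAM schedule is not shown.

```python
import math
import random


def anneal(objective, start, step=0.5, t_start=1.0, alpha=0.995,
           iterations=2000, seed=0):
    """Minimize objective(params) by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = objective(current)
    best, best_cost = list(current), current_cost
    temperature = t_start
    for _ in range(iterations):
        # Propose a move: perturb one randomly chosen parameter.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        cost = objective(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temperature *= alpha  # geometric cooling
    return best, best_cost
```

Because worse moves are sometimes accepted while the temperature is high, the search can leave a local minimum, which plain gradient descent cannot do.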
![Page 16: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/16.jpg)
Optimization
[Figure: correlation coefficient (CC) and RMS error versus time [minutes] (log scale). Simulated Annealing panels: SA LAM and SA geom. Gradient Descent panels: SA_geom, GD_softmax, GD_nomax and GD_max.]
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Suggests: many local minima.
![Page 17: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/17.jpg)
If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?
![Page 18: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/18.jpg)
Landscape analysis
• Synthetic data based on real MSE2 data
– Global minimum and solution (parameter values) are known.
– Measuring distance of the optimization solution to the starting position and the known solution.
– Measuring error reduction at the solution compared to the starting position.
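The two measurements listed above are easy to state in code. The slide does not say which distance is used, so the sketch assumes Euclidean distance between parameter vectors, and it assumes error reduction is the relative drop in error from start to solution; both function names are illustrative.

```python
import math


def parameter_distance(a, b):
    """Euclidean distance between two parameter vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def error_reduction(start_error, final_error):
    """Fraction of the starting error removed by the optimization."""
    return (start_error - final_error) / start_error
```

A run that removes most of the error yet ends no closer to the known solution than it started has found a different minimum, which is the signature of a rugged landscape.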
![Page 19: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/19.jpg)
Landscape analysis

| Experiment | Initial distance to solution (mean) | Final distance to solution (mean) | Error red. (mean) |
|---|---|---|---|
| 1% perturbed | $2.8 \cdot 10^{-4}$ | $3.4 \cdot 10^{-4}$ | 88% |
| random | 0.1 | 0.11 | 97% |

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Conclusion: many local minima.
![Page 20: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/20.jpg)
Does the model over-fit?
• Cross-validation (5-fold)
• Redundancy reduction
– Not enough data to begin with

| Experiment | Mean RMS error (SE) | Mean CC (SE) |
|---|---|---|
| training | 13.39 (0.004) | 0.92 ($4.8 \cdot 10^{-5}$) |
| testing | 14.04 (0.005) | 0.91 ($5.7 \cdot 10^{-5}$) |
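A 5-fold cross-validation splits the data points into five parts and, in turn, holds one part out for testing while training on the other four. The sketch below shows only the index bookkeeping; assigning points to folds by interleaving is an assumption, since the slide does not describe how the folds were drawn.

```python
def k_fold_indices(n_points, k=5):
    """Yield (train_indices, test_indices) for each of k folds.

    Points are assigned to folds by interleaving (index modulo k).
    """
    folds = [list(range(i, n_points, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For the 400 data points mentioned earlier in the deck, each round would train on 320 points and test on the remaining 80.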
![Page 21: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/21.jpg)
Summary: Optimization & Analysis
• The objective function is ill-posed.
– It has a plethora of local minima.
– It might have many global minima.
• Hence SA is the method of choice.
• There might be a tendency to over-fit the data.
http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html http://images.nciku.com/
![Page 22: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/22.jpg)
Research Goals
• Optimize Thermodynamic models efficiently
• Analyze robustness of these models
Cooperphoto/CORBIS
• Explore the regulation of a
particular gene
• Examine how the regulatory program evolves
• Extend current thermodynamic model
![Page 23: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/23.jpg)
Regulation and Evolution of eve
• Mechanism for regulating eve is conserved:
– Stripe 2 elements from other Drosophila species activate eve in D. mel. correctly.
– Despite the substantial difference in the regulatory DNA sequence.
Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
http://www.bio.ilstu.edu/Edwards/
![Page 24: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/24.jpg)
Evaluate Evolution of MSE2
• Test if the model can identify the MSE2 in these other species.
• Test if the model correctly predicts the transcriptional output of the homologous MSE2s.
![Page 25: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/25.jpg)
Searching for MSE2
• Apply a model trained on D. mel. MSE2 to the TFBS map of sequential windows to find the MSE2 in other species.
[Figure: windows slide along the region around the eve promoter and MSE2 in the other species; each window is scored by RMS error (23, 27, 43, …, 13, …).]
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
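The window search described on this slide amounts to scoring every window along the sequence and keeping the one with the lowest RMS error. The sketch below shows that loop only; `score_window` is a hypothetical callable standing in for "build the TFBS map for this window, run the trained model, and compare with the target", and the window and step sizes are placeholders.

```python
def scan_windows(sequence_length, window, step, score_window):
    """Score sequential windows and return (best, all_results).

    score_window(start, end) -- hypothetical callable returning the RMS error
                                of the model prediction for that window
    Each result is a (start, rms_error) pair; best has the lowest error.
    """
    results = []
    for start in range(0, sequence_length - window + 1, step):
        results.append((start, score_window(start, start + window)))
    best = min(results, key=lambda item: item[1])
    return best, results
```

Plotting the per-window error against genomic position gives a profile in which the candidate module shows up as a dip.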
![Page 26: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/26.jpg)
Searching for MSE2: Result
• Correctly identified the MSE2 in 6/8 species
[Figure: RMS error versus genomic location (position relative to eve, −8000 to 9000) for D. melanogaster, D. pseudoobscura, D. grimshawi and D. mojavensis.]
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
![Page 27: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/27.jpg)
Predicting the output in other species
• Apply a model trained on D. mel. MSE2 to the MSE2s in other species
[Figure: binding site maps, panels labeled D. melanogaster and D. mojavensis: log odds score (bits) versus rel. genomic position for bicoid, knirps, kruppel, caudal, giant, tailless and hunchback.]
[Figure: relative RNA concentration versus A-P position (%) for Target, D. melanogaster, D. pseudoobscura, D. ananassae and D. mojavensis.]
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
![Page 28: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/28.jpg)
Summary: Application
• Model fits the data qualitatively.
• Predictions are biologically meaningful.
• However, there is room for improvement.
![Page 29: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/29.jpg)
Research Goals
• Optimize Thermodynamic models efficiently
• Analyze robustness of these models
Cooperphoto/CORBIS
• Explore the regulation of a
particular gene
• Examine how the regulatory program evolves
• Extend current thermodynamic model
![Page 30: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/30.jpg)
One role fits them all?
• Dual function is proposed for some of the regulatory TFs.
– E.g. TF Hunchback (Hb) might be an activator when regulating stripe 2 and a repressor for stripe 3.
Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
Late1 3+7 2 P late2 4+6 1 5
![Page 31: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/31.jpg)
Determine the regulatory role of TFs
• Different data set: 44 CRMs important for D. mel. development but same set of TFs.
• Determine the best role for each TF in each of the CRMs
– Brute force: train a model for all TF role-combinations on each of the 44 CRMs.
– Record the correlation achieved.
– Identify TFs that have dual function.
Segal, E. et al. Predicting expression patterns from regulatory sequence includes Drosophila segmentation. Nature, 2008, 451, 535-540
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
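The brute-force procedure on this slide can be sketched as an enumeration over all activator/repressor assignments, followed by a check for TFs whose best role differs between CRMs. This is an illustration under stated assumptions: `score_roles` is a hypothetical callable standing in for "train a model with these roles on one CRM and return the correlation achieved", and the function names are not taken from STREAM.

```python
from itertools import product

ROLES = ("activator", "repressor")


def best_role_assignment(tfs, score_roles):
    """Try every role combination for one CRM; return (assignment, correlation)."""
    best_assignment, best_cc = None, float("-inf")
    for roles in product(ROLES, repeat=len(tfs)):
        assignment = dict(zip(tfs, roles))
        cc = score_roles(assignment)  # hypothetical: train model, return CC
        if cc > best_cc:
            best_assignment, best_cc = assignment, cc
    return best_assignment, best_cc


def dual_function_tfs(best_by_crm):
    """Return TFs whose best role is not the same in every CRM.

    best_by_crm -- {crm_name: {tf_name: role}}
    """
    roles_seen = {}
    for assignment in best_by_crm.values():
        for tf, role in assignment.items():
            roles_seen.setdefault(tf, set()).add(role)
    return sorted(tf for tf, roles in roles_seen.items() if len(roles) > 1)
```

The number of combinations doubles with every TF (2^n models per CRM), so exhaustive enumeration is only practical because the TF set is small.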
![Page 32: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/32.jpg)
TFs with dual role
• E.g. Hb
– Activator for 17 CRMs
– Repressor for 27 CRMs

| | Bcd | Cad | Hb | Tll | Gt | Kr | Kni | TorRE |
|---|---|---|---|---|---|---|---|---|
| Det. roles | s | + | s | - | s | s | - | s |
| Literature (consensus) | + | + | s | - | (s) | s | - | NA |

“s”: dual-functioning, “+”: activator, “-”: repressor.
Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol, 2006, 2, e51
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
![Page 33: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/33.jpg)
Improvement with dual function
[Figure: mRNA versus AP position (0 to 100) for six CRMs: kr_CD1_ru, hb_anterior_actv, kni_+1, run_stripe5, eve_37ext_ru and eve_stripe2. Curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]

| Experiment | Number of free parameters | Mean CC (SE) |
|---|---|---|
| Previous roles | 18 | 0.27 (0.008) |
| HbDual | 19 | 0.35 (0.009) |
| KrDual | 19 | 0.37 (0.007) |
| HbKrDual | 20 | 0.38 (0.007) |

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
![Page 34: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/34.jpg)
Marker motifs for dual function
• Running MEME on the protein sequences of dual-functioning TFs to find short motifs (<6 aa) present in all of them.
[Figure: two MEME sequence logos (information content in bits, four positions each). Residues per position: V/L/I, K, D/Q, E in one logo and Y/L/K, C, Q/E/D/G, I in the other. One logo is labeled “SUMOylation motif”.]
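Once a short motif is known, locating candidate sites in a protein sequence is a simple pattern search. The sketch below is not MEME and not the prediction method used in the study; it assumes the commonly cited SUMOylation consensus of a hydrophobic residue, then lysine, then any residue, then glutamate, written here as the regular expression `[VIL]K.E`. The function name and example sequence are made up for illustration.

```python
import re

# Assumed consensus: hydrophobic (V, I or L), K, any residue, E.
# The lookahead lets overlapping candidates be reported as well.
SUMO_CONSENSUS = re.compile(r"(?=([VIL]K.E))")


def find_candidate_sites(protein_sequence):
    """Return (start_index, matched_residues) for every consensus match."""
    return [(m.start(), m.group(1))
            for m in SUMO_CONSENSUS.finditer(protein_sequence)]
```

A pattern match only flags candidates; a four-residue motif occurs by chance in many proteins, so hits of this kind would still need a proper predictor or experimental support.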
![Page 35: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/35.jpg)
SUMOylation
• Small Ubiquitin-related Modifier (SUMO): a small protein covalently attached to target proteins.
• Involved in many pathways/mechanisms
– Compartmentalization
– Transcriptional regulation
  • Can reverse the function of a TF, e.g. Ikaros (the human homologue of Kr)
• SUMO (Smt3) is present in D. mel. during development
[Figure: SUMO pathway. ATP, E1 activating enzyme, E2 conjugating enzyme, E3 ligases, target protein, SUMO protease.]
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005, 25, 2688-2697
![Page 36: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/36.jpg)
Conclusion
• Thermodynamic models are best optimized using SA, but over-fitting is an issue to keep in mind.
• Nonetheless, they are applicable for
– examining the mechanisms of transcriptional regulation,
– exploring the evolution of a particular regulatory mechanism.
• Model prediction improves when dual function is allowed.
– SUMOylation seems to be a good candidate for the biological mechanism of role change.
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
![Page 37: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/37.jpg)
Acknowledgments
• IMB
– Timothy Bailey (supervisor)
– Mikael Bodén (supervisor)
– Sean Grimmond (thesis committee)
– Nick Hamilton (thesis committee)
– Fabian Buske
– Stefan Maetschke
• Stony Brook University
– John Reinitz
• Funding
– Institute for Molecular Bioscience, The University of Queensland
– Australian Research Council Centre of Excellence in Bioinformatics
– National Institutes of Health
– UQ International Research Tuition Award
www.bioinformatics.org.au/stream
Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene
![Page 38: Deciphering the regulatory code in the genome](https://reader034.vdocuments.mx/reader034/viewer/2022042816/558973ccd8b42aa44a8b463a/html5/thumbnails/38.jpg)
• Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene.
• Publicly available
• Modular: New functions can be plugged in
[Figure labels: Command line; Many functions]
Bauer, D. C. & Bailey, T. L. STREAM - Static Thermodynamic REgulAtory Model for transcriptional. Bioinformatics, 2008, 24, 2544-2545
www.bioinformatics.org.au/stream