A biophysical approach to predicting intrinsic and extrinsic nucleosome positioning signals
Alexandre V. Morozov
Department of Physics & Astronomy and the BioMaPS Institute for Quantitative Biology, Rutgers University
IPAM, Nov. 26 2007
Introduction to chromatin scales
Electron micrograph of D. melanogaster chromatin: arrays of regularly spaced nucleosomes, each ~80 Å across.
Overview of gene regulation
Prediction and design of gene expression levels from DNA sequence:
1. Prediction of transcription factor and nucleosome occupancies in vitro and in vivo from genomic sequence
2. Prediction of levels of mRNA production from transcription factor and nucleosome occupancies
Gene
[mRNA]
[TF1] [TF2] [TF3]
[Nucleosomes]
RNA Pol II + TAFs
Available data sources:
DNA sequence data for multiple organisms:
Genome-wide transcription factor occupancy data (ChIP-chip):
Structural data for 100s of protein-DNA complexes:
Nucleosome positioning data: MNase digestion + sequencing or microarrays
Data for modeling eukaryotic gene regulation
…accagtttacgt…
Wray, G. A. et al. Mol Biol Evol 2003 20:1377-1419
Biophysical picture of gene transcription
Chromatin Structure & Nucleosomes
Structure of the nucleosome core particle (NCP)
T.J.Richmond: K.Luger et al. Nature 1997 (2.8 Å); T.J.Richmond & C.A.Davey Nature 2003 (1.9 Å)
Left-handed super-helix: (1.84 turns, 147 bp, R = 41.9 Å, P = 25.9 Å)
PDB code: 1kx5
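As a quick consistency check on the superhelix parameters quoted above, the path length of an ideal helix can be computed from its radius, pitch and number of turns. This is only a sketch: it treats the DNA axis as a perfect helix, and the function name is illustrative rather than taken from the talk.

```python
import math

def superhelix_path_length(radius, pitch, turns):
    """Arc length of an ideal helix with the given radius, pitch and number of turns."""
    per_turn = math.sqrt((2.0 * math.pi * radius) ** 2 + pitch ** 2)
    return turns * per_turn

# Values quoted on the slide, in angstroms.
R, P, TURNS, N_BP = 41.9, 25.9, 1.84, 147

length = superhelix_path_length(R, P, TURNS)
rise_per_step = length / (N_BP - 1)  # 147 bp give 146 base-pair steps
print(f"path length = {length:.1f} A, rise per step = {rise_per_step:.2f} A")
```

The resulting rise per base-pair step comes out at roughly 3.3 Å, close to the ~3.4 Å rise of B-form DNA, so the quoted geometry is self-consistent.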
Gene regulation through chromatin structure
• Transcription factor – DNA interactions are affected by the chromatin
• Chromatin remodeling by ATP-dependent complexes
• Histone variants (H2A.Z)
• Post-translational histone modifications (“histone code”)
H3 tail
H4 H2B H2A H3
[Figure: bar charts of relative nucleosome affinity for sequences f1-f3 (fold to f1), h1-h3 (fold to h1) and g1-g5 (fold to g1); sequence positions 8-138 with the dyad marked]
f1 ctggagaatcccggtgccgaggccgctcaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtcccccgcgttttaaccgccaaggggattactccctagtctccaggcacgtgtcagatatatacatcctgt
f2 ctggagatacccggtgctaaggccgcttaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtctaccgcgttttaaccgccaataggattacttactagtctctaggcacgtgtaagatatatacatcctgt
f3 gtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtctaccgcgttttaaccgccaataggattacttactag
g1 atggatccttgcaagctcttggtgcgctttttcggctgttgacgccctgttcggcagtttttgcgcaccttgagccccctctccggaattcac
g2 atggatccgcgcaagctcgcggtgcgcttaaacggctggcgacgccctggccggcagtttaagcgcaccgcgagccccctctccggaattcac
g3 atggatcctcgcaagcgagctttgctaggccccgtctgtcgcctcacgggacggaaggggcctagcacagctcgcccccgctccggaattcac
g4 atggatccatgcaagctcatggtgcgcaatttcggctgatgacgccctgatcggcagaaattgcgcaccatgagccccctctccggaattcac
g5 atggatccatgcaagctcatggtgcgcccgggcggctgatgacgccctgatcggcagcccgggcgcaccatgagccccctctccggaattcac
h1 ctggagaatcccggtgccgaggccgctcaattggtcgtagcaagctctagcaccgcttaaacgcacgtacgcgctgtcccccgcgttttaaccgccaaggggattactccctagtctccaggcacgtgtcagatatatacatcctgt
h2 atggatcctagcaagctctaggtgcgcttaaacggctgtagacgccctatcctgtacggcagtttaagcgcacctagagcctccggaattcac
h3 atggatcctagcatactctaggttagcttaaactactgtagacttactgtacggcagtttaagctaacctagagtaccctctccggaattcac
Adding key dinucleotide motifs increases nucleosome affinity.
Deleting dinucleotide motifs or disrupting their spacing decreases affinity.
Experimental validation of the histone-DNA interaction model
Jon Widom
[Figure: nucleosomal DNA sequence drawn around the dyad, shown as dinucleotide fragments]
Histone-DNA interaction model and DNA flexibility
Nucleosome affinity depends on the presence and spacing of key dinucleotide motifs (e.g. TA,CA)
Nucleosome affinity can be explained by DNA flexibility
[Figure: positions of AA/TT/TA and GC dinucleotides along the nucleosomal DNA, relative to the dyad]
Base-pair steps are fundamental units for DNA mechanics
Data-driven model for DNA elastic energy (DNABEND)
Geometry distributions for TA steps in ~100 non-homologous protein-DNA complexes:
Quadratic sequence-specific DNA elastic energy:
• mean = <θ>
• width ~ <(θ - <θ>)^2>^-1
• Matrix of force constants: F
W.K. Olson et al., PNAS 1998
$$E_{el} = \frac{1}{2} \sum_{bs} \sum_{i,j=1}^{6} F_{ij} \, (\theta_i - \langle\theta_i\rangle)(\theta_j - \langle\theta_j\rangle)$$
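The quadratic, sequence-specific elastic energy can be sketched in a few lines of NumPy. The sketch assumes that each base-pair step has six geometric degrees of freedom and that the force-constant matrix is taken as the inverse covariance of observed step geometries. The function names and the random placeholder data are hypothetical and are not real TA-step statistics.

```python
import numpy as np

def elastic_energy(thetas, means, force_constants):
    """Sum over base-pair steps of 0.5 * d^T F d, with d = theta - <theta>.

    thetas:          (n_steps, 6) step geometries
    means:           (n_steps, 6) sequence-specific mean geometries
    force_constants: (n_steps, 6, 6) symmetric stiffness matrices
    """
    d = np.asarray(thetas) - np.asarray(means)
    return 0.5 * np.einsum("si,sij,sj->", d, np.asarray(force_constants), d)

def force_constants_from_samples(samples):
    """Estimate F as the inverse covariance of observed geometries, shape (n_obs, 6)."""
    return np.linalg.inv(np.cov(np.asarray(samples), rowvar=False))

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 6))   # placeholder data, NOT real step geometries
F = force_constants_from_samples(samples)
mean = samples.mean(axis=0)
theta = mean + 0.1                    # a slightly deformed step
energy = elastic_energy(theta[None, :], mean[None, :], F[None, :, :])
print(energy)
```

A narrow observed distribution gives a large force constant, so deforming a stiff step costs more energy than deforming a flexible one.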
Elastic rod model
DNA looping induced by a Lac repressor tetramer
$$E_{constr} = \sum_{bp} (\Delta r_{bp})^2$$
Minimize $E_{tot}$ to determine energy & geometry:
$$\frac{\partial E_{tot}}{\partial \theta_i} = 0$$
Elastic energy and geometry of DNA constrained to follow an arbitrary curve
(DNABEND)
System of linear equations: ½ x 6Nbs x 6Nbs
Sequence-specific DNA elastic energy
“Constraint” energy
$$E_{tot} = E_{el} + w E_{constr}$$
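Because both the elastic term and the constraint term are quadratic, setting the gradient of the total energy to zero reduces the minimization to a system of linear equations. The toy sketch below illustrates this under a simplifying assumption that is not taken from the talk: the constraint is written as a linear map `A x = b` on the degrees of freedom `x`. All names are hypothetical.

```python
import numpy as np

def minimize_total_energy(F, x0, A, b, w):
    """Minimize 0.5*(x-x0)^T F (x-x0) + w*|A x - b|^2 by solving dE/dx = 0."""
    lhs = F + 2.0 * w * A.T @ A
    rhs = F @ x0 + 2.0 * w * A.T @ b
    return np.linalg.solve(lhs, rhs)

def total_energy(x, F, x0, A, b, w):
    """Elastic energy plus weighted constraint energy."""
    d = x - x0
    r = A @ x - b
    return 0.5 * d @ F @ d + w * r @ r

# Toy problem: four degrees of freedom whose sum is pulled towards 1.
F = np.diag([1.0, 2.0, 3.0, 4.0])   # stiffness (assumed values)
x0 = np.zeros(4)                    # relaxed geometry
A = np.ones((1, 4))
b = np.array([1.0])
w = 10.0                            # weight of the constraint term

x_min = minimize_total_energy(F, x0, A, b, w)
print(x_min, total_energy(x_min, F, x0, A, b, w))
```

In this toy case the softest degree of freedom absorbs most of the deformation, and increasing `w` enforces the constraint more strictly at the cost of a higher elastic energy.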
Prediction for NCP (1kx5)
Ideal superhelix
Example of DNA geometry prediction: nucleosome structure
Construct nucleosome-DNA model using observed dinucleotide frequencies
Predictions of nucleosome binding affinities
Experimental techniques:
nucleosome dialysis: A.Thastrom et al., J.Mol.Biol. 1999, 2004; P.T.Lowary & J.Widom, J.Mol.Biol. 1998
nucleosome exchange: T.E.Shrader & D.M.Crothers, PNAS 1989; T.E.Shrader & D.M.Crothers, J.Mol.Biol. 1990
Alignment model (Segal E. et al. Nature 2006):
Collect nucleosome-bound sequences in yeast
Center align sequences
AGGTTTATAG..
AGGTTAATCG..
AGGTAAATAA..
…
Alignment Model (in vivo selection)
MNase digestion
Extract DNA, clone into plasmids
Sequence and center-align
Di-nucleotide log score: $$\sum_{i=1}^{L-1} \log\left[ P(S_{i+1} \mid S_i) / P_B(S_{i+1}) \right]$$
142-152 bp
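The di-nucleotide log score of the alignment model can be sketched as a first-order Markov model trained on center-aligned sequences. The version below is only an illustration under stated assumptions: conditional probabilities are estimated per alignment position with a pseudocount, the background defaults to uniform base frequencies, and the three short training sequences are the toy examples from the alignment slide rather than real nucleosome-bound DNA. The function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_conditionals(aligned, pseudocount=1.0):
    """Position-specific P(S_{i+1} | S_i) estimated from center-aligned sequences."""
    length = len(aligned[0])
    cond = []
    for i in range(length - 1):
        pairs = Counter((s[i], s[i + 1]) for s in aligned)
        firsts = Counter(s[i] for s in aligned)
        cond.append({
            (a, b): (pairs[(a, b)] + pseudocount) / (firsts[a] + 4 * pseudocount)
            for a in ALPHABET for b in ALPHABET
        })
    return cond

def log_score(seq, cond, background=None):
    """Sum over i of log[P(S_{i+1} | S_i) / P_B(S_{i+1})]."""
    if background is None:
        background = {c: 0.25 for c in ALPHABET}  # assumption: uniform background
    return sum(
        math.log(cond[i][(seq[i], seq[i + 1])] / background[seq[i + 1]])
        for i in range(len(seq) - 1)
    )

aligned = ["AGGTTTATAG", "AGGTTAATCG", "AGGTAAATAA"]
cond = train_conditionals(aligned)
for s in aligned + ["CCCCCCCCCC"]:
    print(s, round(log_score(s, cond), 3))
```

Sequences that resemble the training alignment score higher than unrelated ones. In practice the background frequencies would come from genomic sequence rather than the uniform default used here.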
From nucleosome energies to probabilities and occupancies
Use dynamic programming to find the partition function and thus probabilities and occupancies of each DNA-binding factor, e.g. nucleosomes
$$Z = \sum_{conf} \exp[-\beta E(conf)]$$
[Figure panels: nucleosome energy and nucleosome probability & occupancy vs. chromosomal coordinate]
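The dynamic-programming step can be sketched with forward and backward partition functions over the sequence. This minimal version assumes a single particle type (nucleosomes with a fixed footprint), steric exclusion as the only interaction, and energies given in units of kT with any chemical potential already folded in. It is not the DNABEND implementation, and the names are hypothetical.

```python
import numpy as np

def nucleosome_occupancy(energies, footprint):
    """energies[i]: energy (in kT) of a nucleosome starting at bp i.

    Returns the partition function Z, the probability of a nucleosome
    starting at each position, and the occupancy of each base pair.
    """
    w = np.exp(-np.asarray(energies, dtype=float))   # Boltzmann weights
    n_bp = len(w) + footprint - 1
    fwd = np.ones(n_bp + 1)    # fwd[i]: partition function of bp [0, i)
    for i in range(1, n_bp + 1):
        fwd[i] = fwd[i - 1]                          # bp i-1 left empty
        if i >= footprint:
            fwd[i] += fwd[i - footprint] * w[i - footprint]
    bwd = np.ones(n_bp + 1)    # bwd[i]: partition function of bp [i, n_bp)
    for i in range(n_bp - 1, -1, -1):
        bwd[i] = bwd[i + 1]                          # bp i left empty
        if i + footprint <= n_bp:
            bwd[i] += w[i] * bwd[i + footprint]
    z = fwd[n_bp]
    starts = np.arange(len(w))
    p_start = fwd[starts] * w * bwd[starts + footprint] / z
    cum = np.concatenate(([0.0], np.cumsum(p_start)))
    occ = np.empty(n_bp)
    for j in range(n_bp):      # sum p_start over nucleosomes covering bp j
        lo = max(0, j - footprint + 1)
        hi = min(j, len(w) - 1)
        occ[j] = cum[hi + 1] - cum[lo]
    return z, p_start, occ

rng = np.random.default_rng(1)
demo_energies = rng.normal(size=1000)                # placeholder energies
z_demo, p_demo, occ_demo = nucleosome_occupancy(demo_energies, footprint=147)
print(occ_demo.min(), occ_demo.max())
```

For genome-scale sequences the partition functions overflow quickly, so a practical version would keep `fwd` and `bwd` in log space. Additional DNA-binding factors can be handled by adding one term per factor to each recursion.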
TGACGTCA
Nucleosome occupancy is dynamic
Nucleosome-free site
Nucleosome is displaced by the bound TF
Nucleosome-occluded site
Nucleosome occupancy of TATA boxes explains gene expression levels
Nucleosome occupancy in the vicinity of genes
Nucleosome occupancy in the vicinity of TATA boxes: default repression
TATA
Functional sites by ChIP-chip: in vivo genome-wide measurements of TF occupancy
Genome-wide occupancies for 203 transcription factors in yeast by ChIP-chip (Harbison et al., Nature 2004: “Transcriptional regulatory code”)
MacIsaac et al., BMC Bioinformatics 2006: “An improved map of phylogenetically conserved regulatory sites” (98 factor specificities + 26 more from the literature)
Nucleosome occupancy of transcription factor binding sites: default repression
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vitro: nucleosomes compete for DNA sequence only with each other
p < 0.05
DNABEND: Nucleosomes
Nucleosome occupancy of transcription factor binding sites
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vivo: nucleosomes compete for DNA sequence with TFs
p < 0.05
DNABEND: Nucleosomes + TFs
Functional transcription factor sites are clustered
functional sites
non-functional sites
Clustering!
DNABEND: Nucleosomes + TFs, randomized functional sites
p < 0.05
Functional transcription factor sites are not occupied by nucleosomes in vivo
Yuan et al. microarray experiment
DNABEND + Transcription Factors
DNABEND
Alignment model
TGACGTCA
Nucleosome-induced cooperativity
Nucleosome-occluded TF sites: no separate binding
TAAGGCCT
TGACGTCA TAAGGCCT
Nucleosome-occluded TF sites: cooperative binding
Miller and Widom, Mol.Cell.Biol. 2003
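Nucleosome-induced cooperativity can be illustrated with a toy equilibrium model. It assumes that a single nucleosome occludes both TF sites, that the TFs bind only when the nucleosome is absent, and that the two TFs do not interact directly. The statistical weights below are arbitrary illustrative numbers, not fitted values, and the function name is hypothetical.

```python
def tf1_occupancy(x1, x2, w_nuc):
    """Probability that TF1 is bound.

    x1, x2: statistical weights (concentration times binding constant) of the two TFs
    w_nuc:  statistical weight of a nucleosome that occludes both sites
    """
    # States: nucleosome bound (both sites blocked), or nucleosome absent
    # with each site independently empty or occupied.
    z = w_nuc + (1.0 + x1) * (1.0 + x2)
    return x1 * (1.0 + x2) / z

alone = tf1_occupancy(x1=1.0, x2=0.0, w_nuc=10.0)
with_partner = tf1_occupancy(x1=1.0, x2=5.0, w_nuc=10.0)
no_nucleosome = tf1_occupancy(x1=1.0, x2=5.0, w_nuc=0.0)
print(alone, with_partner, no_nucleosome)
```

With a nucleosome competing for the region, adding the second factor raises the occupancy of the first even though the two never touch. Without the nucleosome (`w_nuc = 0`) the two sites bind independently and the effect disappears.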
Nucleosome occupancy of TF sites in a model system
pCYC1
TF sites
Nucleosome-induced cooperativity: example
GAL1 GAL10
Nucleosome position predictions: GAL1-10 locus
Nucleosomes in vitro
Nucleosomes in vivo
TBP
GAL4
Nucleosome position predictions: HIS3-PET56 locus
Nucleosomes in vitro
Nucleosomes in vivo
TBP
GCN4
Conclusions
Predicted histone-DNA binding affinities and genome-wide nucleosome occupancies using a DNA mechanics model + a thermodynamic model of nucleosomes competing with other factors for genomic sequence
Chromatin structure around ORF starts is consistent with microarray-based measurements of nucleosome positions, and can be explained with a simple model of nucleosomes “phasing off” bound TBPs
Nucleosome-induced cooperativity (brought about by clustering of functional transcription factor binding sites) is responsible for the increased accessibility of functional sites
Future Directions
Lots of nucleosome positioning sequences [soon to become] available – can a better model of dinucleotide (base stacking) energies be built? {Anirvan Sengupta, Rutgers}
Can such a model be used to inform a better DNA mechanics model? Conversely, can a DNA mechanics model be “compressed”, i.e. encapsulated in a simple set of dinucleotide energies? {Anirvan Sengupta, Rutgers}
DNABEND extensions to non-nucleosome systems, i.e. nucleoid proteins, DNA loops etc.? {John Marko, Jon Widom, Northwestern}
Prediction of in vivo nucleosome positions in gene expression libraries {Ligr et al., Genetics 2006: random libraries of yeast promoters; Lu Bai et al., unpublished}
PEOPLE:
Eric Siggia (Rockefeller University)
Jon Widom (Northwestern University)
Harmen Bussemaker (Columbia University)
FUNDING:
Leukemia & Lymphoma Society Fellowship
BioMaPS Institute, Rutgers University
Acknowledgements
Nucleosome occupancy of chromosomal regions
Induced periodicity of stable nucleosomes
stable stable
Nucleosome position predictions: summary