A Model of Bacterial Chromosome Architecture Matthew Wright, Daniel Segre, George Church

Post on 19-Dec-2015


A Model of Bacterial Chromosome Architecture

Matthew Wright, Daniel Segre, George Church

Ja mie Goodsell

Genomic Scale Structure

Can we understand the 3-d structure of the chromosome?

How optimal is the spatial organization of DNA for the cell?

Can we link function and chromosome structure?

DNA structure has conserved features

Hypothesis

Mycoplasma pneumoniae

• 816 kbp
• 90% Coding
• 688 Genes
• 110 Membrane Proteins
• 52 Ribosomal Proteins
• No Active Transport
• No Regulation
• Limited Metabolism
• Few DNA Binding Proteins

A Model System

0.5 µm diameter

0.06 µm³ volume

8000 ribosomes would fill the cell

Extended DNA: 80 µm in diameter, over 100 times the cell diameter
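The size figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes a spherical cell and B-form DNA with a rise of 0.34 nm per base pair; neither assumption is stated on the slide, and the function names are illustrative.

```python
import math

CELL_DIAMETER_UM = 0.5      # from the slide
GENOME_BP = 816_000         # 816 kbp, from the slide
RISE_NM_PER_BP = 0.34       # assumption: B-form DNA rise per base pair


def sphere_volume(diameter):
    """Volume of a sphere given its diameter."""
    return math.pi * diameter ** 3 / 6.0


def extended_circle_diameter_um(n_bp, rise_nm=RISE_NM_PER_BP):
    """Diameter of the circular genome if it were laid out as an open circle."""
    contour_um = n_bp * rise_nm / 1000.0   # contour length in micrometres
    return contour_um / math.pi


cell_volume_um3 = sphere_volume(CELL_DIAMETER_UM)
dna_diameter_um = extended_circle_diameter_um(GENOME_BP)
ratio = dna_diameter_um / CELL_DIAMETER_UM
print(f"cell volume  ~ {cell_volume_um3:.3f} um^3")
print(f"DNA diameter ~ {dna_diameter_um:.0f} um ({ratio:.0f}x the cell diameter)")
```

Under these assumptions the volume comes out near 0.065 µm³ and the extended circle near 88 µm across, in line with the rounded values on the slide.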

“Nose” polarity

Features

• Microscopy
• Cross-linking
• Loop Patterns

Tom Knight
Gasser et al., Science 2002, 296
Dekker et al., Science 2002, 295

Empirical Constraints

Transmembrane Proteins: 110 genes
Potter MD, Nicchitta CV. J Biol Chem. 2002 Jun 28;277(26)

RNA and/or Protein Complexes: 52 genes

Metabolism

DNA Structural Forces
Tobias I et al. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 2000 Jan;61(1)

Replication

Theoretical Constraints

Symmetry Constraints

Symmetric Replication

If polymerases replicate at a constant rate, sites symmetric about the origin are close together when replicated

Flattened Circle

O T

Cost Function

$$C = w_1 \sum_i d(M_i,\,\cdot\,) + w_2 \sum_{i,j} d(R_i, R_j) + \text{other terms}$$

[Figure: points R1, R2, M1, M2, M3 marked on the structure]
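A minimal sketch of how such a cost could be evaluated for a set of 3-D gene positions. The second argument of d(M_i, ·) did not survive on the slide; the code assumes it is the cell membrane and models the cell as a sphere centred at the origin. The function name, the index lists, and the choice to count each pair (R_i, R_j) once are all assumptions made for illustration, and the "other terms" are omitted.

```python
import numpy as np


def cost(coords, membrane_idx, ribosomal_idx, cell_radius, w1=1.0, w2=1.0):
    """C = w1 * sum_i d(M_i, membrane) + w2 * sum_{i<j} d(R_i, R_j)."""
    coords = np.asarray(coords, dtype=float)

    # Assumption: spherical cell centred at the origin, so the distance from
    # a point x to the membrane is |cell_radius - |x||.
    m = coords[membrane_idx]
    membrane_term = np.abs(cell_radius - np.linalg.norm(m, axis=1)).sum()

    # Pairwise distances between ribosomal genes, each unordered pair once.
    r = coords[ribosomal_idx]
    pair_d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=2)
    ribosome_term = np.triu(pair_d, k=1).sum()

    return w1 * membrane_term + w2 * ribosome_term


# Toy example: one membrane gene sitting on the membrane, three ribosomal genes.
coords = [[0.25, 0.0, 0.0], [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
example_cost = cost(coords, membrane_idx=[0], ribosomal_idx=[1, 2, 3],
                    cell_radius=0.25)
```

Lower values mean membrane-protein genes sit nearer the membrane and ribosomal genes sit nearer one another; the weights w1 and w2 set the balance between the two terms.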

• Random Walk of Genome
• Monte Carlo of Parametrized Structures

Methods

Random Walk

[Figure: random-walk chain, labeled r]

n segments

2n-1 Parameters
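As a rough illustration of the random-walk representation, the sketch below builds a chain of n segments from per-segment angles and applies one random move. It assumes r is the fixed segment length and uses two angles per segment (2n numbers); the slide counts 2n-1 parameters and does not say how one degree of freedom is removed. All function names are hypothetical.

```python
import numpy as np


def chain_from_angles(theta, phi, r=1.0):
    """Return n+1 points joined by n segments of length r."""
    theta = np.asarray(theta, dtype=float)   # polar angle of each segment
    phi = np.asarray(phi, dtype=float)       # azimuthal angle of each segment
    steps = r * np.stack([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)], axis=1)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])


def perturb(theta, phi, rng, scale=0.1):
    """One random-walk move: nudge both angles of a randomly chosen segment."""
    i = rng.integers(len(theta))
    theta, phi = theta.copy(), phi.copy()
    theta[i] += rng.normal(0.0, scale)
    phi[i] += rng.normal(0.0, scale)
    return theta, phi


rng = np.random.default_rng(0)
n = 50
theta = rng.uniform(0.0, np.pi, n)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
points = chain_from_angles(theta, phi)
theta2, phi2 = perturb(theta, phi, rng)
points2 = chain_from_angles(theta2, phi2)
```

Because the chain is rebuilt from angles, every move keeps all segment lengths equal to r; only the path changes.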

Monte Carlo of Parametrized Structures

A Random Walk in Helical Parameter Space

General Helix Parameters
• a (rise)

Supercoil Parameters
• w (frequency)
• Ac (amplitude of cos)
• As (amplitude of sin)

Radial Parameters
• R (maximum large radius)
• d (frequency of large radial oscillations)

Helix Parameters
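A random walk in this six-parameter space might look like the sketch below. The slides do not state the acceptance rule, so the Metropolis criterion, the temperature, and the step size are assumptions, and `toy_cost` is a hypothetical stand-in for the real cost function.

```python
import math
import random

PARAM_NAMES = ("a", "w", "Ac", "As", "R", "d")   # names from the slide


def toy_cost(p):
    # Hypothetical stand-in for the real cost: a quadratic bowl whose
    # minimum sits at an arbitrary target parameter set.
    target = {"a": 1.0, "w": 5.0, "Ac": 0.2, "As": 0.2, "R": 3.0, "d": 0.5}
    return sum((p[k] - target[k]) ** 2 for k in PARAM_NAMES)


def monte_carlo(cost, start, steps=2000, step_size=0.1, temperature=0.01, seed=0):
    rng = random.Random(seed)
    current = dict(start)
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    history = [current_cost]
    for _ in range(steps):
        trial = dict(current)
        name = rng.choice(PARAM_NAMES)            # perturb one parameter
        trial[name] += rng.gauss(0.0, step_size)
        delta = cost(trial) - current_cost
        # Metropolis rule (an assumption): always accept downhill moves,
        # accept uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = trial, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        history.append(current_cost)
    return best, best_cost, history


start = {name: 0.0 for name in PARAM_NAMES}
best, best_cost, history = monte_carlo(toy_cost, start)
```

Plotting `history` against the step index gives an energy-versus-time-steps curve of the kind shown in the deck.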

[Plot: energy (0 to 1800) against time steps (0 to 500)]

Energy Decreases

Trivial Solution

Entangled Solution

Possible Solution

Gene Distribution on Structure

Begin With Optimization in Helical Parameter Space

Then Perform Random Walk of Genome for Secondary Optimization

Generate Relatively Ordered Structures while allowing Local Disorder to Meet Constraints

Combine Both Methods

Starting Structure

Final Structure

[Plot: cost (0.5 to 1.4) against time steps (0 to 10000)]

Energy

Preliminary data are promising

Incorporate Distance Geometry

Need to calculate statistics

Gather experimental data: predict and test

Incorporate Replication and Dynamics

Current

Distance Geometry

• Represent structure in terms of distances
• Constraints fit into a single matrix
• Matrix with “bounds” defines all possible configurations
• Can find inconsistencies in constraints
• Rotationally invariant

Basis

• Cholesky or eigenvalue decomposition of inner product matrix, M

• Can get M from D, matrix of distances by defining an origin

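Cholesky decomposition: $X X^{T} = M = L L^{T}$

Eigenvalue decomposition: $X X^{T} = M = S S^{T}$, with the columns of $S$ being the eigenvectors of $M$ scaled by the square roots of their eigenvalues

Taking the centroid as the origin, the distance from the origin to point $i$ is

$$d_{0i}^{2} = \frac{1}{N}\sum_{j} d_{ij}^{2} - \frac{1}{N^{2}}\sum_{j<k} d_{jk}^{2}$$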
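The route from a distance matrix D to coordinates can be sketched as follows. The law-of-cosines step, M_ij = (d_0i² + d_0j² − d_ij²)/2, is the standard relation between distances and inner products and is not written on the slide; the function name and the choice of the eigenvalue route (rather than Cholesky) are illustrative.

```python
import numpy as np


def embed_from_distances(D, dim=3):
    """Recover coordinates (up to rotation/reflection) from pairwise distances."""
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    # Squared distance of each point to the centroid, used as the origin.
    # D2.sum() counts every pair twice, hence the factor 2 in the denominator.
    d0_sq = D2.sum(axis=1) / n - D2.sum() / (2.0 * n ** 2)
    # Inner-product matrix M from the law of cosines.
    M = 0.5 * (d0_sq[:, None] + d0_sq[None, :] - D2)
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:dim]        # largest eigenvalues first
    lam = np.clip(eigvals[top], 0.0, None)       # guard against tiny negatives
    return eigvecs[:, top] * np.sqrt(lam)        # X such that X X^T ~ M


rng = np.random.default_rng(1)
true_X = rng.normal(size=(10, 3))
D = np.linalg.norm(true_X[:, None] - true_X[None, :], axis=2)
X = embed_from_distances(D)
D_rebuilt = np.linalg.norm(X[:, None] - X[None, :], axis=2)
```

For exact distances from a 3-D configuration, `D_rebuilt` matches `D` to numerical precision, while `X` itself may differ from `true_X` by a rotation, reflection, or translation.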

Additional Cost Terms

Proximity of Enzymes during Metabolism

Stoichiometric Matrix

Curvature

Replication

Incorporate Forces on DNA by Using Elastic Rod Model

Classical Model

Constraints from Replication

Paired Fork Model

Polymerase Based Model

Replicate chromosome structure and separate


If constraints based on function predict structure, then structure and function are related at genome scale

Potential new class of model

Conclusions

Acknowledgements

George Church

Daniel Segrè

Church Lab

Method

• Place constraints in matrix

• Solve for upper and lower bounds from triangle inequalities

• Randomly choose a configuration within these bounds

• Embed in 3 dimensions

• Minimize error
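Two of the steps above, choosing distances within the bounds and minimizing the error, can be sketched as below. The error measure (sum of squared bound violations) and the accept-if-better random search are assumptions, since the slide names the steps without giving formulas; the embedding step is left out and the search simply starts from random coordinates.

```python
import numpy as np


def sample_distances(lower, upper, rng):
    """Pick a symmetric trial distance matrix uniformly between the bounds."""
    n = lower.shape[0]
    iu = np.triu_indices(n, k=1)
    D = np.zeros((n, n))
    D[iu] = rng.uniform(lower[iu], upper[iu])
    return D + D.T


def bound_error(X, lower, upper):
    """Sum of squared violations of the lower and upper distance bounds."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    iu = np.triu_indices(len(X), k=1)
    over = np.clip(D[iu] - upper[iu], 0.0, None)
    under = np.clip(lower[iu] - D[iu], 0.0, None)
    return float((over ** 2 + under ** 2).sum())


def minimize_error(X, lower, upper, rng, steps=500, scale=0.05):
    """Move one random point at a time; keep the move only if the error drops."""
    X = X.copy()
    err = bound_error(X, lower, upper)
    for _ in range(steps):
        trial = X.copy()
        trial[rng.integers(len(X))] += rng.normal(0.0, scale, size=X.shape[1])
        trial_err = bound_error(trial, lower, upper)
        if trial_err < err:
            X, err = trial, trial_err
    return X, err


rng = np.random.default_rng(2)
n = 6
lower = np.full((n, n), 0.5)
upper = np.full((n, n), 2.0)
np.fill_diagonal(lower, 0.0)
np.fill_diagonal(upper, 0.0)
trial_D = sample_distances(lower, upper, rng)
X0 = rng.normal(scale=3.0, size=(n, 3))
err0 = bound_error(X0, lower, upper)
X1, err1 = minimize_error(X0, lower, upper, rng)
```

A gradient-based minimizer would converge faster; the random search is used here only because it needs no derivatives.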

Model for nose replication

Seto S, Layh-Schmitt G, Kenri T, Miyata M. Visualization of the attachment organelle and cytadherence proteins of Mycoplasma pneumoniae by immunofluorescence microscopy. J Bacteriol 2001 Mar;183(5):1621-30.

Bidirectional

2 Polymerase Complexes Remain Attached

Daughter DNA Separate Sides

Causes Minimal Entanglement

Allows for Multiple Firing of Origins

Paired fork model

Topological Consequences

Triangle Bound Smoothing

Upper bounds: $u_{ij} \le u_{ik} + u_{kj}$

Lower bounds: $l_{ij} \ge l_{ik} - u_{kj}$
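The two inequalities can be applied repeatedly to tighten a bounds matrix. The sketch below is one simple way to do it, with a shortest-path style pass for the upper bounds followed by a single pass for the lower bounds; the function name and the consistency flag are illustrative, and a production implementation might iterate the lower-bound pass to convergence.

```python
import numpy as np


def smooth_bounds(lower, upper):
    """Tighten symmetric lower/upper distance bounds with triangle inequalities."""
    L = np.array(lower, dtype=float)
    U = np.array(upper, dtype=float)
    n = U.shape[0]
    # Upper bounds: u_ij <= u_ik + u_kj.
    for k in range(n):
        U = np.minimum(U, U[:, [k]] + U[[k], :])
    # Lower bounds: l_ij >= l_ik - u_kj, and the same with i and j swapped.
    for k in range(n):
        L = np.maximum(L, L[:, [k]] - U[[k], :])
        L = np.maximum(L, L[[k], :] - U[:, [k]])
    np.fill_diagonal(L, 0.0)
    # If any lower bound now exceeds its upper bound, the constraints conflict.
    consistent = bool(np.all(L <= U + 1e-12))
    return L, U, consistent


lower = [[0.0, 3.0, 0.1], [3.0, 0.0, 0.5], [0.1, 0.5, 0.0]]
upper = [[0.0, 4.0, 10.0], [4.0, 0.0, 1.0], [10.0, 1.0, 0.0]]
L, U, ok = smooth_bounds(lower, upper)

# Conflicting input: points 0 and 1 must be at least 3 apart, yet both lie
# within 1 of point 2, which caps their separation at 2.
bad_upper = [[0.0, 4.0, 1.0], [4.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
_, _, bad_ok = smooth_bounds(lower, bad_upper)
```

In the first example the loose upper bound of 10 between points 0 and 2 tightens to 5, and the lower bound rises from 0.1 to 2. The second example shows how smoothing exposes inconsistent constraints.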

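$$\mathbf{x}_{\text{frame}} = \begin{pmatrix} R\cos(dt)\cos(t) \\ R\cos(dt)\sin(t) \\ a\,t \end{pmatrix}$$

$$\mathbf{t} = \frac{\dot{\mathbf{x}}}{|\dot{\mathbf{x}}|}, \qquad \mathbf{n} = \frac{\dot{\mathbf{t}}}{|\dot{\mathbf{t}}|}, \qquad \mathbf{b} = \mathbf{t} \times \mathbf{n}$$

$$\mathbf{x}_{\text{local}} = \begin{pmatrix} \mathbf{t} & \mathbf{n} & \mathbf{b} \end{pmatrix} \begin{pmatrix} A_c\cos(wt) \\ A_s\sin(wt) \\ 0 \end{pmatrix}$$

$$\mathbf{x} = \mathbf{x}_{\text{frame}} + \mathbf{x}_{\text{local}}$$

Here a dot denotes differentiation with respect to the curve parameter $t$; bold $\mathbf{t}$, $\mathbf{n}$, $\mathbf{b}$ are the tangent, normal, and binormal vectors.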

Frenet Frame on Helix
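A numerical sketch of this construction, using finite differences for the derivatives. It assumes the Frenet frame is computed on the large helix x_frame (as the slide title suggests) and applies the local offset in the order the slide's matrix gives it; the function name and the example parameter values are arbitrary.

```python
import numpy as np


def supercoiled_helix(t, a, w, Ac, As, R, d):
    """Return (x, x_frame) sampled at the parameter values in t."""
    t = np.asarray(t, dtype=float)
    # Large helix: radius oscillates as R*cos(d*t), rise a per unit t.
    radius = R * np.cos(d * t)
    x_frame = np.stack([radius * np.cos(t), radius * np.sin(t), a * t], axis=1)

    # Frenet frame by finite differences (assumption: taken on x_frame).
    vel = np.gradient(x_frame, t, axis=0)
    tangent = vel / np.linalg.norm(vel, axis=1, keepdims=True)
    dtan = np.gradient(tangent, t, axis=0)
    normal = dtan / np.linalg.norm(dtan, axis=1, keepdims=True)
    binormal = np.cross(tangent, normal)

    # x_local = [t n b] (Ac cos(wt), As sin(wt), 0)^T
    x_local = ((Ac * np.cos(w * t))[:, None] * tangent
               + (As * np.sin(w * t))[:, None] * normal
               + 0.0 * binormal)
    return x_frame + x_local, x_frame


t = np.linspace(0.0, 4.0 * np.pi, 2000)
x, x_frame = supercoiled_helix(t, a=0.5, w=20.0, Ac=0.1, As=0.1, R=3.0, d=0.1)
x_plain, x_frame_plain = supercoiled_helix(t, a=0.5, w=20.0, Ac=0.0, As=0.0,
                                           R=3.0, d=0.1)
```

Setting both amplitudes to zero removes the supercoil and returns the large helix unchanged, which is a quick sanity check on the frame arithmetic.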

[Figure: points P(i-1,t), P(i,t), P(i+1,t) and their relaxed positions P(i-1,t+1), P(i,t+1), P(i+1,t+1); segments labeled d]

Relaxing the Perturbed Structure

Melting Temperature

• Short Duplex

$$T_m = \frac{\Delta H}{R \ln C + \Delta S}$$

  – C: total concentration of single strands

• Long Duplex

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}] + 41\,\frac{G+C}{l} - \frac{500}{l}$$
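Both melting-temperature expressions are easy to evaluate directly. The sketch below assumes thermodynamic units of cal/mol and cal/(mol·K) with the result in kelvin for the short-duplex form, and degrees Celsius for the long-duplex form; the slide does not state units, and the function names and example values are hypothetical.

```python
import math

R_GAS = 1.987   # gas constant in cal / (mol K); unit choice is an assumption


def tm_short_duplex(delta_h, delta_s, conc):
    """Short duplex: Tm = dH / (R ln C + dS), returned in kelvin.

    delta_h in cal/mol, delta_s in cal/(mol K), conc = total strand
    concentration in mol/L.
    """
    return delta_h / (R_GAS * math.log(conc) + delta_s)


def tm_long_duplex(na_molar, gc_count, length):
    """Long duplex: Tm = 81.5 + 16.6 log10[Na] + 41 (G+C)/l - 500/l."""
    return (81.5 + 16.6 * math.log10(na_molar)
            + 41.0 * gc_count / length - 500.0 / length)


# Illustrative values only.
tm_dilute = tm_short_duplex(-60000.0, -160.0, 1e-6)
tm_concentrated = tm_short_duplex(-60000.0, -160.0, 1e-4)
tm_long = tm_long_duplex(na_molar=1.0, gc_count=50, length=100)
```

With these example numbers the short-duplex Tm rises as strand concentration increases, and the long-duplex form for a 100-mer with 50% GC at 1 M Na evaluates to 97.0.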

Wordsize(a digression)

• BLAST seeds with a string of at least 7 identical bases

• Want to find all alignments with at most 20 mismatches

• What is the probability of finding a stretch of 7 identities in a string of length 70 with 20 mismatches?

Marbles

• Maps into the problem of partitioning a string of length 70 into 21 bins

• Total number of ways: $\binom{70}{20}$

11101110111101001101011101111111010101111011 etc
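The mapping from an alignment string to bins can be made concrete: each 0 (mismatch) is a divider, and the runs of 1s (matches) between dividers are the bin contents. A small sketch, with a hypothetical helper name:

```python
from math import comb


def bins_from_alignment(s):
    """Split a 1/0 match string at every mismatch into match-run lengths."""
    return [len(run) for run in s.split("0")]


example = "11101110111101001101011101111111010101111011"
bins = bins_from_alignment(example)
mismatches = example.count("0")

# 20 mismatches in a string of length 70 leave 50 matches in 21 bins;
# the number of arrangements is "50 marbles and 20 dividers".
total_ways = comb(50 + 20, 20)
```

Empty bins are allowed: two adjacent mismatches simply give a bin of size 0, which is why the count is a plain binomial coefficient.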

Counting

• Now count the arrangements with a stretch of at least 7

$$\binom{21}{1}\binom{63}{20}$$

• But over-counting is a problem

Correcting

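• The cases where 2 bins each have a 7-mer are counted twice, so subtract this number once

$$\binom{21}{1}\binom{63}{20} - \binom{21}{2}\binom{56}{20}$$

• Problem with the cases where there are 3 bins with a 7-mer

$$\binom{21}{1}\binom{63}{20} - \binom{21}{2}\binom{56}{20} + \left(1 - 3 + \binom{3}{2}\right)\binom{21}{3}\binom{49}{20}$$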

Correction Continued

Principle of inclusion-exclusion

$$\sum_{l=1}^{7} (-1)^{l+1} \binom{21}{l}\binom{70-7l}{20}$$
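The inclusion-exclusion sum can be evaluated exactly and checked against brute-force enumeration on small cases. The sketch below generalizes the slide's numbers (length 70, 20 mismatches, word size 7) to arbitrary parameters; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def count_with_run(length, mismatches, word):
    """Arrangements with at least one run of >= word matches."""
    matches = length - mismatches
    bins = mismatches + 1
    total = 0
    # l bins are forced to hold `word` matches each; at most matches // word fit.
    for l in range(1, min(bins, matches // word) + 1):
        total += (-1) ** (l + 1) * comb(bins, l) * comb(length - word * l, mismatches)
    return total


def brute_force(length, mismatches, word):
    """Same count by listing every placement of the mismatches."""
    count = 0
    for zeros in combinations(range(length), mismatches):
        s = ["1"] * length
        for z in zeros:
            s[z] = "0"
        if "1" * word in "".join(s):
            count += 1
    return count


# Probability that a length-70 alignment with 20 mismatches contains a 7-mer seed.
probability = count_with_run(70, 20, 7) / comb(70, 20)
print(f"P(seed of 7 found) = {probability:.4f}")
```

For the slide's parameters the loop runs from l = 1 to 7, matching the limits of the sum, since at most seven bins can each hold seven of the 50 matches.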

Extension

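• Coefficients for at least m bins of wordsize l

• m=2
  – 1, -2, 3, -4, …

$$\binom{21}{2}\binom{56}{20} - \left(\binom{3}{2} - 1\right)\binom{21}{3}\binom{49}{20} + \left(1 - \binom{4}{2} + 2\binom{4}{3}\right)\binom{21}{4}\binom{42}{20} - 4 \cdots$$

• m=3
  – 1, -3, 6, -10, …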
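The coefficients can be generated mechanically from the over-counting argument: an arrangement with exactly l qualifying bins is counted C(l, j) times by the j-bin term, and the coefficients must be chosen so that its net count is 1. A minimal sketch of that recurrence, with a hypothetical function name:

```python
from math import comb


def coefficients(m, max_l):
    """Coefficients c_m, ..., c_max_l for counting 'at least m bins'.

    Each c_l is fixed by requiring sum_{j=m..l} c_j * C(l, j) == 1, so an
    arrangement with exactly l qualifying bins is counted exactly once.
    """
    coeffs = {}
    for l in range(m, max_l + 1):
        already_counted = sum(coeffs[j] * comb(l, j) for j in range(m, l))
        coeffs[l] = 1 - already_counted
    return [coeffs[l] for l in range(m, max_l + 1)]


for m in (1, 2, 3):
    print(m, coefficients(m, m + 3))
```

The recurrence gives 1, -1, 1, -1 for m=1, then 1, -2, 3, -4 for m=2 and 1, -3, 6, -10 for m=3. Up to sign these are the binomial coefficients C(l-1, m-1).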

A familiar object?

1 1 1 1 1 1 1

1 2 3 4 5 6

1 3 6 10 15

1 4 10 20

1 5 15

1 6

1

Hello Blaise

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1