conformational optimization and sampling along natural coordinates


Page 1: CONFORMATIONAL OPTIMIZATION AND SAMPLING ALONG NATURAL COORDINATES

Peter Minary
Computational Structural Biology Group & Bio-X Center
Stanford University
Stanford, CA 94305

Page 2: TALK OUTLINE

– Obstacles for Deciphering the Central Dogma of MB

– Challenges for Optimization & Sampling Algorithms

– Natural Coordinates for Biological Macromolecules

– Chain Closure Algorithms, Obstacles & Solutions

– An Atomic Level Insight into the Central Dogma
  • Nucleosome Positioning/Large Scale Optimization
  • Structure Space of RNA Junctions and Fractals
  • Interpretation & Refinement of Experimental Data

Page 3: CENTRAL DOGMA OF MOLECULAR BIOLOGY

[Figure: central dogma diagram after F. H. C. Crick(1), with labels Transcriptional Regulation, Post-Transcriptional Regulation, Translation, Folding, Motion, and FUNCTION]

“If you want to understand function, study structure.” F. H. C. Crick

(1) F. H. C. Crick, Nature 227, 561-563 (1970).

Page 4: CENTRAL DOGMA OF MOLECULAR BIOLOGY

[Figure: central dogma diagram after F. H. C. Crick(1), with labels Transcriptional Regulation, Post-Transcriptional Regulation, Translation, Folding, Motion, and FUNCTION]

(1) F. H. C. Crick, Nature 227, 561-563 (1970).

Page 5: TRANSCRIPTIONAL REGULATION

[Figure: DNA in chromatin. Labels: Nucleosome Structure and Nucleosome Positioning on the sequence ...GTCCAGTTACGAATTGCGCGC…; a transcription factor (TF) scanning the DNA sequence …..GTGAATGCCCAG…..; 3D Structure with energy E(Xi)]

– Grand Challenges for CSB
  • Structure Based Prediction of Nucleosome Positions
  • Structure Based Prediction of Transcription Factor (TF) Binding Sites
  • Requires All Atom Representation & Rapid Optimization
  • Simultaneously Explore Sequence and Structure Space
  • Need Conceptually Novel Optimization/Sampling Tools

Page 6: CENTRAL DOGMA OF MOLECULAR BIOLOGY

[Figure: central dogma diagram after F. H. C. Crick(1), with labels Transcriptional Regulation, Post-Transcriptional Regulation, Translation, Folding, Motion, and FUNCTION]

(1) F. H. C. Crick, Nature 227, 561-563 (1970).

Page 7: POST TRANSCRIPTIONAL REGULATION

EXAMPLE: mRNA TRANSPORT IN NEURONS

– Grand Challenges for CSB
  • Prediction of RNA Tertiary Structure & Transport Protein Binding Sites
  • Need a Novel Optimization/Sampling (O/S) Approach

Page 8: CENTRAL DOGMA OF MOLECULAR BIOLOGY

[Figure: central dogma diagram after F. H. C. Crick(1), with labels Transcriptional Regulation, Post-Transcriptional Regulation, Translation, Folding, Motion, and FUNCTION]

(1) F. H. C. Crick, Nature 227, 561-563 (1970).

Page 9: PROTEIN MOTION

[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS)]

– In the Current Trend, Experimentally Measured Structures Are Getting
  • Larger in Size
  • Higher in Flexibility
  • Lower in Resolution

– In Current Refinement Methods, Atomic Motions Are Modeled As
  • Independent
  • Isotropic
  • Harmonic

– To Follow the Trend, Atomic Motion in Refinement Methods Should Be
  • Collective
  • Anisotropic
  • Anharmonic

– Demand for Novel Optimization Methods for Structure Refinement

Page 10: CHALLENGES FOR OPTIMIZATION & SAMPLING ALGORITHMS

– Roughness of the objective function, E(X)
  • Leads to rare events in Markov Chain MC(1)
  • Solutions
    – Multiple Markov Chains in Temperature(2)/Energy Domain(3,4)
    – Transformation of Variables(5) and/or using Extra Dimensions(6)

– Large number of degrees of freedom, Nd
  • Number of energy basins is non-polynomial in Nd
  • Solutions
    – Local or Global Torsional Degrees of Freedom(4,7)
    – Arbitrary/Most Relevant/Natural Degrees of Freedom(9)

(1) Metropolis et al. J. Chem. Phys. 21, 1087-1091 (1953).
(2) Geyer et al. Proceedings of the 23rd Symposium on the Interface, 156-163 (1991).
(3) Kou et al. Annals of Statistics 34, 1581-1619 (2006).
(4) Minary et al. Annals of Statistics 34, 1638-1642 (2006).
(5) Minary et al. SIAM Journal on Scientific Computing 30, 2055-2083 (2008).
(6) Minary et al. J. Chem. Phys. 118, 2510-2525 (2003).
(7) Minary et al. J. Mol. Biol. 25, 920-933 (2008).
(8) Dodd et al. Mol. Phys. 78, 961-996 (1993).
(9) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
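
The two acceptance rules behind "Markov Chain MC" and "Multiple Markov Chains in Temperature" can be written down in a few lines. The sketch below uses the standard Metropolis and replica-exchange formulas; the function names and the convention that beta is an inverse temperature are assumptions made for illustration, not taken from the slides.

```python
import math

def metropolis_accept_prob(beta, e_old, e_new):
    """Probability of accepting a move from energy e_old to e_new at inverse temperature beta."""
    exponent = -beta * (e_new - e_old)
    # Downhill moves are always accepted; this also avoids overflow in exp().
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def swap_accept_prob(beta_i, beta_j, e_i, e_j):
    """Probability of exchanging the configurations of two chains run at beta_i and beta_j."""
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

# A cold chain (beta = 1.0) stuck at high energy always swaps with a hot chain at lower energy.
p_swap = swap_accept_prob(1.0, 0.1, 5.0, 1.0)
```

On a rough E(X) the uphill acceptance probability at low temperature becomes very small, which is the "rare events" problem; swaps let a configuration that crossed a barrier at high temperature reach the cold chain.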

Page 11: NATURAL DEGREES of FREEDOM for NUCLEIC ACIDS

Base-pair step parameters
  Translations: Dx Shift, Dy Slide, Dz Rise
  Rotations: τ Tilt, ρ Roll, ω Twist

Base-pair parameters
  Translations: Sx Shear, Sy Stretch, Sz Stagger
  Rotations: κ Buckle, π Propeller, σ Opening

[Figure: one panel per parameter (Sx, Sy, Sz, κ, π, σ, Dx, Dy, Dz, τ, ρ, ω), each drawn in a local x, y, z frame]

[Figure: nucleotide backbone with atoms O3′, P, O5′, C5′, C4′, O1′, labels N and RC, and internal coordinates τ12, τ23, θ1, θ2]

dof: 10 (4 + 12 × ½)

Moves break the chain!
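
Why a move along a natural coordinate breaks the chain can be seen with a toy rigid body. The sketch below is an illustration under stated assumptions: the coordinates are invented, only twist and rise are applied, and `rigid_move` is a hypothetical helper rather than part of any package named in the talk.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rigid_move(points, twist=0.0, rise=0.0):
    """Apply the same twist (rotation about z) and rise (shift along z) to every atom."""
    moved = []
    for p in points:
        x, y, z = rotate_z(p, twist)
        moved.append((x, y, z + rise))
    return moved

# Hypothetical coordinates: a rigid three-atom group whose last atom
# is bonded (length 1.0) to a backbone atom that does not move.
group = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
backbone_atom = (5.0, 0.0, 0.0)

moved = rigid_move(group, twist=math.radians(10.0), rise=0.5)

internal_before = math.dist(group[0], group[1])
internal_after = math.dist(moved[0], moved[1])
link_before = math.dist(group[2], backbone_atom)
link_after = math.dist(moved[2], backbone_atom)
```

Distances inside the moved group are unchanged, while the bond to the fixed backbone atom is stretched. That stretched bond is the chain break that a closure algorithm has to repair.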

Page 12: NATURAL DEGREES of FREEDOM for PROTEINS

β-SHEET & α-HELIX

Translations: Sx Shear, Sy Stretch, Sz Stagger
Rotations: κ Buckle, π Propeller, σ Opening

[Figure: Sx move drawn in a local x, y, z frame]

Moves break the chain!

Page 13: CHAIN CLOSURE ALGORITHMS

– Analytical multi atom closure algorithms(1)
  • Ncd non-linear equations in Ncd unknowns, where Ncd is the number of closure dof
  • Ncd = 6 is the practical limit, given that the complexity is O(fNP(Ncd))

– Single atom Deterministic Full Closure (DFC)(2)
  • Cost efficient
  • Two solutions or no solution

– Single atom Stochastic Partial Closure (SPC)(3)
  • Cost efficient
  • A solution always exists, for any size of the chain break

(1) Dodd et al. Mol. Phys. 78, 961-996 (1993).
(2) Sklenar et al. J. Comp. Chem. 27, 309-315 (2005).
(3) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
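
The "two solutions or no solution" behaviour of single atom full closure has a simple planar analogue: placing one atom at fixed bond lengths from both of its neighbours is a circle-circle intersection. The sketch below is that 2D toy only, with hypothetical names, and is not the DFC algorithm of reference (2), which works in 3D with bond-angle constraints.

```python
import math

def close_single_atom_2d(a, b, r_a, r_b):
    """Return the positions at distance r_a from a and r_b from b (two, or none)."""
    d = math.dist(a, b)
    # No solution if the neighbours coincide, are too far apart, or one circle contains the other.
    if d == 0.0 or d > r_a + r_b or d < abs(r_a - r_b):
        return []
    # Distance from a, along the a-b line, to the foot of the two solutions.
    t = (d * d + r_a * r_a - r_b * r_b) / (2.0 * d)
    h = math.sqrt(max(0.0, r_a * r_a - t * t))
    ux, uy = (b[0] - a[0]) / d, (b[1] - a[1]) / d
    fx, fy = a[0] + t * ux, a[1] + t * uy
    # The two solutions are mirror images across the a-b line (they coincide when h is 0).
    return [(fx - h * uy, fy + h * ux), (fx + h * uy, fy - h * ux)]

closable = close_single_atom_2d((0.0, 0.0), (2.0, 0.0), 1.5, 1.5)
too_wide = close_single_atom_2d((0.0, 0.0), (4.0, 0.0), 1.5, 1.5)
```

When the gap between the neighbours exceeds the sum of the two bond lengths the function returns an empty list, which is the case where a deterministic full closure fails and a partial closure is still possible.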

Page 14: RECURSIVE STOCHASTIC CLOSURE

1 cycle of RSC = DFC[ SPC[ SPC[ SPC[…] ] ] ]

[Figure: molten zone of the chain, shown for the 1st cycle and after m cycles, with the final DFC step]

• One SPC step
  – Restores 4-5, breaks 3-4

• Multiple SPC steps
  – Propagate the chain break
  – Narrow the closure gap

• AC = O(Ncd) << O(fNP(Ncd))
  – Ncd = 2 Nm + 5

Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
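
How a partial closure step restores one bond while breaking the next can be mimicked on a planar bead chain. The following is a toy analogue under stated assumptions, not the published SPC algorithm: atoms are 2D points, only bond lengths are constrained, and each step moves one atom back onto the circle of radius b around its already-placed neighbour, optionally with a random angular perturbation (`noise`). All names are hypothetical.

```python
import math
import random

def bond_error(chain, k, b):
    """Deviation of bond (k, k+1) from its ideal length b."""
    return abs(math.dist(chain[k], chain[k + 1]) - b)

def partial_closure_step(chain, k, b, noise=0.0, rng=random):
    """Move atom k so that bond (k, k+1) has length b; bond (k-1, k) is generally broken."""
    px, py = chain[k]
    qx, qy = chain[k + 1]
    angle = math.atan2(py - qy, px - qx) + rng.uniform(-noise, noise)
    chain[k] = (qx + b * math.cos(angle), qy + b * math.sin(angle))

def propagate_break(chain, b, start, stop, noise=0.0, rng=random):
    """Apply steps to atoms start, start-1, ..., stop and record the gap left after each one."""
    gaps = []
    for k in range(start, stop - 1, -1):
        partial_closure_step(chain, k, b, noise, rng)
        gaps.append(bond_error(chain, k - 1, b))
    return gaps

# Straight six-atom chain with unit bonds; displacing the last atom breaks bond (4, 5).
chain = [(float(i), 0.0) for i in range(6)]
chain[5] = (5.0, 0.8)
initial_gap = bond_error(chain, 4, 1.0)
gaps = propagate_break(chain, 1.0, start=4, stop=1)
```

With `noise=0.0` each atom moves by exactly the current gap, so the gap passed to the next bond cannot grow, and in this example it shrinks at every step. The break ends up on bond (0, 1), where a single full closure step would finish the cycle. The cost is one step per atom, in line with the linear scaling stated on the slide.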

Page 15: MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-I

[Figure: molten zone (C4′….O3′)]

Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).

Page 16: MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-II

• Monte Carlo Minimization(1) (MCM) is Monte Carlo on $\tilde{E}(X) = \min_{X} E(X)$

• MCRSC(2) is Monte Carlo on $\tilde{E}(X) = \min E(X)$, with the DOF $X_i$ held invariant during the minimization

          minimization       invariant DOF   DOF X       E evaluations
  MCM     BFGS, CG           none            cart/tors   ~10-1000
  MCRSC   N cycles of RSC    Xi              arbitrary   1

(1) Wales, D. J., Scheraga, H. A. Science 285, 1368-1372 (1999).
(2) Minary, P., Levitt, M. J. Comp. Biol. 17(8), 993-1010 (2010).
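
For comparison, Monte Carlo Minimization can be sketched in a few lines: every trial move is followed by a local minimization, and the Metropolis test is applied to the minimized energies. The version below is a minimal illustration under assumptions: the 1D energy function is invented, plain gradient descent stands in for BFGS or CG, and all parameter values are arbitrary.

```python
import math
import random

def energy(x):
    """Hypothetical rough 1D energy: a shallow parabola plus a ripple."""
    return 0.1 * x * x + math.sin(5.0 * x)

def gradient(x):
    """Analytical derivative of `energy`."""
    return 0.2 * x + 5.0 * math.cos(5.0 * x)

def local_minimize(x, step=0.01, iters=500):
    """Plain gradient descent, standing in for BFGS or CG."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x

def monte_carlo_minimization(x0, n_steps=200, move=1.0, beta=2.0, seed=0):
    """Metropolis walk over local minima; returns the best minimum visited."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Perturb, then relax to the bottom of whichever basin the trial lands in.
        trial = local_minimize(x + rng.uniform(-move, move))
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

start_e = energy(local_minimize(4.0))
best_x, best_e = monte_carlo_minimization(4.0)
```

Each trial here costs hundreds of gradient evaluations inside `local_minimize`, which corresponds to the "~10-1000" energy evaluations per move in the MCM row of the table. The MCRSC row replaces that inner loop with cycles of chain closure and a single energy evaluation.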

Page 17: RECURSIVE STOCHASTIC vs DETERMINISTIC FULL CLOSURE in MONTE CARLO: a B-DNA

• RSC works with move sizes an order of magnitude larger than DFC
• RSC is like a wire: you pull, and the system deforms to follow the change

[Figure: the six moves Dx, Dy, Dz, Sx, Sy, Sz, each drawn in a local x, y, z frame]

dof: 6

E2 binding DNA: 5’-ACCGAATTCGGT-3’
Force Field: AMBER99-bs0

Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).

Page 18: RECURSIVE STOCHASTIC CLOSURE vs LOOP TORSIONAL SAMPLING in MONTE CARLO: an α+β PROTEIN

SCOP id: d1div_2, 55 residue domain

[Figure: panels (1) and (2); Ncd = 19]

(1) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Minary & Levitt J. Mol. Biol. 25, 920-933 (2008).

Page 19: APPLICATIONS

Page 20: THE METHOD: GENERAL PIPELINE

IN SILICO NUCLEOSOME POSITIONING

Page 21: APPLICATION TO CHROMOSOME 14

IN SILICO NUCLEOSOME POSITIONING

• Yeast Chromosome 14
  – 187k-189k from SGD(1)
  – Experimental Data(2)

• Nucleosome template
  – 1.9 Å resolution
  – pdb code (1kx3)(3)

• Slide nucleosome along DNA
  – Slide a 147 bp window
  – Design template

• Run MCRSC on all structures
  – Force field: AMBER99-bs0(5)
  – Software: MOSAICS(6)

• Get probability profile
  – P(i) ~ exp(-β <E(i)>)

[Figure: P(i) versus position i, ab initio and in vitro profiles; axis ticks 187k, 189k, 201k, 203k, 205k, 207k. Minary & Levitt]

(1) Cherry, J. M. et al., Nucleic Acids Res. 26, 73-79 (1998).
(2) Kaplan, N. et al., Nature 458, 362-366 (2009).
(3) Davey, C. A. et al., J. Mol. Biol. 319, 1097-1113 (2002).
(4) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
(5) Perez et al., Biophys. J. 92, 3817-3827 (2007).
(6) Minary (2010).
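
The last step of the pipeline, P(i) ~ exp(-β <E(i)>), can be sketched as a normalized Boltzmann weighting of the mean energies obtained at each window position. The normalization over all positions and the function name are assumptions made for illustration; the slide only gives the proportionality.

```python
import math

def probability_profile(mean_energies, beta):
    """Normalized P(i) proportional to exp(-beta * <E(i)>) over all window positions i."""
    # Subtracting the minimum leaves the normalized result unchanged and avoids overflow.
    e_min = min(mean_energies)
    weights = [math.exp(-beta * (e - e_min)) for e in mean_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean energies for three window positions, with beta in matching inverse units.
profile = probability_profile([0.0, 1.0, 2.0], beta=1.0)
```

Only energy differences between positions matter, so a constant offset in the force field energy leaves the profile unchanged.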

Page 22: NUCLEOSOME OCCUPANCY

IN SILICO NUCLEOSOME POSITIONING

Yeast Chromosome 14

[Figure: P(i) versus position i for the in vivo, in vitro, and ab initio profiles over 187000-207000, with a second panel over 191000-199000. Minary & Levitt]

Page 23: HIERARCHICAL NATURAL DOFs/MOVES (HNM)

EXPLORING RNA STRUCTURE SPACE

[Figure: move levels L1, L2, L3, L4]

Page 24: RNA 4 WAY JUNCTION: SAMPLING METHODS

EXPLORING RNA STRUCTURE SPACE

[Figure: move sets(1,2,3) and sampling methods. Labels: move set L1 with NM-MC(1,3); move sets L1 – L2, L1 – L3, and L1 – L4 (L1 + L2 + L3 + L4) with HNM-MC(1,2,3); MCRSC(1) + User Defined Move Sets (Medicine/Physics) (Chemistry/Biology)]

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

Page 25: RNA 4 WAY JUNCTION

EXPLORING RNA STRUCTURE SPACE

[Figure: four panels: (a) NM-MC(1,5) with move set L1, (b) FA-MC-Sym(2), (c) FA-Rosetta(3), (d) HNM-MC(1,4,5) with move sets L1 - L4; labels L1, L2, L3, L4]

• Necessary condition for unbiased sampling
  – Symmetric RNA -> distributions coincide

• Easy to improve by field specific move set
  – RNA: relative arrangement of stem loops

• Comparing to Fragment Assembly
  – Biased and non-continuous sampling
  – Dependence on fragment libraries

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Parisien and Major, Nature 452, 51 (2008).
(3) R. Das, J. Karanicolas, and D. Baker, Nat. Methods 7(4), 291 (2010).
(4) Sim, A., Levitt, M., Minary, P. To be submitted.
(5) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

Page 26: FRACTAL RNA: BEYOND CURRENT METHODS

EXPLORING RNA STRUCTURE SPACE

• Necessary condition for unbiased sampling
  – Symmetric RNA -> arm-end distributions coincide

• Further improvement by L5, L6, L7
  – No limitation on improvement

• Benchmark with different move sets
  – Accuracy converges by L7(1,2,3)

[Figure: HNM-MC(1,2,3); error(i) versus i × 10^4 for move sets L1 – L4 and L1 – L7]

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

Page 27: FRACTAL RNA: WHY/HOW DOES IT WORK?

EXPLORING RNA STRUCTURE SPACE

• Use embedded subspaces $\Omega_3 \subset \Omega_2 \subset \Omega_1 \equiv \Omega$

• In particular
  – $\Omega_3$: 6 DOFs / main arms(2)
  – $\Omega_2$: 6 DOFs / arms of arms(2)
  – $\Omega_1$: 10 DOFs / nucleotides(1)

• Low cost method to approximate $\bar{\alpha} = \int_{L \in \Omega} dL \, \alpha(L) f(L)$, where $\alpha, f : \Omega \to \mathbb{R}$

• Multi scale integration(3) along $L_3 \in \Omega_3$
  – $L_2 \in \Omega_2$ around all $L_3$
  – $L_1 \in \Omega_1$ around all $L_2$

[Figure: nested subspaces labeled $\Omega_1$, $\Omega_2$, $\Omega_3$]

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
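
The integral of α(L) f(L) over Ω is the kind of quantity that sampling approximates by an average. As a minimal sketch, assuming f is a normalized probability density that can be sampled directly, the integral is estimated by the mean of α over draws from f. The example uses a one-dimensional uniform f and α(L) = L², both invented for illustration, and does not reproduce the multi scale scheme of the slide.

```python
import random

def estimate_average(alpha, sample_f, n_samples=20000, seed=0):
    """Estimate the integral of alpha(L) * f(L) as the mean of alpha over samples drawn from f."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += alpha(sample_f(rng))
    return total / n_samples

# f is uniform on [0, 1) and alpha(L) = L**2, so the exact value is 1/3.
estimate = estimate_average(lambda L: L * L, lambda rng: rng.random())
```

In a conformational search the draws come from a Markov chain rather than from a direct sampler, and the estimate is only as good as the coverage of Ω that the move set provides.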

Page 28: OBJECTIVE

CRYO-EM REFINEMENT

[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS)]

[Figure: objective; labels: initial model, refined model, EM image]

Page 29: VALIDATION I

CRYO-EM REFINEMENT

[Figure: labels: initial structure, target structure, refined structure, target projection; rmsd values 18 Å and 2 Å; arrow: optimization(1)-(3) along natural dof]

(1) Zhang, Minary, Levitt. In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

Page 30: VALIDATION II: CROSS CORRELATION OF MAPS

CRYO-EM REFINEMENT

[Figure: Lysozyme; cross correlation (cc) versus projection angle]

Page 31: THE PROTOCOL

CRYO-EM REFINEMENT

Etotal = Weight × EEM + Emolecule

[Figure: Lysozyme]
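
The refinement target combines a map term and a molecular term, and the validation slides compare maps by cross correlation. The sketch below shows one plausible way to wire these together. The choice EEM = 1 − cc is an assumption made for illustration and is not stated on the slides, as are the function names and the representation of maps as flat lists of densities.

```python
import math

def cross_correlation(map_a, map_b):
    """Pearson correlation between two density maps given as flat lists of equal length."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("cross correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)

def total_energy(e_em, e_molecule, weight):
    """Etotal = Weight * EEM + Emolecule."""
    return weight * e_em + e_molecule

# Hypothetical densities: the model map matches the target up to scale and offset.
target_map = [0.0, 1.0, 4.0, 2.0, 0.5]
model_map = [2.0 * v + 3.0 for v in target_map]

cc = cross_correlation(model_map, target_map)
e_em = 1.0 - cc  # assumed form of the map term
e_total = total_energy(e_em, e_molecule=-10.0, weight=4.0)
```

The weight sets how strongly the experimental map pulls against the force field. A larger weight fits the map more closely at the cost of a higher molecular energy.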

Page 32: REFINEMENT

CRYO-EM REFINEMENT

Page 33: DOMAIN FLEXIBILITY

CRYO-EM REFINEMENT

[Figure: panels labeled (1)-(3) and (4)]

(1) Zhang, Minary, Levitt. In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8), 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
(4) Courtesy of Steve Ludtke, Baylor College, Texas.

Page 34: CONCLUSION

• CSB has Limited Impact due to Inefficient Conformational Sampling

• Novel Algorithms Supporting Natural DOF May Offer The Solution

• Our Novel Approach May Open New Avenues
  – In The Refinement and Interpretation of Experimental Data
  – In The Use of Structural Information in Molecular Biology

• Atomic Level Understanding of the Central Dogma of Molecular Biology (CDMB) May Become a Reality with Natural Coordinates (NC)

[Figure: CDMB diagram with the label FUNCTION]

“If the code does indeed have some logical foundation then it is legitimate to consider all the evidence, both good and bad, in any attempt to deduce it.” F. H. C. Crick

Page 35: ACKNOWLEDGEMENTS

– Michael Levitt, Computer Sci. & Structural Biology, Stanford, US

– Jernej Ule, Molecular Biology/MRC, Cambridge, UK

– Peter Lukavsky, Molecular Biology/MRC, Cambridge, UK

– Sebastian Doniach, Physics, Stanford, US

– Zev Bryant, Bioengineering, Stanford, US

– Wing H Wong, Statistics, Stanford, US

– Wah Chiu, Baylor College, Texas, US

– Adelene Sim, Physics, Stanford, US (graduate student)

– Gaurav Chopra, Mathematics, Stanford, US (graduate student)

– Junjie Zhang, Baylor College and Stanford, US (postdoc)

– Anatole von Lilienfeld & Workshop Organizing Committee