conformation networks: an application to protein folding zoltán toroczkai center for nonlinear...

Conformation Networks: an Application to Protein Folding

Zoltán Toroczkai

Center for Nonlinear Studies

Los Alamos National Laboratory

Center for Nonlinear Studies

Erzsébet Ravasz

Gnana Gnanakaran (T-10) Theoretical Biology and Biophysics

Proteins

the most complex molecules in nature

globular or fibrous

basic functional units of a cell

chains of amino acids (50 – $10^3$)

peptide bonds link the backbone

unique 3D structure (native physiological conditions)

biological function

fold in nanoseconds to minutes

about 1000 known 3D structures: X-ray crystallography, NMR

Native state

153 Residues, Mol. Weight=17181 [D], 1260 Atoms

Main function: primary oxygen storage and carrier in muscle tissue

It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO

Myoglobin

Protein conformations

• defined by dihedral angles

2 angles with 2-3 local minima of the torsion energy

N monomers → about $10^N$ different conformations

Amino-acid

Levinthal’s paradox

• Levinthal’s paradox, 1968

finding the native state by random sampling is not possible

40-monomer polypeptide, $10^{13}$ conf/s

$3 \times 10^{19}$ years to sample all

universe ~ $2 \times 10^{10}$ years old

Wetlaufer, P.N.A.S. 70, 691 (1973)

Levinthal, J. Chim. Phys. 65, 44-45 (1968)

nucleation

folding pathways

• Anfinsen: thermodynamic hypothesis

native state is at the global minimum of the free energy

Epstein, Goldberger & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)

Free energy landscapes

• Bryngelson & Wolynes, 1987

free energy landscape

Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)

a random hetero-polymer typically does NOT fold

Davidson & Sauer, P.N.A.S. 91, 2146 (1994)

Experiment:
– random sequences
– GLU, ARG, LEU
– 80-100 amino acids

~ 95% did not fold in a stable manner

Funnels

• Leopold, Montal & Onuchic, 1992

Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)

many folding pathways

Energy funnels

Given any amino-acid sequence: can we tell if it is a good folder?

• experiments (X-ray, NMR)
• molecular dynamics simulations
• homology modeling

Difficult and slow

Molecular dynamics

• State of the art

supercomputer (LANL)

Ribosome in explicit solvent:
– targeted MD
– $2.64 \times 10^6$ atoms ($2.5 \times 10^5$ + water)
– Q machine, 768 processors
– 260 days of simulation (event: 2 ns)

Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)

distributed computing (Stanford, Folding@home)
– more than 100,000 CPUs
– simulation of complete folding event
  » BBA5, 23-residue, implicit water
  » 10,000 CPU days/folding event (~1 μs)

Shirts & Pande, Science 290, 1903 (2000); Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)

~ $10^{16}$ times slower
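The order of magnitude of the slowdown follows from the ribosome figures above (260 days of simulation for a 2 ns event). A minimal sketch, assuming the 260 days are wall-clock time; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600

def slowdown(wall_clock_days, simulated_seconds):
    """Ratio of wall-clock time spent to physical time simulated."""
    return wall_clock_days * SECONDS_PER_DAY / simulated_seconds

# Ribosome run: 260 days of computing for a 2 ns event.
ribosome = slowdown(wall_clock_days=260, simulated_seconds=2e-9)
print(f"ribosome run: {ribosome:.1e} times slower than nature")
```

The ratio comes out slightly above $10^{16}$, consistent with the figure quoted on the slide.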

Configuration networks

• Configuration networks

NODE: configuration

LINK: change of one degree of freedom (angle)

refinement of angle values → continuous case

Protein conformations

dihedral angles have few preferred values

Ramachandran map (PDB structures)

Ramachandran & Sasisekharan, J.Mol.Biol. 7, 95 (1963)

• Helix
• Sheet
• other

Why networks?

• VERY LARGE: 100 monomers → $10^{100}$ nodes. However:

Generic features of folding are determined by STATISTICAL properties of the configuration network:

• degree distribution
• average distance
• clustering
• degree correlations

Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003)
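The first three statistical properties listed above can be computed directly from an adjacency structure. The sketch below uses only the Python standard library and stores the network as a dict mapping each node to the set of its neighbors; the function names and the six-node ring used as a toy input are illustrative, not taken from the talk. Degree correlations are left out to keep the example short.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Fraction of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

def average_distance(adj):
    """Mean shortest-path length over connected ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def average_clustering(adj):
    """Mean over nodes of (links among neighbors) / (possible links among neighbors)."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Each neighbor-neighbor link is seen from both ends, hence the division by 2.
        links = sum(1 for v in nbrs for w in adj[v] if w in nbrs) / 2
        coeffs.append(links / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs)

# Toy input: a ring of six nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(degree_distribution(ring), average_distance(ring), average_clustering(ring))
```

The all-pairs BFS costs O(N·E), so for networks of realistic size these quantities would have to be estimated from a sample of nodes.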

toolkit from network research

captures the high dimensionality

faster algorithms to simulate folding events

pre-screening synthetic proteins

insights into misfolding

A real example

• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)

beta3s: 20 monomers, antiparallel beta sheets

MD simulation, implicit water

330 K, equilibrium: folded ⇌ random coil

NODE -- 8 letters / AA (local secondary struct)

LINK -- 2ps transition

Its native conformation has been studied by NMR experiments:

De Alba et al., Prot. Sci. 8, 854 (1999).

Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.

• Simulations at 330 K

• The average folding time from the denatured state: ~83 ns

• The average unfolding time: ~83 ns

• Simulation time: ~12.6 μs

• Coordinates saved every 20 ps ($5 \times 10^5$ snapshots in 10 μs)

• Secondary structures: H, G, I, E, B, T, S, - (α-helix, $3_{10}$ helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured).

• The native state: -EEEESSEEEEEESSEEEE-

• There are approx. $8^{18} \approx 10^{16}$ conformations.

• Nodes: conformations; links: transitions.
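The bookkeeping of this construction can be sketched as follows. The counts use the figures above; treating the two terminal residues as always unstructured is an assumption made here because it reproduces the $8^{18}$ estimate. The trajectory is a hypothetical toy list, not simulation data, and `build_network` is an illustrative name.

```python
from collections import Counter

ALPHABET = "HGIEBTS-"   # the 8 secondary-structure letters
N_RESIDUES = 20         # beta3s chain length

# Assumption: the two terminal residues stay '-', leaving 18 free positions.
n_conformations = len(ALPHABET) ** (N_RESIDUES - 2)
n_snapshots = round(10e-6 / 20e-12)   # 10 microseconds sampled every 20 ps

def build_network(trajectory):
    """Nodes are secondary-structure strings; a link joins consecutive snapshots."""
    nodes = Counter(trajectory)                 # node -> number of visits
    links = Counter()
    for a, b in zip(trajectory, trajectory[1:]):
        if a != b:
            links[frozenset((a, b))] += 1       # undirected, weighted by transitions
    return nodes, links

# Hypothetical toy trajectory, only to exercise the function.
native = "-EEEESSEEEEEESSEEEE-"
toy = [native, "-EEEESSEEEEEESSEEES-", native, native, "-EEEETTEEEEEESSEEEE-", native]
nodes, links = build_network(toy)
print(f"{n_conformations:.1e} possible strings, {n_snapshots} snapshots")
print(len(nodes), "nodes,", len(links), "links in the toy network")
```

Because the number of snapshots is many orders of magnitude below the number of possible strings, the simulation can only ever visit a small sampled subgraph of the full conformation network.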

Many real-world networks are scale free

hubs

co-authorship (γ = 1 to 2.5), citations (γ = 3), sexual contacts (γ = 3.4), movie actors (γ = 2.3), Internet (γ = 2.4), World Wide Web (γ = 2.1/2.5), genetic regulation (γ = 1.3), protein-protein interactions (γ = 2.4), metabolic pathways (γ = 2.2), food webs (γ = 1.1)

Barabási & Albert, Science 286, 509 (1999)

Scale-free network

[Figure panels: beta3s; randomized]

Many reasons behind SF topology

• Why is the protein network scale free?
• Why does the randomized chain have a similar degree distribution?
• Why is γ = -2?

Robot arm networks

[Figure: robot arm state networks for n = 0, 1, 2 and 3; nodes are labeled by strings of digits 0, 1, 2, e.g. 00, 01, ..., 22 and 000, 100, 200, 010, 020, 021]

n-dimensional hypercube

binomial degree distribution

• Steric constraints?

missing nodes

missing links

Swiss cheese

Homogeneous
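A small sketch of such a state network follows. Two assumptions are made here and are not stated on the slides: a link changes one digit by ±1 (so 0 and 2 are not directly connected), which turns the network into an n-dimensional grid with three sites per axis, and sterically excluded states are passed in as an explicit `forbidden` set. Under this link rule the degree distribution is exactly binomial, which the code checks for n = 4.

```python
from collections import Counter
from itertools import product
from math import comb

def robot_arm_network(n, forbidden=frozenset()):
    """States are tuples of n digits in {0, 1, 2}; links change one digit by +-1.

    The +-1 rule and the `forbidden` set (a stand-in for steric exclusion)
    are assumptions of this sketch.
    """
    states = [s for s in product(range(3), repeat=n) if s not in forbidden]
    allowed = set(states)
    adj = {s: set() for s in states}
    for s in states:
        for i in range(n):
            for step in (-1, 1):
                d = s[i] + step
                t = s[:i] + (d,) + s[i + 1:]
                if 0 <= d <= 2 and t in allowed:
                    adj[s].add(t)
    return adj

def degree_histogram(adj):
    return Counter(len(nbrs) for nbrs in adj.values())

n = 4
hist = degree_histogram(robot_arm_network(n))
# A state with m digits equal to 1 has degree n + m, and there are
# C(n, m) * 2^(n - m) such states: a binomial law in m.
expected = {n + m: comb(n, m) * 2 ** (n - m) for m in range(n + 1)}
assert hist == expected

# "Swiss cheese": removing a state also removes the links that ran through it.
holes = robot_arm_network(2, forbidden={(1, 1)})
print(sorted(hist.items()), len(holes))
```

Removing states lowers the degree of their neighbors but keeps the distribution narrow, in line with the "homogeneous" label.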

A bead-chain model

• Beads on a chain in 3D: robot arm model

similar to Cα protein models

rod-rod angle

3 positions around axis

N = 18; rod-rod angle = 120°; state 2212112212111122

N = 6; rod-rod angle = 90°

Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)

Homogeneous network

Another example: L = 7, rod-rod angle = 75°, r = 0.25

[Figure: network with the state “00100” marked; legend: allowed state, forbidden state]

Adding monomers not only increases the number of nodes in the network but also its dimensionality!! The combined effect is small-world.
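A bead-chain conformation network with excluded volume can be sketched as below. Several choices are assumptions made for illustration and are not given on the slides: rods have unit length, L counts rods (so a state has L - 2 digits), the three positions around each axis are the dihedral angles 60°, 180° and 300°, r is a bead radius so that two beads not joined by a rod clash when their centers are closer than 2r, and a link joins two allowed states that differ in exactly one digit.

```python
import math
from itertools import product

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def chain_coordinates(state, angle_deg, dihedrals_deg=(60.0, 180.0, 300.0)):
    """Bead positions for unit rods with a fixed rod-rod angle.

    Each digit of `state` picks one of three dihedral angles (assumed values).
    """
    theta = math.radians(angle_deg)
    beads = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
             (1.0 - math.cos(theta), math.sin(theta), 0.0)]
    for digit in state:
        a, b, c = beads[-3:]
        phi = math.radians(dihedrals_deg[digit])
        bc = _unit(_sub(c, b))
        n = _unit(_cross(_sub(b, a), bc))   # normal of the plane through a, b, c
        m = _cross(n, bc)                   # in-plane direction perpendicular to bc
        x = -math.cos(theta)
        y = math.sin(theta) * math.cos(phi)
        z = math.sin(theta) * math.sin(phi)
        beads.append(tuple(c[k] + x * bc[k] + y * m[k] + z * n[k] for k in range(3)))
    return beads

def is_allowed(state, angle_deg, r):
    """Excluded volume: beads not joined by a rod must be at least 2r apart."""
    beads = chain_coordinates(state, angle_deg)
    return all(math.dist(beads[i], beads[j]) >= 2 * r
               for i in range(len(beads)) for j in range(i + 2, len(beads)))

def conformation_network(n_rods, angle_deg, r):
    n_digits = n_rods - 2
    allowed = {s for s in product(range(3), repeat=n_digits)
               if is_allowed(s, angle_deg, r)}
    adj = {s: set() for s in allowed}
    for s in allowed:
        for i in range(n_digits):
            for d in range(3):
                t = s[:i] + (d,) + s[i + 1:]
                if d != s[i] and t in allowed:
                    adj[s].add(t)
    return adj

adj = conformation_network(n_rods=7, angle_deg=75.0, r=0.25)
print(len(adj), "allowed states out of", 3 ** 5)
```

Each added rod multiplies the number of candidate states by three and adds one more digit along which links can run, which is the sense in which both size and dimensionality grow.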

Shortcuts in Folding Space

The “dilemma”

HOMOGENEOUS

• from studies of conformation networks

bead chain

robot arm

SCALE FREE

• from polypeptide MD simulations

beta3s

randomized version

?

Gradient Networks

Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.).

Naturally, gradients will induce flows on networks as well.

Ex.: Load balancing in parallel computation and packet routing on the internet

Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. on Foundations of Computer Science (FOCS), 1998: “Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes”

References:

Z. T. and K.E. Bassler, “Jamming is Limited in Scale-free Networks”, Nature 428, 716 (2004)

Z. T., B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss, “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262

Setup:

Let G = G(V, E) be an undirected graph, which we call the substrate network.

The vertex set: $V = \{x_0, x_1, \ldots, x_{N-1}\} \equiv \{0, 1, 2, \ldots, N-1\}$

The edge set: $E \subset V \times V$, $e \in E$, $e = (x_i, x_j) = (i, j)$, $(x, x) \notin E$ (no self-loops)

A simple representation of E is via the N × N adjacency (or incidence) matrix A:

$$A(x_i, x_j) = a_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{if } (i,j) \notin E \end{cases} \qquad (1)$$

Let us consider a scalar field $\{h\}: V \to \mathbb{R}$

$S_i^{(1)}$: set of nearest-neighbor nodes of i on G

Definition 1 The gradient $\nabla h(i)$ of the field {h} in node i is a directed edge:

$$\nabla h(i) = (i, \mu(i)) \qquad (2)$$

which points from i to that nearest neighbor $\mu(i) \in S_i^{(1)} \cup \{i\}$ on G for which the increase in the scalar is the largest, i.e.:

$$\mu(i) = \underset{j \in S_i^{(1)} \cup \{i\}}{\arg\max}\; h_j \qquad (3)$$

The weight associated with edge $(i, \mu(i))$ is given by:

$$|\nabla h(i)| = h_{\mu(i)} - h_i$$

If $\mu(i) = i$ then $\nabla h(i) = (i, i) \equiv \mathbf{0}(i)$. The self-loop $\mathbf{0}(i)$ is a loop through i with zero weight.

Definition 2 The set F of directed gradient edges on G together with the vertex set V forms the gradient network:

$$\nabla G = \nabla G(V, F)$$

If (3) admits more than one solution, then the gradient in i is degenerate.

In the following we will only consider scalar fields with non-degenerate gradients. This means:

$$\mathrm{Prob.}\{h_i = h_j \text{ if } (i, j) \in E\} = 0$$

Theorem 1 Non-degenerate gradient networks form forests.

Proof:

Theorem 2 The number of trees in this forest = number of local maxima of {h} on G.
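Both theorems can be checked numerically on a small example. The sketch below builds an Erdős–Rényi substrate with i.i.d. uniform scalars (so the gradients are non-degenerate with probability 1), constructs the gradient network, and counts trees by following gradient edges uphill. Function names, the graph size and the seed are arbitrary illustrative choices.

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected random graph: each pair is linked independently with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def gradient_network(adj, h):
    """For each node, the node of largest h among itself and its nearest neighbors."""
    return {i: max(adj[i] | {i}, key=h.__getitem__) for i in adj}

def count_trees(target):
    """Follow gradient edges uphill; every node must end at a self-loop (a root)."""
    roots = set()
    for i in target:
        steps = 0
        while target[i] != i:
            i = target[i]
            steps += 1
            assert steps <= len(target), "cycle found: not a forest"
        roots.add(i)
    return len(roots)

rng = random.Random(0)
adj = erdos_renyi(200, 0.03, rng)
h = {i: rng.random() for i in adj}
target = gradient_network(adj, h)

local_maxima = [i for i in adj if all(h[i] > h[j] for j in adj[i])]
n_edges = sum(1 for i in target if target[i] != i)   # gradient edges without self-loops
print(count_trees(target), "trees,", len(local_maxima), "local maxima,", n_edges, "edges")
```

The walk from any node terminates because h strictly increases along every gradient edge that is not a self-loop, so no directed path can revisit a node. The edge count N minus the number of trees is the signature of a forest.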

[Figure: example network with scalar values (between 0.05 and 0.87) assigned to the nodes]

For Erdős–Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution is:

In the limit $N \to \infty$, $p \to 0$, $z = Np = \text{const.} \gg 1$:

$$R_N(l) \approx \frac{1}{z\,l}, \qquad 1 \le l < z = Np$$

The Configuration model

A. Clauset, C. Moore, Z.T., E. Lopez, to be published.

K-th Power of a Ring

Generating functions: g(z)

[Equation: R(z) written as an integral over x from 0 to 1 in terms of the generating function g and its derivative]

[Equation: piecewise closed-form expression for the in-degree distribution $R_K(l)$ on the K-th power of a ring, with four cases in l; the last case is $R_K(2K) = 1/(4K+1)$]

Power law with exponent γ = -3

The energy landscape

What generates γ = -2?

• Energy associated with each node (configuration)

the gradient network

most favorable transitions

T=0 backbone of the flow

MD simulation

tracks the flow network

biased walk close to the gradient network

trees

basins of local minima

The REM generates an exponent of -1.

Model ingredients

• A network model of configuration spaces

network topology

homogeneous

degree correlations

constrained (folded): small $k_{conf}$

lower energy

loose (random coil): large $k_{conf}$

higher energy

E increases with k

how to associate energies

Random geometric graph

• random geometric graph

in higher D: similar to hypercube with holes

degree correlations

[Plot: E versus k]

• Energy proportional to connectivity

R=0.113, <k>=20

Dall & Christensen, Phys.Rev.E 66, 026121 (2002)

N=30000, <k> = 1000, d=2.

Exponent is -2

2 essential ingredients:

1) k1-k2 correlations
2) <E> monotonic in k
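The two ingredients can be explored on a small random geometric graph. In this sketch each node's energy equals its degree, with a tiny random term added only to break ties (an assumption needed to keep gradients non-degenerate), and every node points to the lowest-energy node among itself and its neighbors. The radius is the value quoted above; the node count of 500 and the seed are illustrative choices. At this size the sketch only produces the in-degree histogram and makes no claim about the exponent.

```python
import math
import random
from collections import Counter

def random_geometric_graph(n, radius, rng):
    """n points uniform in the unit square; link pairs closer than `radius`."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) < radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def downhill_in_degrees(adj, energy):
    """Each node points to the lowest-energy node among itself and its neighbors."""
    indeg = Counter()
    for i in adj:
        target = min(adj[i] | {i}, key=energy.__getitem__)
        if target != i:
            indeg[target] += 1
    return indeg

rng = random.Random(1)
adj = random_geometric_graph(n=500, radius=0.113, rng=rng)
# Energy proportional to connectivity; the small random term only breaks ties.
energy = {i: len(adj[i]) + 1e-6 * rng.random() for i in adj}
indeg = downhill_in_degrees(adj, energy)
histogram = Counter(indeg[i] for i in adj)   # in-degree l -> number of nodes
print(sorted(histogram.items()))
```

Reproducing the scaling regime would need a much larger graph and a spatial index instead of the O(N²) pair loop used here.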

Attractive / Repulsive

Lennard-Jones potential

Bead-chain model

• more realistic model: bead-chain

configuration network excluded volume

energy: Lennard-Jones

L = 30, rod-rod angle = 75°
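A minimal sketch of a Lennard-Jones energy for a bead configuration is given below. It uses the standard 12-6 form $4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$; the parameter values ε = σ = 1 and the choice to skip pairs of beads joined by a rod are assumptions, since the slides do not specify them.

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pair energy: repulsive at short range, attractive at longer range."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def chain_energy(beads, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all pairs of beads that are not joined by a rod."""
    total = 0.0
    for i in range(len(beads)):
        for j in range(i + 2, len(beads)):
            total += lennard_jones(math.dist(beads[i], beads[j]), epsilon, sigma)
    return total

r_min = 2 ** (1 / 6)   # location of the minimum for sigma = 1
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(lennard_jones(r_min), chain_energy(straight))
```

The pair energy crosses zero at r = σ and reaches its minimum of -ε at r = 2^(1/6) σ, which separates the repulsive and attractive branches.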

The case of the α-helix

AKA peptide

• ALA: orange
• LYS: blue
• TYR: green

MD simulations, no water.

T = 400 K

More than one simulation

3 different runs: yellow, red and green

The MD traced network

The role of temperature

Conclusions

• A network approach was introduced to study sterically constrained conformations of ball-chain-like objects.

• This network approach is based on the “statistical dogma”, which states that generic features must result from statistical properties of the networks and should not depend on details.

• Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates.

• The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure.

• The -2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.

• Understanding the protein folding network has the potential to lead to faster simulation algorithms, helping to close the gap between nature’s speed and ours.

Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.
