Conformation Networks: an Application to Protein
Folding
Zoltán Toroczkai
Center for Nonlinear Studies
Los Alamos National Laboratory
Center for Nonlinear Studies
Erzsébet Ravasz
Gnana Gnanakaran (T-10), Theoretical Biology and Biophysics
Proteins
the most complex molecules in nature
globular or fibrous
basic functional units of a cell
chains of amino acids (50 – $10^3$)
peptide bonds link the backbone
unique 3D structure (native physiological conditions)
biological function
fold in nanoseconds to minutes
about 1000 known 3D structures: X-ray crystallography, NMR
Native state
153 Residues, Mol. Weight=17181 [D], 1260 Atoms
Main function: primary oxygen storage and carrier in muscle tissue
It contains a heme (iron-containing porphyrin) group in the center. C34H32N4O4FeHO
Myoglobin
Protein conformations
• defined by dihedral angles
2 angles with 2-3 local minima of the torsion energy
N monomers → about $10^N$ different conformations
Amino-acid
Levinthal’s paradox
• Levinthal’s paradox, 1968
finding the native state by random sampling is not possible
40 monomer polypeptide, $10^{13}$ conf/s
$3 \times 10^{19}$ years to sample all
universe ~ $2 \times 10^{10}$ years old
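The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming about 10 conformations per monomer (the count used on the previous slide) and a 365.25-day year; the variable names are illustrative only.

```python
# Levinthal estimate: time needed to enumerate every conformation of a 40-monomer chain.
# Assumptions: about 10 conformations per monomer, 1e13 conformations sampled per second.
n_monomers = 40
conformations = 10 ** n_monomers           # about 10^N conformations
rate_per_second = 10 ** 13                 # sampling rate quoted on the slide
seconds_per_year = 365.25 * 24 * 3600

years_needed = conformations / rate_per_second / seconds_per_year
age_of_universe_years = 2e10               # value quoted on the slide

print(f"years to sample all conformations: {years_needed:.1e}")
print(f"ratio to the age of the universe:  {years_needed / age_of_universe_years:.1e}")
```

The result is a few times $10^{19}$ years, roughly nine orders of magnitude longer than the quoted age of the universe.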
Wetlaufer, P.N.A.S. 70, 691 (1973)
Levinthal, J. Chim. Phys. 65, 44-45 (1968)
nucleation
folding pathways
• Anfinsen: thermodynamic hypothesis
native state is at the global minimum of the free energy
Epstein, Goldberger, & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)
Free energy landscapes
• Bryngelson & Wolynes, 1987
free energy landscape
Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)
a random hetero-polymer typically does NOT fold
Davidson & Sauer, P.N.A.S. 91, 2146 (1994)
Experiment:
– random sequences
– GLU, ARG, LEU
– 80-100 amino-acids
~ 95% did not fold in a stable manner
Funnels
• Leopold, Montal & Onuchic, 1992
Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)
many folding pathways
Energy funnels
Given any amino-acid sequence: can we tell if it is a good folder?
experiments (X-ray, NMR)
molecular dynamics simulations
homology modeling
Difficult and slow
Molecular dynamics
• State of the art
supercomputer (LANL)
Ribosome in explicit solvent:
– targeted MD
– $2.64 \times 10^6$ atoms ($2.5 \times 10^5$ + water)
– Q machine, 768 processors
– 260 days of simulation (event: 2 ns)
Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)
distributed computing (Stanford, Folding@home)
– more than 100,000 CPUs
– simulation of complete folding event
» BBA5, 23-residue, implicit water
» 10,000 CPU days/folding event (~1 μs)
Shirts & Pande, Science 290, 1903 (2000)
Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)
~ $10^{16}$ times slower
Configuration networks
• Configuration networks
NODE: configuration
LINK: change of one degree of freedom (angle)
refinement of angle values → continuous case
Protein conformations
dihedral angles have few preferred values
Ramachandran map
PDB structures
Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963)
• Helix
• Sheet
• other
Why networks?
• VERY LARGE: 100 monomers → $10^{100}$ nodes. However:
Generic features of folding are determined by STATISTICAL properties of the configuration network
degree distribution, average distance, clustering, degree correlations
Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003)
toolkit from network research
captures the high dimensionality
faster algorithms to simulate folding events
pre-screening synthetic proteins
insights into misfolding
A real example
• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)
beta3s: 20 monomers, antiparallel beta sheets
MD simulation, implicit water
330 K, equilibrium: folded ⇌ random coil
NODE -- 8 letters / AA (local secondary struct)
LINK -- 2ps transition
Its native conformation has been studied by NMR experiments:
De Alba et.al. Prot.Sci. 8, 854 (1999).
Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.
• Simulations @ 330 K
• The average folding time from the denatured state ~ 83 ns
• The average unfolding time ~ 83 ns
• Simulation time ~ 12.6 μs
• Coordinates saved every 20 ps ($5 \times 10^5$ snapshots in 10 μs)
• Secondary structures: H, G, I, E, B, T, S, - (α-helix, 3₁₀ helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured).
• The native state: -EEEESSEEEEEESSEEEE-
• There are approx. $8^{18} \approx 10^{16}$ conformations.
•Nodes: conformations, transitions: links.
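The node and link definitions above can be sketched in a few lines. This is an illustration only, not the procedure of Rao & Caflisch: the function name `build_conformation_network` and the toy trajectory are hypothetical, and links are taken here to be transitions between consecutive saved snapshots.

```python
from collections import Counter

def build_conformation_network(snapshots):
    """Nodes: distinct secondary-structure strings. Links: transitions observed
    between consecutive snapshots (self-transitions are ignored)."""
    nodes = Counter(snapshots)                  # node -> number of visits
    links = Counter()
    for a, b in zip(snapshots, snapshots[1:]):
        if a != b:
            links[frozenset((a, b))] += 1       # undirected link, weighted by count
    degree = Counter()
    for link in links:
        for node in link:
            degree[node] += 1
    return nodes, links, degree

# Hypothetical toy trajectory (made-up strings, not simulation data).
native = "-EEEESSEEEEEESSEEEE-"
toy = [native, "-EEEESSEEEEEESSEEES-", native, "-EEESSSEEEEEESSEEEE-", native]
nodes, links, degree = build_conformation_network(toy)
print(len(nodes), len(links), degree[native])

# Size of the full string space: 8 letters at each of 18 variable positions.
print(8 ** 18)
```

On real data the degree histogram of `degree` is the quantity whose tail is compared with a power law.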
Many real-world networks are scale free
hubs
co-authorship (γ = 1 – 2.5)
citations (γ = 3)
sexual contacts (γ = 3.4)
movie actors (γ = 2.3)
Internet (γ = 2.4)
World Wide Web (γ = 2.1/2.5)
Genetic regulation (γ = 1.3)
Protein-protein interactions (γ = 2.4)
Metabolic pathways (γ = 2.2)
Food webs (γ = 1.1)
Barabási & Albert, Science 286, 509 (1999)
Scale-free network
beta3s
randomized
Many reasons behind SF topology
• Why is the protein network scale free?
• Why does the randomized chain have similar degree distribution?
• Why is γ = −2?
Robot arm networks
[Figure: robot-arm configurations and their network; node labels 000, 100, 200, 010, 020, 021]
n-dimensional hypercube
binomial degree distribution
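A short sketch of why such a lattice has a binomial degree distribution. It assumes three states (0, 1, 2) per joint and that a link changes one joint by a single step (0-1 or 1-2, but not 0-2); this adjacency rule is an assumption read off the node labels, and `robot_arm_degrees` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product
from math import comb

def robot_arm_degrees(n, states=3):
    """Degree histogram of the configuration network of n joints.
    Assumption: a link changes one joint by one step (0-1 or 1-2, not 0-2)."""
    histogram = Counter()
    for config in product(range(states), repeat=n):
        degree = 0
        for value in config:
            # one neighbour below and/or one above along this joint
            degree += (value > 0) + (value < states - 1)
        histogram[degree] += 1
    return histogram

n = 4
hist = robot_arm_degrees(n)
# A joint in the middle state contributes 2 links, an end state contributes 1,
# so a node with m joints in state 1 has degree n + m.
expected = {n + m: comb(n, m) * 2 ** (n - m) for m in range(n + 1)}
print(dict(hist) == expected)
```

Under this rule the number of joints in the middle state is binomially distributed over the $3^n$ nodes, which gives the degree distribution its binomial shape.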
[Figure: robot-arm networks for n = 0, n = 1 and n = 2; nodes labeled by joint states, e.g. 00, 01, 02, 10, 11, 12, 20, 21, 22]
• Steric constraints?
missing nodes
missing links
Swiss cheese
Homogeneous
A bead-chain model
• Beads on a chain in 3D: robot arm model
similar to Cα protein models
rod-rod angle
3 positions around axis
N = 18; angle = 120°; state 2212112212111122
N = 6; angle = 90°
Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)
Homogeneous network
Another example: L = 7, angle = 75°, r = 0.25
state “00100”
allowed state
forbidden state
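The allowed/forbidden classification can be sketched as follows. Several conventions here are assumptions rather than the model's actual definitions: rods have unit length, the quoted angle is taken as the angle between consecutive rod directions, the three positions around the axis are azimuths of 0°, 120° and 240°, and a state is forbidden when two non-bonded beads come closer than 2r. All function names are hypothetical.

```python
import itertools
import math

def chain_coordinates(state, theta_deg, rod=1.0):
    """Bead positions for a chain whose joints take azimuths 0, 120 or 240 degrees.
    Assumption: theta_deg is the angle between consecutive rod directions."""
    theta = math.radians(theta_deg)
    t, u, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)   # local frame
    beads = [(0.0, 0.0, 0.0), (rod, 0.0, 0.0)]
    for s in state:
        phi = math.radians(120.0 * s)
        # rotate the frame about the current rod axis t by the azimuth phi
        w = tuple(math.cos(phi) * a + math.sin(phi) * b for a, b in zip(u, v))
        v = tuple(-math.sin(phi) * a + math.cos(phi) * b for a, b in zip(u, v))
        # tilt the rod direction by theta towards w
        t, u = (tuple(math.cos(theta) * a + math.sin(theta) * b for a, b in zip(t, w)),
                tuple(-math.sin(theta) * a + math.cos(theta) * b for a, b in zip(t, w)))
        beads.append(tuple(p + rod * d for p, d in zip(beads[-1], t)))
    return beads

def is_allowed(state, theta_deg, radius):
    """Excluded volume: no two non-bonded beads closer than 2 * radius."""
    beads = chain_coordinates(state, theta_deg)
    for i in range(len(beads)):
        for j in range(i + 2, len(beads)):      # skip bonded neighbours
            if math.dist(beads[i], beads[j]) < 2 * radius:
                return False
    return True

L, theta_deg, radius = 7, 75.0, 0.25
states = list(itertools.product(range(3), repeat=L - 2))
allowed = {s for s in states if is_allowed(s, theta_deg, radius)}

def degree(state):
    """Number of allowed states that differ from `state` in exactly one joint."""
    count = 0
    for k in range(len(state)):
        for value in range(3):
            if value != state[k]:
                count += (state[:k] + (value,) + state[k + 1:]) in allowed
    return count

print(len(states), len(allowed))
```

The removed states are the "missing nodes" of the lattice, and `degree` counts the links that survive around each allowed state.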
Adding monomers not only increases the number of nodes in the network but also its dimensionality!! The combined effect is small-world.
Shortcuts in Folding Space
The “dilemma”
HOMOGENEOUS
• from studies of conformation networks
bead chain
robot arm
SCALE FREE
• from polypeptide MD simulations
beta3s
randomized version
?
Gradient Networks
Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.).
Naturally, gradients will induce flows on networks as well.
Ex.: Load balancing in parallel computation and packet routing on the internet
Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. on Foundations of Computer Science (FOCS), 1998: “Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes”
References:
Z. T. and K.E. Bassler, “Jamming is Limited in Scale-free Networks”, Nature 428, 716 (2004)
Z. T., B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss, “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262
Setup:
Let G=G(V,E) be an undirected graph, which we call the substrate network.
The vertex set: $V = \{x_0, x_1, \ldots, x_{N-1}\} \equiv \{0, 1, 2, \ldots, N-1\}$
The edge set: $E \subset V \times V$, $e \in E$, $e = (x_i, x_j) = (i, j)$, $(x, x) \notin E$ (no self-loops)
A simple representation of E is via the $N \times N$ adjacency (or incidence) matrix A:
$$A(x_i, x_j) = a_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{if } (i,j) \notin E \end{cases} \qquad (1)$$
Let us consider a scalar field $\{h\}: V \to \mathbb{R}$.
Set of nearest neighbor nodes on G of i: $S_i^{(1)}$
Definition 1 The gradient $\nabla h(i)$ of the field {h} in node i is a directed edge:
$$\nabla h(i) = (i, \mu(i)) \qquad (2)$$
which points from i to that nearest neighbor $\mu \in S_i^{(1)} \cup \{i\}$ on G for which the increase in the scalar is the largest, i.e.:
$$\mu(i) = \underset{j \in S_i^{(1)} \cup \{i\}}{\arg\max}\,(h_j) \qquad (3)$$
The weight associated with edge $(i, \mu)$ is given by:
$$|\nabla h(i)| = h_\mu - h_i$$
If $\mu(i) = i$, then $\nabla h(i) = (i, i) \equiv \mathbf{0}(i)$. The self-loop $\mathbf{0}(i)$ is a loop through i with zero weight.
Definition 2 The set F of directed gradient edges on G together with the vertex set V forms the gradient network:
$$\nabla G = \nabla G(V, F)$$
If (3) admits more than one solution, then the gradient in i is degenerate.
In the following we will only consider scalar fields with non-degenerate gradients. This means:
$$\text{Prob.}\{h_i = h_j \text{ if } (i,j) \in E\} = 0$$
Theorem 1 Non-degenerate gradient networks form forests.
Proof:
Theorem 2 The number of trees in this forest = number of local maxima of {h} on G.
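The two theorems can be checked numerically on a small example. The sketch below assumes an Erdős–Rényi substrate with i.i.d. uniform scalars (so ties have probability zero); the sizes `N` and `p`, the seed, and the helper names are illustrative choices, not values from the talk.

```python
import random

def gradient_network(neighbors, h):
    """Return mu, where mu[i] is the node with the largest h among i and its neighbours."""
    return {i: max([i, *neighbors[i]], key=lambda j: h[j]) for i in neighbors}

def count_trees(mu):
    """Follow gradient edges up to a root (a self-loop) and count the distinct roots."""
    roots = set()
    for start in mu:
        i, steps = start, 0
        while mu[i] != i:
            i = mu[i]
            steps += 1
            # h strictly increases along gradient edges, so no cycle can occur
            assert steps <= len(mu), "cycle found: the field must be degenerate"
        roots.add(i)
    return len(roots)

random.seed(0)
N, p = 200, 0.03
neighbors = {i: set() for i in range(N)}
for i in range(N):
    for j in range(i + 1, N):
        if random.random() < p:
            neighbors[i].add(j)
            neighbors[j].add(i)
h = {i: random.random() for i in range(N)}

mu = gradient_network(neighbors, h)
local_maxima = sum(all(h[i] > h[j] for j in neighbors[i]) for i in neighbors)
print(count_trees(mu), local_maxima)
```

Every walk along gradient edges terminates at a self-loop, which is a local maximum of {h}, so the two printed numbers agree.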
[Figure: example substrate graph with a scalar value (between 0.05 and 0.87) assigned to each node]
For Erdős–Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution is:
In the limit $N \to \infty$, $p \to 0$, $Np = z = \text{const.} \gg 1$:
$$R_N(l) \approx \frac{1}{z\,l}, \qquad 1 \le l < z = Np$$
The Configuration model
A. Clauset, C. Moore, Z.T., E. Lopez, to be published.
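Generating functions: $g(z) = \sum_k p_k z^k$, where $p_k$ is the degree distribution
$$R(z) = \int_0^1 dx\; g\!\left(1 - (1 - z)\,\frac{x\,g'(x)}{g'(1)}\right)$$
K-th Power of a Ring
$$R_K(l) = \begin{cases}
\dfrac{4\,(4K^2 + 2Kl + 9K + 3)}{(2K+l)(2K+l+1)(2K+l+2)(2K+l+3)}, & 1 \le l \le K-1 \\[2ex]
\dfrac{6\,(7K^2 + 7K + 2)}{3K\,(3K+1)(3K+2)(3K+3)}, & l = K \\[2ex]
\dfrac{4\,(2K+1)}{(2K+l+1)(2K+l+2)(2K+l+3)}, & K+1 \le l \le 2K-1 \\[2ex]
\dfrac{1}{4K+1}, & l = 2K
\end{cases}$$
Power law in $2K + l$ with exponent $\gamma = -3$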
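A consistency check on the ring result, as a sketch under stated assumptions. It takes the four-case form of $R_K(l)$ coded below (cases $1 \le l \le K-1$, $l = K$, $K+1 \le l \le 2K-1$, $l = 2K$) and i.i.d. continuous scalars. Every node that is not a local maximum sends exactly one gradient edge, and a node is the largest of its $2K+1$ closest values with probability $1/(2K+1)$, so the mean in-degree must equal $2K/(2K+1)$.

```python
from fractions import Fraction

def R(K, l):
    """In-degree distribution R_K(l), 1 <= l <= 2K, on the K-th power of a ring
    (piecewise form assumed as stated in the lead-in)."""
    s = 2 * K + l
    if 1 <= l <= K - 1:
        return Fraction(4 * (4 * K * K + 2 * K * l + 9 * K + 3),
                        s * (s + 1) * (s + 2) * (s + 3))
    if l == K:
        return Fraction(6 * (7 * K * K + 7 * K + 2),
                        3 * K * (3 * K + 1) * (3 * K + 2) * (3 * K + 3))
    if K + 1 <= l <= 2 * K - 1:
        return Fraction(4 * (2 * K + 1), (s + 1) * (s + 2) * (s + 3))
    if l == 2 * K:
        return Fraction(1, 4 * K + 1)
    raise ValueError("l must satisfy 1 <= l <= 2K")

for K in (1, 2, 3, 20):
    mean_in_degree = sum(l * R(K, l) for l in range(1, 2 * K + 1))
    # exact rational arithmetic: both columns should be identical
    print(K, mean_in_degree, Fraction(2 * K, 2 * K + 1))
```

For $K+1 \le l \le 2K-1$ the expression decays as $(2K+l)^{-3}$, which is the power law quoted above.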
The energy landscape
What generates γ = −2?
• Energy associated with each node (configuration)
the gradient network
most favorable transitions
T=0 backbone of the flow
MD simulation
tracks the flow network
biased walk close to the gradient network
trees
basins of local minima
The REM generates an exponent of -1.
Model ingredients
• A network model of configuration spaces
network topology
homogeneous
degree correlations
constrained (folded): small $k_{conf}$
lower energy
loose (random coil): large $k_{conf}$
higher energy
k, E increases
how to associate energies
Random geometric graph
• random geometric graph
in higher D: similar to hypercube with holes
degree correlations k
E
• Energy proportional to connectivity
R=0.113, <k>=20
Dall & Christensen, Phys.Rev.E 66, 026121 (2002)
N=30000, <k> = 1000, d=2.
Exponent is −2
2 essential ingredients:
1) k1-k2 correlations
2) <E> with k monotonic
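The model ingredients can be sketched on a small random geometric graph. The choices below are assumptions for illustration: a much smaller graph than the one quoted, energy set equal to the node degree plus a tiny random term that only breaks ties, and T = 0 dynamics in which each node points to the lowest-energy node among itself and its neighbours.

```python
import math
import random
from collections import Counter

random.seed(1)
N, radius = 400, 0.12                     # illustrative sizes, far smaller than the slide's
points = [(random.random(), random.random()) for _ in range(N)]
neighbors = {i: [] for i in range(N)}
for i in range(N):
    for j in range(i + 1, N):
        if math.dist(points[i], points[j]) < radius:
            neighbors[i].append(j)
            neighbors[j].append(i)

# Energy grows with connectivity; the small random term only breaks ties.
energy = {i: len(neighbors[i]) + 1e-6 * random.random() for i in range(N)}

# T = 0 flow: each node points to the lowest-energy node among itself and its neighbours.
target = {i: min([i, *neighbors[i]], key=lambda j: energy[j]) for i in range(N)}
in_degree = Counter(t for i, t in target.items() if t != i)
histogram = Counter(in_degree.get(i, 0) for i in range(N))
print(sorted(histogram.items()))
```

The printed histogram is the in-degree distribution of the gradient network; a graph this small is only meant to show the construction, not to resolve an exponent.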
Attractive
Repulsive
Lennard-Jones potential
Bead-chain model
• more realistic model: bead-chain
configuration network excluded volume
energy: Lennard-Jones
L = 30, angle = 75°
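For reference, the attractive and repulsive parts of a Lennard-Jones interaction can be written down directly. The sketch assumes the standard 12-6 form with parameters `epsilon` and `sigma`; the slides do not state which parametrization or cutoff was used for the bead-chain model.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 form: V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6).
    The r**-12 term is the repulsive core, the r**-6 term the attractive tail."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)

r_min = 2 ** (1 / 6)              # location of the minimum for sigma = 1
print(lennard_jones(1.0))         # zero crossing at r = sigma
print(lennard_jones(r_min))       # well depth, close to -epsilon
print(lennard_jones(0.9) > 0, lennard_jones(1.5) < 0)   # repulsive vs attractive side
```

Summing this pair energy over all non-bonded bead pairs of a configuration gives one way to assign an energy to each node of the configuration network.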
The case of the α-helix
AKA peptide
• ALA: orange
• LYS: blue
• TYR: green
MD simulations, no water.
T = 400
More than one simulation
3 different runs: yellow, red and green
The MD traced network
The role of temperature
Conclusions
• A network approach was introduced to study sterically constrained conformations of ball-chain like objects.
• This network approach is based on the “statistical dogma” stating that generic features must be the result of statistical properties of the networks and should not depend on details.
• Protein conformation dynamics happens in high dimensional spaces that are not adequately described by simplistic reaction coordinates.
• The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure.
• The -2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.
• Understanding the protein folding network has the potential of leading to faster simulation algorithms, towards closing the gap between nature’s speed and ours.
Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.