Development of SCWRL4 for improved prediction
of protein side-chain conformations
In collaboration with Moscow Engineering & Physics Institute
© George Krivov
Georgii Krivov,
Maxim Shapovalov and Roland L. Dunbrack Jr.
SCWRL® program
Version 3 was written by Adrian A. Canutescu and Dr. Roland L. Dunbrack Jr.
SCWRL: min E_Total
Main assumption: there is a finite set of possible conformations (called rotamers) for each amino acid residue.
[Figure labels: backbone; pair-wise interactions]
SCWRL’s side-chain packing algorithm
1. Obtain the spatial backbone structure and the amino acid sequence
2. For each residue, build the possible side-chain conformations (rotamers) using the rotamer library
3. Build the interaction graph
– each vertex denotes a certain residue
– an edge between vertices indicates that there is an interaction between some rotamers of the corresponding residues
4. Find the optimal assignment of side-chain conformations by graph decomposition and dynamic programming
5. Save the resolved structure to a file
[Figure: PDB file in, PDB file out; interaction graph on residues Res 1 to Res 4]
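The dynamic-programming step can be illustrated on the simplest possible interaction graph. The sketch below is not SCWRL's implementation: it assumes the graph is a simple chain (each residue interacts only with the next one), and the names `pack_chain`, `self_energy` and `pair_energy` are hypothetical. The energy values are made up for illustration.

```python
def pack_chain(self_energy, pair_energy):
    """Minimize total energy over rotamer assignments on a chain-shaped graph.

    self_energy[i][r]    : energy of rotamer r of residue i (assumed input)
    pair_energy[i][r][s] : interaction of rotamer r of residue i
                           with rotamer s of residue i + 1 (assumed input)
    Returns (minimal total energy, list of chosen rotamer indices).
    """
    best = list(self_energy[0])   # best[r]: optimum for the prefix ending in rotamer r
    back = []                     # back[i-1][s]: best predecessor rotamer for s
    for i in range(1, len(self_energy)):
        new_best, choice = [], []
        for s, e_self in enumerate(self_energy[i]):
            costs = [best[r] + pair_energy[i - 1][r][s] for r in range(len(best))]
            r_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[r_min] + e_self)
            choice.append(r_min)
        best = new_best
        back.append(choice)
    # Trace back from the best final rotamer.
    last = min(range(len(best)), key=best.__getitem__)
    assignment = [last]
    for choice in reversed(back):
        assignment.append(choice[assignment[-1]])
    assignment.reverse()
    return best[last], assignment


# Illustrative (made-up) energies for 3 residues with 2 rotamers each.
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
pair_energy = [
    [[0.0, 3.0], [0.0, 0.0]],   # residue 0 vs residue 1
    [[0.0, 0.0], [5.0, 0.0]],   # residue 1 vs residue 2
]
energy, rotamers = pack_chain(self_energy, pair_energy)
print(energy, rotamers)
```

The cost grows linearly with the number of residues instead of exponentially, which is the point of exploiting the graph structure; general graphs need the decomposition described on the following slides.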
Improved dynamic programming
based on a tree-decomposition of the interaction graph
A tree-decomposition of a graph $G = (V, E)$ is a pair $(T, X)$, where
$T = (I, F)$ is a tree with a set of vertices $I$ and a set of edges $F$,
$X = \{X_i : i \in I\}$, $X_i \subseteq V$, is a family of subsets of the set $V$, associated with the vertices of $T$,
which satisfies the conditions:
$\bigcup_{i \in I} X_i = V$
$\forall (u, v) \in E \;\; \exists i \in I : u, v \in X_i$
$\forall v \in V$ the set of vertices $\{i \in I : v \in X_i\}$ is connected in $T$
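The three conditions of a tree-decomposition can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name `is_tree_decomposition` and its argument layout are hypothetical, and the code assumes that `tree_edges` really form a tree (this is not verified).

```python
def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the three tree-decomposition conditions.

    vertices   : iterable of graph vertices (the set V)
    edges      : iterable of pairs (u, v) (the set E)
    tree_edges : iterable of pairs (i, j) of tree nodes (the set F)
    bags       : dict mapping tree node i to its subset X_i of V
    """
    vertices = set(vertices)
    # Condition 1: the bags together cover V.
    if set().union(*bags.values()) != vertices:
        return False
    # Condition 2: every graph edge lies inside at least one bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Condition 3: for each vertex, the tree nodes whose bags contain it
    # form a connected subgraph of the tree.
    adjacency = {i: set() for i in bags}
    for i, j in tree_edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    for v in vertices:
        nodes = {i for i, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adjacency[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


# Path graph a-b-c-d with bags of size two.
path_vertices = ["a", "b", "c", "d"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
good_bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
bad_bags = {0: {"a", "b"}, 1: {"c"}, 2: {"b", "c", "d"}}  # "b" is split across nodes 0 and 2
print(is_tree_decomposition(path_vertices, path_edges, [(0, 1), (1, 2)], good_bags))
print(is_tree_decomposition(path_vertices, path_edges, [(0, 1), (1, 2)], bad_bags))
```

The size of the largest bag bounds how many residues must be enumerated jointly, so smaller bags mean cheaper dynamic programming.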
[Figure: example graph with vertices a to l and its tree-decomposition; interaction graph on residues Res 1 to Res 4]
Hans L. Bodlaender. 1992
Involve more rotamers → more interactions to evaluate → more combinations to enumerate
More realistic potentials → longer interaction range → more edges in the graph → less decomposable
Combinatorial complexity blows up: hardly feasible even with the new algorithm
The described combinatorial algorithm…
finds the global optimum (avoids stochastic search)
accuracy of prediction depends entirely on the rotamer library and the energy potentials
handles larger and denser graphs than the one based on biconnected components
SCWRL4 can handle significantly larger proteins than SCWRL3
typically finishes quickly: no coffee breaks…
However… better accuracy is desired
quick collision detection algorithm
thermodynamic fluctuations via the Flexible Rotamer Model
Quick collision detection algorithm
Hierarchies of bounding boxes enable an efficient search for intersections between two groups of geometric figures
Given two groups of figures enclosed in k-DOPs…
1. If they overlap, split each
2. Check each combination for overlap
3. Disregard boxes that don’t clash
4. Continue recursively on each clashing pair
James T. Klosowski et al. 1998
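The four steps above can be sketched as a short recursion. This is an illustration only: it uses plain axis-aligned boxes instead of general k-DOPs, and the names `Node`, `leaf`, `parent` and `clashing_leaves` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A bounding box (per-axis mins and maxs) with optional child nodes."""
    mins: tuple
    maxs: tuple
    children: list = field(default_factory=list)
    label: str = ""


def leaf(label, mins, maxs):
    return Node(tuple(mins), tuple(maxs), [], label)


def parent(children):
    """Enclose the children in the smallest box that contains all of them."""
    dims = range(len(children[0].mins))
    mins = tuple(min(c.mins[i] for c in children) for i in dims)
    maxs = tuple(max(c.maxs[i] for c in children) for i in dims)
    return Node(mins, maxs, list(children))


def overlap(a, b):
    """Boxes overlap only if their intervals overlap on every axis."""
    return all(a.mins[i] <= b.maxs[i] and b.mins[i] <= a.maxs[i]
               for i in range(len(a.mins)))


def clashing_leaves(a, b):
    """Return labels of all leaf pairs whose boxes overlap."""
    if not overlap(a, b):
        return []                               # step 3: disregard
    if not a.children and not b.children:
        return [(a.label, b.label)]             # two leaves that clash
    pairs = []
    for part_a in a.children or [a]:            # step 1: split each
        for part_b in b.children or [b]:        # step 2: check each combination
            pairs.extend(clashing_leaves(part_a, part_b))  # step 4: recurse
    return pairs


group_a = parent([leaf("a1", (0, 0), (1, 1)), leaf("a2", (5, 5), (6, 6))])
group_b = parent([leaf("b1", (0.5, 0.5), (1.5, 1.5)), leaf("b2", (10, 10), (11, 11))])
print(clashing_leaves(group_a, group_b))
```

Whole subtrees are discarded as soon as their enclosing boxes are found to be separated, so most pairs of figures are never compared directly.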
Examples: cubic (k = 3), tetrahedral (k = 4)
[Figure: k-DOPs for k = 2, k = 3, k = 4]
Quick collision detection algorithm
… works best in conjunction with k-Discrete Oriented Polytopes (k-DOPs)
– a class of convex polytopes with 2k planes; any plane is orthogonal to one of k basic axes, which remain fixed
– easy to enclose a ball
– easy to merge
– easy clash check
– almost rotatable
Easy to enclose a ball: simple projection onto all basic axes $x_i$ gives $\min_i$ and $\max_i$ for $i = 1..k$
Easy to merge: $\max_i = \max(\max_i^a, \max_i^b)$, $\min_i = \min(\min_i^a, \min_i^b)$
Easy clash check: A doesn’t clash with B if there exists an axis $x_i$ ($1 \le i \le k$) such that
$\max_i^a < \min_i^b$ or $\max_i^b < \min_i^a$
[Figure: k-DOP A and k-DOP B]
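The three "easy" operations on k-DOPs (enclosing a ball, merging, and the clash check) can be sketched in a few lines. The axis set below (three coordinate axes plus one body diagonal, so k = 4) is an assumption chosen for illustration, not the set used in SCWRL4, and the class name `KDop` is hypothetical.

```python
import math

# Assumed example set of k = 4 fixed unit axes (not taken from the slides).
_D = 1.0 / math.sqrt(3.0)
AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (_D, _D, _D)]


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


class KDop:
    """A k-DOP stored as one [min, max] interval per fixed axis."""

    def __init__(self, mins, maxs):
        self.mins = list(mins)
        self.maxs = list(maxs)

    @classmethod
    def around_ball(cls, center, radius):
        # Enclose a ball: project its center onto every axis, pad by the radius.
        proj = [_dot(center, axis) for axis in AXES]
        return cls([p - radius for p in proj], [p + radius for p in proj])

    def merge(self, other):
        # Merge: take the per-axis minimum of mins and maximum of maxs.
        return KDop([min(a, b) for a, b in zip(self.mins, other.mins)],
                    [max(a, b) for a, b in zip(self.maxs, other.maxs)])

    def clashes(self, other):
        # No clash if the intervals are separated on at least one axis.
        for i in range(len(AXES)):
            if self.maxs[i] < other.mins[i] or other.maxs[i] < self.mins[i]:
                return False
        return True


a = KDop.around_ball((0.0, 0.0, 0.0), 1.0)
b = KDop.around_ball((3.0, 0.0, 0.0), 1.0)
c = KDop.around_ball((1.5, 1.5, 1.5), 1.0)
print(a.clashes(b), a.clashes(c), a.merge(b).clashes(c))
```

Balls `a` and `c` overlap on all three coordinate axes and are separated only along the diagonal, which shows why extra axes give a tighter test than a plain bounding box. A reported clash is only a conservative answer: the enclosed figures may still be disjoint.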
Fast anisotropic hydrogen bond potential
[Figure: atoms O and H with the vectors $n$, $e_0$, $e_1$, $e_2$]
The weight $w$ combines a distance term in $d_{OH}$ (relative to $d_{optimal}$) with two cosine terms, $c_0$ and $c_1$, built from the vectors $n$ and $e$
$E_{O,H} = (1 - w)\,E_{Default} + w\,E_0$, where $E_0 = B\,q_H\,q_O$
[Parameters: an optimal distance $d_{optimal}$, a distance scale, two maximum angles and the constant $B$; values shown: 2.1, 0.65, 30, 45]
For a more relevant comparison
it makes sense to predict the crystal, not the asymmetric unit (ASU)
[Figure: number of side chains versus relative surface accessibility (%); series: all residues, crystal contacts]
[Figure: extra percent in average accuracy due to crystal awareness (axis 0 to 14) for ALL, ARG, ASN, ASP, CYS, GLN, GLU, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL]
Knowledge of crystal symmetry enables higher accuracy …
Tuning parameters of the Flexible Rotamer Model
Sample around the rotamer library’s conformation
$E_{static}$: due to backbone and frame
$E_{pairwise}$: due to side chains’ interaction
$E_{static}^{T} = -T \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{-E_{static}(s_i)/T}\right)$
$E_{pairwise}^{T} = -T \log\left(\frac{1}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m} e^{-E_{pairwise}(s^1_i, s^2_j)/T}\right)$
$E_{self} = E_{static}^{T} - k \log(\text{probability})$, with the probability taken from the rotamer library
The parameters (including $k$ and $T$) may be set up independently for each type of amino acid
Search for optimal values in a high-dimensional space: optimize one amino acid type at a time (and loop over all)
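A numerical sketch of the flexible-rotamer averaging may help. It assumes that the energy of a flexible rotamer is a temperature-weighted log-average over the energies of samples drawn around the library conformation, and that the self-energy adds a term in the logarithm of the library probability; this reading of the slide, the function names and the numbers are all assumptions.

```python
import math


def frm_energy(sample_energies, temperature):
    """Return -T * log(mean(exp(-E / T))) over the sampled conformations.

    The minimum energy is factored out first so that the exponentials
    cannot overflow for large positive or negative energies.
    """
    e_min = min(sample_energies)
    total = sum(math.exp(-(e - e_min) / temperature) for e in sample_energies)
    return e_min - temperature * math.log(total / len(sample_energies))


def frm_self_energy(static_energies, probability, k, temperature):
    """Assumed form: averaged static energy minus k * log(library probability)."""
    return frm_energy(static_energies, temperature) - k * math.log(probability)


def frm_pairwise_energy(pair_energies, temperature):
    """pair_energies[i][j]: energy between sample i of one rotamer
    and sample j of the other; the average runs over all m * n pairs."""
    flat = [e for row in pair_energies for e in row]
    return frm_energy(flat, temperature)


# Made-up numbers: one sample clashes badly (100.0), one fits well (0.0).
print(frm_energy([0.0, 100.0], 1.0))
print(frm_self_energy([0.0, 100.0], 0.5, 1.0, 1.0))
print(frm_pairwise_energy([[1.0, 1.0], [1.0, 1.0]], 2.0))
```

With this averaging, a rotamer is not rejected just because the exact library conformation clashes: a single well-fitting sample nearby dominates the sum, and the temperature controls how forgiving the average is.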
Optimizing an expensive function in multidimensional space
$x \in \mathbb{R}^n$, $f(x)$ is expensive to evaluate
$\arg\max_{\|x\| \le R} f(x)$
1. Generate sample of arguments and evaluate function at these points
2. Assume that a second-order approximation works well
3. From the linear regression resolve coefficients and their covariance
4. Maximization of quadratic form is relatively simple, provided that we can resolve eigenvalues and eigenvectors
5. Hence, generate sample of quadratic forms, maximize each of them and aggregate
robust for non-convex functions!
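Steps 1 to 4 can be sketched with NumPy. This is a simplified illustration, not the authors' procedure: it fits a single quadratic by least squares and maximizes it inside a ball using the eigen-decomposition of the Hessian, and it omits step 5 (sampling many quadratic forms from the coefficient covariance and aggregating). The function names are hypothetical, and the degenerate case where the gradient is orthogonal to the leading eigenvector is not handled.

```python
import itertools
import numpy as np


def fit_quadratic(xs, ys):
    """Least-squares fit of f(x) ~ c + g.x + 0.5 x^T H x; returns (g, H)."""
    n = xs.shape[1]
    pairs = list(itertools.combinations_with_replacement(range(n), 2))
    design = np.array([[1.0, *x, *(x[i] * x[j] for i, j in pairs)] for x in xs])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    g = coef[1:n + 1]
    H = np.zeros((n, n))
    for value, (i, j) in zip(coef[n + 1:], pairs):
        if i == j:
            H[i, i] = 2.0 * value      # the x_i^2 coefficient equals H_ii / 2
        else:
            H[i, j] = H[j, i] = value  # the x_i x_j coefficient equals H_ij
    return g, H


def maximize_in_ball(g, H, radius):
    """Maximize g.x + 0.5 x^T H x subject to ||x|| <= radius."""
    eigvals, eigvecs = np.linalg.eigh(H)
    g_rot = eigvecs.T @ g

    def step(lam):
        # Stationary point of the Lagrangian in the eigenbasis.
        return g_rot / (lam - eigvals)

    if eigvals.max() < 0.0:            # concave: try the interior maximum
        x = step(0.0)
        if np.linalg.norm(x) <= radius:
            return eigvecs @ x
    # Otherwise the maximum is on the sphere: bisect on the multiplier.
    lo = max(0.0, eigvals.max()) + 1e-12
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > radius:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return eigvecs @ step(hi)


# Stand-in for the expensive function (assumed, cheap, exactly quadratic).
target = np.array([0.3, -0.2])
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=(30, 2))
ys = np.array([-np.sum((x - target) ** 2) for x in xs])
g, H = fit_quadratic(xs, ys)
print(maximize_in_ball(g, H, 1.0))
```

Because the constrained maximum is computed from the eigenvalues rather than by following a gradient, the step remains well defined even when the fitted quadratic is not concave.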
Traces through the optimization of the FRM parameters
[Figure: average conditional accuracy (axis 86.9 to 89.4) versus iteration number (0 to 80); series: Testing @ H-bonds 1, Testing @ H-bonds 2, Training @ H-bonds 1]
Training on 40 proteins (~2 500 residues)
+ 24 more proteins, then continue (~5 000 residues)
Testing on 130 proteins (~20 000 residues)
[Figure: conditional accuracy (%) versus confidence of side-chain placement (derived from experimental EDS maps), sliding frame of 20%; curves for ALL, ARG, ASN, ASP, CYS, GLN, GLU, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL]
Measurement: Side chains with better electron density are easier to predict
Shapovalov et al. 2007
[Diagram: backbone PDB file and rotamer library → SCWRL3.exe → output PDB file]
The functionality of SCWRL4 is available as a library, which enables direct manipulation of the model via a C++ API
class SCWRL{ …};
SCWRL4.DLL
All of this comes with improved usability (coming soon)
Acknowledgements
Dr. Roland Dunbrack
Prof. Nickolai Kudryashov
Colleagues:
Adrian Canutescu
Guoli Wang
Maxim Shapovalov
Qiang Wang
Qifang Xu
Questions, Comments, Suggestions?
Thanks for Your Attention! Have a Nice Day and welcome to our poster!