Post on 21-Dec-2015
1
Alignment of Flexible Protein Structures
Based on: FlexProt: Alignment of Flexible Protein Structures
Without a Pre-definition of Hinge Regions / M. Shatsky, R. Nussinov, H. Wolfson
Presented by: Einat Engel
2
Introduction
Proteins are flexible structures
3
Outline
Introduction
• Proteins (reminder)
• Protein motion
• Structural alignment – rigid & flexible
General Description:
• Problem’s description
• Discussion
4
Outline
Detailed Description:
• FPSA problem description
• FlexProt algorithm for the FPSA problem
• Experimental results
• Heuristic Algorithm for FPSA
• Clustering
Conclusions & Discussion:
• Summary of algorithm
• Major results
• Discussion
5
Reminder: Protein Structure
Proteins are made up of 20 different amino acids (or "residues").
Different levels of protein structures:
• Primary – amino acid sequence
• Secondary – local folding of amino acid chains
• Tertiary – 3D structure of a protein
• Quaternary – forming multi-chained proteins
6
Reminder: Protein Structure
[Figure: primary structure and tertiary structure of lysozyme]
7
Flexibility & Protein Motion
• Proteins are flexible molecules that undergo significant structural changes as part of their normal function.
• Motion often serves as an essential link between structure and function.
8
Flexibility & Protein Motion
• Protein motions are involved in numerous basic functions. In fact, highly mobile proteins have been implicated in a number of diseases, e.g., the motion of gp41 in AIDS
9
Structural Alignment
• When flexible molecules are compared to each other as rigid bodies, even strong similarities can be missed
• Yet, most existing protein alignment algorithms treat them as rigid objects
• We’ll see a technique for the alignment of flexible proteins
10
The Goal
11
Existing Approaches – Rigid Structural Alignment
• Exhaustive 3D search – search all possible rotations. (Matthews & Rossman)
• Fragment alignment – comparison of contiguous fragments.
• Geometric Hashing – Local reference frame, preprocessing & recognition (Fischer)
• Curve Matching – match curves using Fourier Transform (Schwartz & Sharir)
12
Existing Approaches – Flexible Structural Alignment
• Domain detection – requires a-priori knowledge of the corresponding pairs of amino-acid residues (Wriggers & Schulten)
• Geometric hashing – requires a-priori knowledge of the hinge location (Verbitsky)
• Data base screening – requires a-priori knowledge of hinges (Rigoutsos)
13
Outline
Introduction
• Proteins (reminder)
• Protein motion
• Structural alignment – rigid & flexible
General Description:
• Problem’s description
• Discussion
14
Terminology
Two fragments are almost congruent (matched) if:
1. Their sequence length is the same.
2. There exists a 3D rotation and translation which superimposes the corresponding atoms with small RMSD.
(Reminder: RMSD measures alignment error.)
15
Problem Definition
• Input: two protein molecules M1 and M2.
• Task: divide the two molecules into fragments of maximal size, such that the matched fragments will be almost congruent.
16
Problem Discussion
• The regions between the fragments are called flexible (hinge) regions.
• We’d like to minimize the number of flexible regions and maximize the alignment size. These two goals conflict!
• Our goal is to find a balanced solution
17
Problem Discussion
Consider two different solutions:
I. 3 rigid parts. Total size = 200 atoms
II. 2 rigid parts. Total size = 150 atoms
Q: Which is better?
A: I don’t know. Let’s divide the results according to the number of rigid parts.
18
Major Results
• Introducing FlexProt, a new technique for the alignment of flexible proteins.
• Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites
• The pairs of rigid matching fragments and the flexible regions are detected simultaneously
19
Outline
Detailed Description:
• FPSA problem description
• FlexProt algorithm for the FPSA problem
• Experimental results
• Heuristic Algorithms for FPSA
• Clustering
Conclusions & Discussion:
• Summary of algorithm
• Major results
• Discussion
20
Flexible Protein Structural Alignment (FPSA)
Input
• Two proteins, $M_1 = \{v_1, \dots, v_n\}$ and $M_2 = \{u_1, \dots, u_m\}$, with $v_i, u_j \in \mathbb{R}^3$
• Threshold error MaxRMSD
• MaxFlexNum parameter
• A weight function w
21
FPSA Problem Terminology
A rigid fragment pair is defined as:
$F_k^1 F_t^2(l) = \{v_k, v_{k+1}, \dots, v_{k+l-1}\}, \{u_t, u_{t+1}, \dots, u_{t+l-1}\}$, where $v_i \in M_1$, $u_j \in M_2$
and has the following property:
$\mathrm{RMSD}_T(F_k^1 F_t^2(l)) \le \mathrm{MaxRMSD}$
Where $\mathrm{RMSD}_T$ is defined as follows:
$\mathrm{RMSD}_T(F_k^1 F_t^2(l)) = \min_T \sqrt{\dfrac{\sum_{i=0}^{l-1} |T(u_{t+i}) - v_{k+i}|^2}{l}}$
T is a 3D rigid transformation, meaning rotation and translation
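As a minimal sketch of the quantity inside the minimum, the Python below evaluates the RMSD of two equal-length fragments under one given rigid transformation T (a rotation matrix plus a translation vector). The function names are illustrative and not taken from FlexProt, and the search for the minimizing T is deliberately left out.

```python
import math

def apply_rigid(rotation, translation, point):
    """Apply T(p) = R p + t to one 3D point (R is a 3x3 nested list)."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

def rmsd_under(rotation, translation, frag_v, frag_u):
    """RMSD between frag_v and T(frag_u) for one fixed transformation T."""
    if len(frag_v) != len(frag_u) or not frag_v:
        raise ValueError("fragments must be non-empty and of equal length")
    total = 0.0
    for v, u in zip(frag_v, frag_u):
        tu = apply_rigid(rotation, translation, u)
        total += sum((a - b) ** 2 for a, b in zip(tu, v))
    return math.sqrt(total / len(frag_v))

# Toy fragments (hypothetical coordinates): frag_u is frag_v shifted by -1 in x.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
frag_v = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
frag_u = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(rmsd_under(IDENTITY, (1.0, 0.0, 0.0), frag_v, frag_u))  # 0.0, the shift is undone
print(rmsd_under(IDENTITY, (0.0, 0.0, 0.0), frag_v, frag_u))  # 1.0, every atom is 1 unit off
```

A fragment pair would count as almost congruent when the smallest such value over all T stays below MaxRMSD.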
22
FPSA Problem Terminology
Let $F_J^s = \{f_1, \dots, f_s\}$ be a list of rigid fragment pairs
$f_i = F_{k_i}^1 F_{t_i}^2(l_i)$, such that $k_i < k_{i+1}$, $t_i < t_{i+1}$, where $i = 1, \dots, s-1$
Let $W(F_J^s) = \sum_{i=1}^{s-1} w(f_i, f_{i+1})$
w is a weight function that reflects the “goodness” of linking two rigid fragment pairs.
23
The FPSA Problem
Example:
$f_1 = F_{k_1}^1 F_{t_1}^2(l_1)$
$f_2 = F_{k_2}^1 F_{t_2}^2(l_2)$
24
The FPSA Problem
For each $s$, $s = 2, \dots, \mathrm{MaxFlexNum}$, detect $F_J^{*s}$ such that:
$W(F_J^{*s}) \le W(F_J^s), \quad \forall F_J^s$
Remember, $F_J^s = \{f_1, \dots, f_s\}$ is a list of rigid fragment pairs
25
The FlexProt Algorithm for FPSA
I. Detection of all rigid fragment pairs, $F_k^1 F_t^2(l)$, that satisfy the MaxRMSD constraint
II. Detection of optimal configurations between rigid fragment pairs, $\{F_J^{*s}\}_{s=2}^{\mathrm{MaxFlexNum}}$
26
I. Detect all Rigid Fragment Pairs
In order to find all possible pairs $F_k^1 F_t^2(l)$, do:
Iterate over three indices $k, t, l$ where
$1 \le k \le n;\ 1 \le t \le m;\ 3 \le l \le \min(n-k+1,\ m-t+1)$
and select the pairs satisfying
$\mathrm{RMSD}_T(F_k^1 F_t^2(l)) \le \mathrm{MaxRMSD}$
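The triple loop can be sketched as follows. This is only an illustration of the index ranges: `rmsd(k, t, l)` is a hypothetical caller-supplied function standing in for the superposition step, and the names are not from FlexProt.

```python
def enumerate_triplets(n, m, min_len=3):
    """Yield every (k, t, l) with 1 <= k <= n, 1 <= t <= m and
    min_len <= l <= min(n - k + 1, m - t + 1), using 1-based indices."""
    for k in range(1, n + 1):
        for t in range(1, m + 1):
            max_len = min(n - k + 1, m - t + 1)
            for l in range(min_len, max_len + 1):
                yield (k, t, l)

def rigid_fragment_pairs(n, m, rmsd, max_rmsd):
    """Keep the triplets whose fragments superimpose within max_rmsd.
    `rmsd(k, t, l)` is a hypothetical callback, not a FlexProt API."""
    return [(k, t, l) for (k, t, l) in enumerate_triplets(n, m)
            if rmsd(k, t, l) <= max_rmsd]

# Toy run: a stand-in RMSD that accepts everything, for n = 4 and m = 3.
print(rigid_fragment_pairs(4, 3, lambda k, t, l: 0.0, max_rmsd=3.0))
# [(1, 1, 3), (2, 1, 3)]
```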
27
I. Complexity
Remember, a rigid fragment pair - $F_k^1 F_t^2(l)$, with
$1 \le k \le n;\ 1 \le t \le m;\ 3 \le l \le \min(n-k+1,\ m-t+1)$
We assume that $n \approx m$
• Iterate over $k, t, l$ - $O(n^3)$
• Compute RMSD for each triplet – linear in the detected fragment size (Sharir) - $O(n)$
Total complexity - $O(n^4)$, reduced to $O(n^3)$
28
II. Detect Optimal Configuration
• Now, we have a set of congruent fragment pairs.
• Let’s find an optimal subset of it. This subset will describe an alignment of M2 with M1. We’ll use dynamic programming
Dynamic programming – solves an optimization problem by caching subproblem solutions rather than recomputing them.
29
II. Detect Optimal Configuration
• In General: define a graph
• Vertices represent the rigid fragment pairs
• The directed edges represent flexible regions connecting the rigid fragment pairs
• A weight function w is applied to the edges. It reflects the goodness of connecting two rigidly matched fragment pairs
30
II. Detect Optimal Configuration
Vertices: $V = \{F_i^1 F_j^2(l)\}$
A directed edge between $F_i^1 F_j^2(l)$ and $F_{i'}^1 F_{j'}^2(l')$ is defined if:
1. The fragments are ascending: $i < i'$, $j < j'$
2. The gaps between consecutive fragments are limited by MaxGap1 and MaxGap2 (user defined):
$Gap_1 = i' - (i + l) \le \mathrm{MaxGap}_1$, $Gap_2 = j' - (j + l) \le \mathrm{MaxGap}_2$
31
II. Detect Optimal Configuration
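$M_1 = \{v_1, v_2, \dots, v_7\}$, $M_2 = \{u_1, u_2, \dots, u_8\}$
$A = F_1^1 F_1^2(2) = \{v_1, v_2\}, \{u_1, u_2\}$
$B = F_3^1 F_4^2(3) = \{v_3, v_4, v_5\}, \{u_4, u_5, u_6\}$
$C = F_6^1 F_7^2(2) = \{v_6, v_7\}, \{u_7, u_8\}$
Define: MaxGap1 = 3, MaxGap2 = 3
[Figure: graph with vertices A, B and C]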
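The edge rule can be checked on A, B and C with a short sketch. It assumes the ascending condition is strict (i < i', j < j') and that the gap is measured from the end of the first fragment to the start of the second; `Pair`, `gaps` and `has_edge` are illustrative names.

```python
from collections import namedtuple

# A vertex F^1_i F^2_j(l): start index in M1, start index in M2, length.
Pair = namedtuple("Pair", "i j l")

def gaps(src, dst):
    """Gap1 and Gap2 between two fragment pairs (end of src to start of dst)."""
    return dst.i - (src.i + src.l), dst.j - (src.j + src.l)

def has_edge(src, dst, max_gap1, max_gap2):
    """Edge src -> dst exists if the pairs ascend and both gaps are bounded."""
    gap1, gap2 = gaps(src, dst)
    ascending = src.i < dst.i and src.j < dst.j  # assumed strict
    return ascending and gap1 <= max_gap1 and gap2 <= max_gap2

vertices = {"A": Pair(1, 1, 2), "B": Pair(3, 4, 3), "C": Pair(6, 7, 2)}
edges = [(s, d) for s in vertices for d in vertices
         if has_edge(vertices[s], vertices[d], max_gap1=3, max_gap2=3)]
print(edges)                               # [('A', 'B'), ('B', 'C')]
print(gaps(vertices["A"], vertices["C"]))  # (3, 4): Gap2 exceeds MaxGap2, so no edge A -> C
```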
32
II. Detect Optimal Configuration
The weight function (smaller is better):
$w(e) = \underbrace{-(l - \Delta)^2}_{A} + \underbrace{\max(Gap_1, Gap_2)}_{B} + \underbrace{|Gap_1 - Gap_2|}_{C}$
Δ is half of the maximal overlapping interval
Part A rewards quadratically the size of $F_i^1 F_j^2(l)$
Part B punishes large gaps
Part C punishes difference between Gap1 and Gap2
33
II. Detect Optimal Configuration
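[Figure: graph with vertices A, B and C and edges e1 and e2]
$w(e) = \underbrace{-(l - \Delta)^2}_{A} + \underbrace{\max(Gap_1, Gap_2)}_{B} + \underbrace{|Gap_1 - Gap_2|}_{C}$
$A = F_1^1 F_1^2(2) = \{v_1, v_2\}, \{u_1, u_2\}$
$B = F_3^1 F_4^2(3) = \{v_3, v_4, v_5\}, \{u_4, u_5, u_6\}$
$C = F_6^1 F_7^2(2) = \{v_6, v_7\}, \{u_7, u_8\}$
$w(e_1) = -(3 - 0)^2 + \max(0, 1) + |0 - 1| = -7$
$w(e_2) = -16$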
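A minimal sketch of the weight function, written out to reproduce the e1 computation. It assumes that l is the length of the fragment pair the edge points to (which matches e1) and takes Δ as a plain argument; `edge_weight` is an illustrative name.

```python
def edge_weight(length, delta, gap1, gap2):
    """w(e) = -(l - delta)^2 + max(gap1, gap2) + |gap1 - gap2|; smaller is better."""
    size_reward = -(length - delta) ** 2   # part A: rewards long fragments
    gap_penalty = max(gap1, gap2)          # part B: punishes large gaps
    mismatch_penalty = abs(gap1 - gap2)    # part C: punishes unequal gaps
    return size_reward + gap_penalty + mismatch_penalty

# Edge e1 from A to B: l = 3, no overlap, Gap1 = 0, Gap2 = 1.
print(edge_weight(3, 0, 0, 1))  # -7
```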
34
II. Detect Optimal Configuration
• We built a weighted directed acyclic graph (DAG)
• Shortest weighted paths correspond to alignments of consecutive, long, congruent matching fragments.
Almost Finished
35
Reminder: Shortest Paths in DAGs
First, we perform a topological sort of the Directed Acyclic Graph (DAG). Then, we make just one pass over the vertices according to their order. For each vertex, we relax each edge that leaves it.
[Figure: example DAG with edge weights and the distance estimates before and after relaxation]
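The two steps, topological sort and a single relaxation pass, can be sketched as below. The small graph and its weights are chosen for illustration, and the topological sort uses Kahn's algorithm, which is one of several ways to obtain the order.

```python
import math

def topological_order(vertices, edges):
    """Kahn's algorithm; `edges` maps u -> list of (v, weight)."""
    indegree = {v: 0 for v in vertices}
    for u in edges:
        for v, _ in edges[u]:
            indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order

def dag_shortest_paths(vertices, edges, source):
    """One pass over the vertices in topological order, relaxing outgoing edges."""
    dist = {v: math.inf for v in vertices}
    dist[source] = 0
    for u in topological_order(vertices, edges):
        if dist[u] == math.inf:
            continue  # not reachable from the source
        for v, w in edges.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

vertices = ["r", "s", "t", "x", "y", "z"]
edges = {
    "r": [("s", 5), ("t", 3)],
    "s": [("t", 2), ("x", 6)],
    "t": [("x", 7), ("y", 4), ("z", 2)],
    "x": [("y", -1), ("z", 1)],
    "y": [("z", -2)],
}
dist = dag_shortest_paths(vertices, edges, "s")
print(dist)  # {'r': inf, 's': 0, 't': 2, 'x': 6, 'y': 5, 'z': 3}
```

Negative edge weights are fine here because the graph has no cycles.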
36
II. Detect Optimal Configuration
• We run the Shortest Paths in DAGs algorithm.
• A simple case (no limit on the number of nodes in the shortest path): The shortest path in G corresponds to a minimal weighted sequence of rigid fragment pairs, $F^{**}$, such that $W(F^{**}) \le W(F_J^s), \quad \forall F_J^s$
• Complexity - $O(|V| + |E|)$
37
II. Detect Optimal Configuration
• We’ll make a small change in the algorithm since we need to detect shortest paths with exactly s nodes, $s = 2, \dots, \mathrm{MaxFlexNum}$
• In the simple case, each node holds a pointer to a preceding node with the shortest path.
• Instead, each node will hold MaxFlexNum pointers. Pointer s points to a preceding node with a shortest path of size s-1
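One way to sketch the modified relaxation is a table indexed by path size. The code below is an illustration under assumptions, not FlexProt's implementation: the graph, its weights and all names are hypothetical, and a single vertex is given weight 0 because W sums edge weights only.

```python
import math

def best_paths_by_size(order, edges, max_nodes):
    """best[s][v] = minimal weight of a path with exactly s vertices ending at v.
    `order` is a topological order; `edges` maps u -> list of (v, weight).
    pred[s][v] plays the role of "pointer s" of node v."""
    best = {s: {v: math.inf for v in order} for s in range(1, max_nodes + 1)}
    pred = {s: {v: None for v in order} for s in range(1, max_nodes + 1)}
    for v in order:
        best[1][v] = 0  # a single vertex carries no edge weight
    for u in order:
        for v, w in edges.get(u, []):
            for s in range(2, max_nodes + 1):  # check every path size
                cand = best[s - 1][u] + w
                if cand < best[s][v]:
                    best[s][v] = cand
                    pred[s][v] = u
    return best, pred

def trace(pred, s, end):
    """Follow the size-indexed pointers back from `end`."""
    path = [end]
    while s > 1:
        end = pred[s][end]
        path.append(end)
        s -= 1
    return path[::-1]

order = ["p", "q", "r"]
edges = {"p": [("q", -5), ("r", -1)], "q": [("r", -3)]}
best, pred = best_paths_by_size(order, edges, max_nodes=3)
print(min(best[2].values()), trace(pred, 2, "q"))  # -5 ['p', 'q']
print(min(best[3].values()), trace(pred, 3, "r"))  # -8 ['p', 'q', 'r']
```

Keeping one answer per s is what lets the results be reported separately for each number of rigid parts.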
38
II. Complexity
During relaxation, we check all MaxFlexNum possibilities and therefore the complexity is
$O((|V| + |E|) \cdot \mathrm{MaxFlexNum})$
• The number of nodes in the graph can be proportional to $O(n^3)$
• Graph of n vertices has $O(n^2)$ edges
Total complexity of stage II: $O(n^6)$
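The stage II bound follows by chaining the two bullet points. The short derivation below reads the relaxation cost as (|V| + |E|) · MaxFlexNum and assumes MaxFlexNum is treated as a constant, which the slide does not state explicitly.

$$
|V| = O(n^3), \qquad |E| = O(|V|^2) = O(n^6)
$$

$$
O\big((|V| + |E|) \cdot \mathrm{MaxFlexNum}\big) = O\big((n^3 + n^6) \cdot \mathrm{MaxFlexNum}\big) = O(n^6)
$$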
39
Summary of FlexProt Algorithm
• Theoretical worst case complexity is $O(n^6)$
• In practice – FlexProt is highly efficient (with some changes)
• The average running time is approximately seven seconds (for molecules of 300 amino acids)
So… What does it look like?
40
Experimental Results
41
Experimental Results
42
Running FlexProt
• http://bioinfo3d.math.tau.ac.il/FlexProt
• http://www.umass.edu/microbio/chime/explorer/pe.htm
43
Heuristic Improvement of Step I
• In step I, we detected all of the rigid fragment pairs. Time complexity – $O(n^3)$
• The procedure takes several minutes, even for small proteins.
• Instead, we can use a greedy algorithm that only takes $O(n^2)$
44
Heuristic Improvement of Step I
• Start by aligning a single matching atom pair $(v_a, u_b)$ where $v_a \in M_1$ and $u_b \in M_2$
• Iteratively, add one matching atom pair to the left and one to the right.
• Stop when we exceed the RMSD threshold – when the list can’t be extended to the left or the right.
45
Heuristic Improvement of Step I
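[Figure: match-list grown from the seed pair (a, b) left to (i, j) and right to (i+l-1, j+l-1)]
After the extension process, we have a match-list
$\{(v_i, u_j), \dots, (v_a, u_b), \dots, (v_{i+l-1}, u_{j+l-1})\}$
$\{v_i, \dots, v_{i+l-1}\}$ is almost congruent to $\{u_j, \dots, u_{j+l-1}\}$
The next alignment is initiated at $(v_{i+l+1}, u_{j+l+1})$ and not $(v_{a+1}, u_{b+1})$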
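The extension step can be sketched as a loop that grows the match-list one pair at a time. This is a hedged illustration: `fits(i, j, l)` is a hypothetical caller-supplied predicate standing in for the RMSD check, and the loop stops when neither side can be extended, which is one reading of the stop rule.

```python
def extend_seed(a, b, n, m, fits):
    """Grow a match-list around the seed pair (v_a, u_b), 1-based indices.
    `fits(i, j, l)` says whether v_i..v_{i+l-1} and u_j..u_{j+l-1} stay
    within the RMSD threshold (hypothetical callback, not a FlexProt API).
    Returns (i, j, l) for the final fragment pair."""
    i, j, length = a, b, 1
    grew = True
    while grew:
        grew = False
        # Try to add one matching pair on the left.
        if i > 1 and j > 1 and fits(i - 1, j - 1, length + 1):
            i, j, length = i - 1, j - 1, length + 1
            grew = True
        # Try to add one matching pair on the right.
        if i + length <= n and j + length <= m and fits(i, j, length + 1):
            length += 1
            grew = True
    return i, j, length

# Toy predicate: only positions 3..7 on the main diagonal superimpose well.
def fits(i, j, l):
    return i == j and i >= 3 and i + l - 1 <= 7

print(extend_seed(5, 5, 10, 10, fits))  # (3, 3, 5)
```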
46
Complexity
Updating the RMSD at each step is $O(1)$
Thus, finding a particular $F_i^1 F_j^2(l)$ is linear in the length of the fragments - $O(l)$
The time complexity, $T_1^*$, is:
$T_1^* = \sum_{F_i^1 F_j^2(l)} \text{Time to compute } F_i^1 F_j^2(l)$
47
Complexity
$T_1^* = \sum_{F_i^1 F_j^2(l)} O(l) = \sum_{i,j} \big(\text{number of fragment pairs containing } (v_i, u_j)\big) \le O(n \cdot n) \cdot O(n) = O(n^3)$
Theoretically, an atom pair $(v_i, u_j)$ might participate in up to n fragment pairs.
In practice, a pair of atoms $(v_i, u_j)$ participates in at most 2 fragment pairs, so
$T_1^* = O(n^2)$
There are $O(n^2)$ rigid fragment pairs
48
Clustering
• This stage can be viewed as an extension to the FPSA problem.
• The algorithm clusters consecutive fragment pairs, that have a similar 3D transformation, even if they are not directly linked.
49
Clustering
Example: Two β-strands (A and B) are connected by loops of different lengths.
Stage I of the FlexProt algorithm aligns each separately.
A and B have almost the same 3D rigid transformation and in the clustering stage, they are joined into one structure.
50
The Clustering Algorithm
• We take each path detected in stage II. Remember, vertices = congruent fragment pairs
• The first vertex is a singleton cluster
• Take the second vertex. Check if there is a rigid transformation which superimposes both fragments.
If successful – do the same for the next vertex
Else – start a new cluster with the vertex that failed to join the previous cluster
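The clustering pass can be sketched as a single scan over a path. `can_join(cluster, vertex)` is a hypothetical caller-supplied test for whether one rigid transformation superimposes the whole cluster together with the new vertex; the toy data tags each fragment pair with a made-up transformation id.

```python
def cluster_path(path, can_join):
    """Split a path of fragment pairs into clusters of consecutive vertices."""
    clusters = []
    for vertex in path:
        if clusters and can_join(clusters[-1], vertex):
            clusters[-1].append(vertex)   # joins the current cluster
        else:
            clusters.append([vertex])     # first vertex, or a failed join
    return clusters

# Toy path: (name, transformation id). A and B share a transformation, C does not.
path = [("A", 0), ("B", 0), ("C", 1)]
same_transform = lambda cluster, vertex: cluster[0][1] == vertex[1]
print(cluster_path(path, same_transform))
# [[('A', 0), ('B', 0)], [('C', 1)]]
```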
51
Clustering Complexity
• The number of iterations equals the number of rigid fragment pairs in the flexible alignment solution.
• Time complexity: $T_3 = O(\mathrm{MaxFlexNum} \cdot N_{flex})$
• $N_{flex}$ is the number of flexible alignments. It is bounded by $O(\mathrm{MaxFlexNum} \cdot n^2)$ ($n^2$ vertices)
• $T_3 = O(\mathrm{MaxFlexNum}^2 \cdot n^2)$
$T_3 = O(n^2)$
52
Running FlexProt
• http://bioinfo3d.math.tau.ac.il/FlexProt
• http://www.umass.edu/microbio/chime/explorer/pe.htm
53
Outline
Detailed Description:
• FPSA problem description
• FlexProt algorithm for the FPSA problem
• Experimental results
• Heuristic Algorithms for FPSA
• Clustering
Conclusions & Discussion:
• Summary of algorithm
• Major results
• Discussion
54
Summary of Algorithm
Exact solution of FPSA
• Step I (detection of fragment pairs) – $O(n^3)$
• Step II (detection of optimal configuration) – $O(n^6)$
Heuristic solution
• Step I – $O(n^2)$
• Step II – $O(n^4)$
Clustering – $O(n^2)$
55
Major Results
• Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites
• The pairs of rigid matching fragments and the flexible regions are detected simultaneously
• The speed of the method allows extensive database comparison.
56
Significance of Results
• Proteins are flexible. They may appear in different conformations. FlexProt incorporates flexibility in structure comparison.
• Proteins function through binding. Flexibility is one of the characteristics of binding sites. So, it is important to detect hinge-bending sites.
57
Significance of Results
• Comparing proteins despite the motion that they have undergone is helpful for protein classification
• These comparisons are also useful in drug design, detecting binding sites, and the range of motions that proteins display
58
FlexProt – Discussion
• Differs from other flexible alignment algorithms (and of course, rigid)
• Does not violate the protein sequence order
• Given two alignments, each giving better results in different measures, which is better?
• Clustering is optional.
• Which proteins are compared?
59
Websites
PDB – http://www.rcsb.org/pdb/
SCOP – http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bbh.b.b.be.html
Database of motions – http://molmovdb.mbb.yale.edu/molmovdb/
60
Bibliography
• Cormen, “Introduction to Algorithms”, chapter 24, Single-Source Shortest Paths
• Gerstein, Database of Molecular Movement
• Shatsky, Nussinov and Wolfson, “Flexible Protein Alignment and Hinge Detection”, Proteins: Structure, Function, and Genetics 48, 242-256 (2002)
• Wolfson, a “Structural Bioinformatics – 2003” presentation