Alignment of Flexible Protein Structures

Based on: FlexProt: Alignment of Flexible Protein Structures Without a Pre-definition of Hinge Regions / M. Shatsky, R. Nussinov, H. Wolfson

Presented by: Einat Engel

Post on 21-Dec-2015


Introduction

Proteins are flexible structures


Outline

Introduction

• Proteins (reminder)

• Protein motion

• Structural alignment – rigid & flexible

General Description:

• Problem description

• Discussion


Outline

Detailed Description:

• FPSA problem description

• FlexProt algorithm for the FPSA problem

• Experimental results

• Heuristic Algorithm for FPSA

• Clustering

Conclusions & Discussion:

• Summary of algorithm

• Major results

• Discussion


Reminder: Protein Structure

Proteins are made up of 20 different amino acids (or "residues").

Different levels of protein structure:

• Primary – amino acid sequence

• Secondary – local folding of amino acid chains

• Tertiary – 3D structure of a protein

• Quaternary – forming multi-chained proteins


Reminder: Protein Structure

[Figure: primary structure and tertiary structure of lysozyme]


Flexibility & Protein Motion

• Proteins are flexible molecules that undergo significant structural changes as part of their normal function.

• Motion often serves as an essential link between structure and function.


Flexibility & Protein Motion

• Protein motions are involved in numerous basic functions. Highly mobile proteins have also been implicated in a number of diseases, e.g., the motion of gp41 in AIDS.


Structural Alignment

• When flexible molecules are compared to each other as rigid bodies, even strong similarities can be missed

• Yet, most existing protein alignment algorithms treat them as rigid objects

• We’ll see a technique for the alignment of flexible proteins


The Goal


Existing Approaches – Rigid Structural Alignment

• Exhaustive 3D search – search all possible rotations (Matthews & Rossmann)

• Fragment alignment – comparison of contiguous fragments.

• Geometric Hashing – Local reference frame, preprocessing & recognition (Fischer)

• Curve Matching – match curves using Fourier Transform (Schwartz & Sharir)


Existing Approaches – Flexible Structural Alignment

• Domain detection – requires a priori knowledge of the corresponding pairs of amino-acid residues (Wriggers & Schulten)

• Geometric hashing – requires a priori knowledge of the hinge location (Verbitsky)

• Database screening – requires a priori knowledge of hinges (Rigoutsos)


Outline

Introduction

• Proteins (reminder)

• Protein motion

• Structural alignment – rigid & flexible

General Description:

• Problem description

• Discussion


Terminology

Two fragments are almost congruent (matched) if:

1. Their sequence length is the same.

2. There exists a 3D rotation and translation which superimposes the corresponding atoms with small RMSD.

(Reminder: RMSD measures alignment error.)


Problem Definition

• Input: two protein molecules M1 and M2.

• Task: divide the two molecules into fragments of maximal size, such that the matched fragments will be almost congruent.


Problem Discussion

• The regions between the fragments are called flexible (hinge) regions.

• We’d like to minimize the number of flexible regions and maximize the alignment size, but these two goals conflict.

• Our goal is to find a balanced solution.


Problem Discussion

Consider two different solutions:

I. 3 rigid parts. Total size = 200 atoms

II. 2 rigid parts. Total size = 150 atoms

Q: Which is better?

A: I don’t know. Let’s divide the results according to the number of rigid parts.


Major Results

• Introducing FlexProt, a new technique for the alignment of flexible proteins.

• Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites

• The pairs of rigid matching fragments and the flexible regions are detected simultaneously


Outline

Detailed Description:

• FPSA problem description

• FlexProt algorithm for the FPSA problem

• Experimental results

• Heuristic Algorithms for FPSA

• Clustering

Conclusions & Discussion:

• Summary of algorithm

• Major results

• Discussion


Flexible Protein Structural Alignment (FPSA)

Input

• Two proteins, $M_1 = \{v_1, \dots, v_n\}$ and $M_2 = \{u_1, \dots, u_m\}$, with $v_i, u_j \in \mathbb{R}^3$

• Threshold error MaxRMSD

• MaxFlexNum parameter

• A weight function w


FPSA Problem Terminology

A rigid fragment pair is defined as:

$F^1_k F^2_t(l) = \{v_k, v_{k+1}, \dots, v_{k+l-1}\}, \{u_t, u_{t+1}, \dots, u_{t+l-1}\}, \quad v_i \in M_1,\ u_j \in M_2$

and has the following property:

$RMSD_T(F^1_k F^2_t(l)) \le \text{MaxRMSD}$

where $RMSD_T$ is defined as follows:

$RMSD_T(F^1_k F^2_t(l)) = \min_T \sqrt{\frac{\sum_{i=0}^{l-1} \| T(u_{t+i}) - v_{k+i} \|^2}{l}}$

T is a 3D rigid transformation, meaning rotation and translation.
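A minimal sketch of the quantity inside the minimum, for one fixed rigid transformation T. Finding the minimizing T (for example by a least-squares superposition method) is not shown here. The function names, the row-major 3x3 rotation matrix, and the tuple-based coordinates are all assumptions made for illustration.

```python
import math

def apply_rigid(rotation, translation, point):
    """Apply T(p) = R p + t, with R a 3x3 row-major matrix and t a 3-vector."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

def rmsd_under_transform(frag_v, frag_u, rotation, translation):
    """RMSD between frag_v and T(frag_u) for ONE fixed T (no minimization)."""
    if len(frag_v) != len(frag_u) or not frag_v:
        raise ValueError("fragments must be non-empty and of equal length")
    total = 0.0
    for v, u in zip(frag_v, frag_u):
        tu = apply_rigid(rotation, translation, u)
        total += sum((a - b) ** 2 for a, b in zip(tu, v))
    return math.sqrt(total / len(frag_v))

# Hypothetical toy fragments: frag_u is frag_v shifted by -1 along x.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
frag_v = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
frag_u = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
unaligned = rmsd_under_transform(frag_v, frag_u, IDENTITY, (0.0, 0.0, 0.0))
aligned = rmsd_under_transform(frag_v, frag_u, IDENTITY, (1.0, 0.0, 0.0))
```

With the translation (1, 0, 0) the two toy fragments coincide, so the pair would pass any MaxRMSD threshold.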


FPSA Problem Terminology

Let $F_J^s = \{f_1, \dots, f_s\}$ be a list of rigid fragment pairs, where $f_i = F^1_{k_i} F^2_{t_i}(l_i)$, such that $k_i < k_{i+1}$ and $t_i < t_{i+1}$ for $i = 1, \dots, s-1$.

Let $W(F_J^s) = \sum_{i=1}^{s-1} w(f_i, f_{i+1})$

w is a weight function that reflects the “goodness” of linking two rigid fragment pairs.
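As a small illustration, the total weight of a list is just the sum of w over consecutive entries. The function name and the toy weight used in the example are hypothetical, not FlexProt's actual weight.

```python
def total_weight(fragment_pairs, w):
    """Sum w(f_i, f_{i+1}) over consecutive entries of the list."""
    return sum(w(a, b) for a, b in zip(fragment_pairs, fragment_pairs[1:]))

# Toy stand-in: "fragment pairs" are plain numbers and w is their difference.
toy_total = total_weight([1, 2, 4], lambda a, b: b - a)
```

A list with a single fragment pair has no links, so its total weight under this definition is 0.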


The FPSA Problem

Example:

$f_1 = F^1_{k_1} F^2_{t_1}(l_1)$

$f_2 = F^1_{k_2} F^2_{t_2}(l_2)$


The FPSA Problem

For each $s$, $s = 2, \dots, \text{MaxFlexNum}$, detect $F_J^{s*}$ such that:

$W(F_J^{s*}) \le W(F_J^s), \quad \forall F_J^s$

Remember, $F_J^s = \{f_1, \dots, f_s\}$ is a list of rigid fragment pairs.


The FlexProt Algorithm for FPSA

I. Detection of all rigid fragment pairs, $F^1_k F^2_t(l)$, that satisfy the MaxRMSD constraint

II. Detection of optimal configurations between rigid fragment pairs, $\{F_J^{s*}\}_{s=2}^{\text{MaxFlexNum}}$


I. Detect all Rigid Fragment Pairs

In order to find all possible pairs $F^1_k F^2_t(l)$, do:

Iterate over three indices $(k, t, l)$, where

$1 \le k \le n; \quad 1 \le t \le m; \quad 3 \le l \le \min(n-k+1,\ m-t+1)$

and select the pairs satisfying

$RMSD_T(F^1_k F^2_t(l)) \le \text{MaxRMSD}$
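The triple loop can be sketched as follows. The RMSD test is passed in as a callback, `is_rigid(k, t, l)`, which is a hypothetical stand-in for the check against MaxRMSD; indices are 1-based to match the slides.

```python
def enumerate_fragment_pairs(n, m, is_rigid, min_len=3):
    """Return every (k, t, l) within the index bounds that passes is_rigid."""
    pairs = []
    for k in range(1, n + 1):
        for t in range(1, m + 1):
            # Longest fragment that still fits inside both molecules.
            max_len = min(n - k + 1, m - t + 1)
            for l in range(min_len, max_len + 1):
                if is_rigid(k, t, l):
                    pairs.append((k, t, l))
    return pairs

# Toy run: two molecules of 4 atoms each, accepting every candidate.
all_candidates = enumerate_fragment_pairs(4, 4, lambda k, t, l: True)
```

Accepting every candidate shows only the size of the search space; a real run would reject most triplets through the RMSD test.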


I. Complexity

Remember, a rigid fragment pair is $F^1_k F^2_t(l)$, with

$1 \le k \le n; \quad 1 \le t \le m; \quad 3 \le l \le \min(n-k+1,\ m-t+1)$

We assume that $n \sim m$.

• Iterate over $(k, t, l)$: $O(n^3)$

• Compute RMSD for each triplet – linear in the detected fragment size (Sharir): $O(n)$

Total complexity: $O(n^4)$, improved to $O(n^3)$


II. Detect Optimal Configuration

• Now, we have a set of congruent fragment pairs.

• Let’s find an optimal subset of it. This subset will describe an alignment of M2 with M1. We’ll use dynamic programming.

Dynamic programming – solves an optimization problem by caching subproblem solutions rather than recomputing them.


II. Detect Optimal Configuration

• In General: define a graph

• Vertices represent the rigid fragment pairs

• The directed edges represent flexible regions connecting the rigid fragment pairs

• A weight function w is applied to the edges. It reflects the goodness of connecting two rigidly matched fragment pairs.


II. Detect Optimal Configuration

Vertices: $V = \{F^1_i F^2_j(l)\}$

A directed edge between $F^1_i F^2_j(l)$ and $F^1_{i'} F^2_{j'}(l')$ is defined if:

1. The fragments are ascending: $i < i'$, $j < j'$

2. The gaps between consecutive fragments are limited by MaxGap1 and MaxGap2 (user defined):

$Gap_1 = i' - (i + l) \le \text{MaxGap}_1, \quad Gap_2 = j' - (j + l) \le \text{MaxGap}_2$
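The two edge conditions can be sketched as a predicate. Representing a vertex as a tuple `(i, j, l)` is an assumption made for illustration, and the gap formula used is the one that reproduces the gap values (0 and 1) in the worked weight example later in the talk. The three tuples are the fragment pairs A, B and C from that example.

```python
def gaps(src, dst):
    """Gap1 and Gap2 between two fragment pairs given as (i, j, l) tuples."""
    i, j, l = src
    i2, j2, _ = dst
    return i2 - (i + l), j2 - (j + l)

def has_edge(src, dst, max_gap1, max_gap2):
    """True if a directed edge src -> dst satisfies both conditions."""
    i, j, _ = src
    i2, j2, _ = dst
    if not (i < i2 and j < j2):  # condition 1: ascending fragments
        return False
    gap1, gap2 = gaps(src, dst)  # condition 2: bounded gaps
    return gap1 <= max_gap1 and gap2 <= max_gap2

A, B, C = (1, 1, 2), (3, 4, 3), (6, 7, 2)
edge_ab = has_edge(A, B, 3, 3)
edge_bc = has_edge(B, C, 3, 3)
edge_ac = has_edge(A, C, 3, 3)
```

Under these assumptions A→B and B→C are edges, while A→C is rejected because its second gap is 4, above MaxGap2 = 3.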


II. Detect Optimal Configuration

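$M_1 = \{v_1, v_2, \dots, v_7\}, \quad M_2 = \{u_1, u_2, \dots, u_8\}$

$A = F^1_1 F^2_1(2) = \{v_1, v_2\}, \{u_1, u_2\}$

$B = F^1_3 F^2_4(3) = \{v_3, v_4, v_5\}, \{u_4, u_5, u_6\}$

$C = F^1_6 F^2_7(2) = \{v_6, v_7\}, \{u_7, u_8\}$

Define: MaxGap1 = 3, MaxGap2 = 3

[Figure: graph with vertices A, B and C]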


II. Detect Optimal Configuration

The weight function (smaller is better):

$w(e) = \underbrace{-(l + \Delta)^2}_{A} + \underbrace{\max(Gap_1, Gap_2)}_{B} + \underbrace{|Gap_1 - Gap_2|}_{C}$

Δ is half of the maximal overlapping interval

Part A rewards quadratically the size of $F^1_i F^2_j(l)$

Part B punishes large gaps

Part C punishes difference between Gap1 and Gap2


II. Detect Optimal Configuration

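[Figure: graph with vertices A, B and C, and edges $e_1$ and $e_2$]

$w(e) = \underbrace{-(l + \Delta)^2}_{A} + \underbrace{\max(Gap_1, Gap_2)}_{B} + \underbrace{|Gap_1 - Gap_2|}_{C}$

$A = F^1_1 F^2_1(2) = \{v_1, v_2\}, \{u_1, u_2\}$

$B = F^1_3 F^2_4(3) = \{v_3, v_4, v_5\}, \{u_4, u_5, u_6\}$

$C = F^1_6 F^2_7(2) = \{v_6, v_7\}, \{u_7, u_8\}$

$w(e_1) = -(3 + 0)^2 + \max(0, 1) + |0 - 1| = -7$

$w(e_2) = -16$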
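The computation of $w(e_1)$ can be checked with a short sketch. It assumes that $l$ in the weight is the length of the fragment pair the edge enters (B, of length 3, for the edge A→B), which is the reading that yields −7. Δ is supplied by the caller rather than computed, and the `(i, j, l)` tuples are an illustrative representation.

```python
def edge_weight(src, dst, delta=0):
    """Weight of the edge src -> dst; fragment pairs are (i, j, l) tuples."""
    i, j, l = src
    i2, j2, l2 = dst
    gap1 = i2 - (i + l)
    gap2 = j2 - (j + l)
    size_reward = -(l2 + delta) ** 2   # part A: reward long fragments
    gap_penalty = max(gap1, gap2)      # part B: punish large gaps
    skew_penalty = abs(gap1 - gap2)    # part C: punish unequal gaps
    return size_reward + gap_penalty + skew_penalty

A, B = (1, 1, 2), (3, 4, 3)
w_e1 = edge_weight(A, B)
```

Because smaller is better, the quadratic reward dominates: a long fragment outweighs small gap penalties.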


II. Detect Optimal Configuration

• We built a weighted directed acyclic graph (DAG)

• Shortest weighted paths correspond to alignments of consecutive, long, congruent matching fragments.

Almost Finished


Reminder: Shortest Paths in DAGs

First, we perform a topological sort of the Directed Acyclic Graph (DAG). Then, we make just one pass over the vertices according to their order. For each vertex, we relax each edge that leaves it.

[Figure: shortest-path relaxation on an example DAG, from the initial distance estimates (0 and ∞) to the final distances]
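A minimal sketch of the procedure described above. The adjacency-dict representation and the function names are assumptions, and the example graph is illustrative: its edge weights were chosen to be consistent with the distances shown in the slide's figure rather than taken from a stated source.

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm; edges maps u -> list of (v, weight)."""
    indegree = {v: 0 for v in vertices}
    for u in edges:
        for v, _ in edges[u]:
            indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order

def dag_shortest_paths(vertices, edges, source):
    """One pass in topological order, relaxing every outgoing edge."""
    inf = float("inf")
    dist = {v: inf for v in vertices}
    pred = {v: None for v in vertices}
    dist[source] = 0
    for u in topological_order(vertices, edges):
        if dist[u] == inf:
            continue  # u is not reachable from the source
        for v, w in edges.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
    return dist, pred

vertices = ["r", "s", "t", "x", "y", "z"]
edges = {
    "r": [("s", 5), ("t", 3)],
    "s": [("t", 2), ("x", 6)],
    "t": [("x", 7), ("y", 4), ("z", 2)],
    "x": [("y", -1), ("z", 1)],
    "y": [("z", -2)],
}
dist, pred = dag_shortest_paths(vertices, edges, "s")
```

Negative edge weights are fine here because the graph is acyclic, which matters for FlexProt since good edges carry negative weights.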


II. Detect Optimal Configuration

• We run the Shortest Paths in DAGs algorithm.

• A simple case (no limit on the number of nodes in the shortest path): the shortest path in G corresponds to a minimal weighted sequence of rigid fragment pairs, $F^{**}$, such that $W(F^{**}) \le W(F_J^s), \quad \forall F_J^s$

• Complexity: $O(|V| + |E|)$


II. Detect Optimal Configuration

• We’ll make a small change in the algorithm, since we need to detect shortest paths with exactly s nodes, $s = 2, \dots, \text{MaxFlexNum}$.

• In the simple case, each node holds a pointer to a preceding node with the shortest path.

• Instead, each node will hold MaxFlexNum pointers. Pointer s points to a preceding node with a shortest path of size s-1.
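One way to sketch this change: each node keeps a table indexed by the number of nodes on the path. The sketch assumes the vertices are already given in topological order and that a path consisting of a single node has weight 0. The names and the toy weights are hypothetical.

```python
def best_paths_by_node_count(order, edges, max_nodes):
    """best[v][s] = weight of the lightest path with exactly s nodes ending at v."""
    inf = float("inf")
    best = {v: [inf] * (max_nodes + 1) for v in order}
    pred = {v: [None] * (max_nodes + 1) for v in order}
    for v in order:
        best[v][1] = 0  # assumption: a lone node has weight 0
    for u in order:  # topological order, so best[u] is final here
        for v, w in edges.get(u, []):
            for s in range(2, max_nodes + 1):
                candidate = best[u][s - 1] + w
                if candidate < best[v][s]:
                    best[v][s] = candidate
                    pred[v][s] = u  # pointer s of node v
    return best, pred

def recover_path(pred, end, s):
    """Follow pointer s, then s-1, ... back to the start of the path."""
    path = [end]
    while s > 1:
        end = pred[end][s]
        path.append(end)
        s -= 1
    return path[::-1]

order = ["p", "q", "r"]
edges = {"p": [("q", -5), ("r", -1)], "q": [("r", -3)]}
best, pred = best_paths_by_node_count(order, edges, 3)
```

Here `recover_path(pred, "r", 3)` walks back through p, q, r, while `recover_path(pred, "r", 2)` gives q, r: one answer per value of s, as the FPSA problem requires.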


II. Complexity

During relaxation, we check all MaxFlexNum possibilities and therefore the complexity is

$O((|V| + |E|) \cdot \text{MaxFlexNum})$

• The number of nodes in the graph can be proportional to $n^3$

• A graph of $N$ vertices has $O(N^2)$ edges, so here $|E| = O(n^6)$

Total complexity of stage II: $O(n^6)$


Summary of FlexProt Algorithm

• Theoretical worst case complexity is $O(n^6)$

• In practice – FlexProt is highly efficient (with some changes)

• The average running time is approximately seven seconds (for molecules of 300 amino acids)

So… what does it look like?


Experimental Results



Running FlexProt

• http://bioinfo3d.math.tau.ac.il/FlexProt

• http://www.umass.edu/microbio/chime/explorer/pe.htm


Heuristic Improvement of Step I

• In step I, we detected all of the rigid fragment pairs. Time complexity – $O(n^3)$

• The procedure takes several minutes, even for small proteins.

• Instead, we can use a greedy algorithm that only takes $O(n^2)$


Heuristic Improvement of Step I

• Start by aligning a single matching atom pair $(v_a, u_b)$, where $v_a \in M_1$ and $u_b \in M_2$.

• Iteratively, add one matching atom pair to the left and one to the right.

• Stop when we exceed the RMSD threshold – when the list can’t be extended to the left or the right.
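The extension loop above can be sketched as follows. The RMSD of a candidate fragment pair is supplied by a hypothetical callback `rmsd_of(i, j, l)`, indices are 1-based, and the toy callback in the example simply pretends that residues 3 to 6 of both molecules superimpose perfectly.

```python
def extend_seed(a, b, n, m, max_rmsd, rmsd_of):
    """Grow the seed (v_a, u_b) right and left while the RMSD stays in bounds.

    Returns (i, j, l): the start indices and length of the fragment pair.
    """
    i, j, length = a, b, 1
    while True:
        grown = False
        # Try to add one matching atom pair on the right.
        if i + length <= n and j + length <= m \
                and rmsd_of(i, j, length + 1) <= max_rmsd:
            length += 1
            grown = True
        # Try to add one matching atom pair on the left.
        if i > 1 and j > 1 and rmsd_of(i - 1, j - 1, length + 1) <= max_rmsd:
            i, j, length = i - 1, j - 1, length + 1
            grown = True
        if not grown:
            return i, j, length

def toy_rmsd(i, j, l):
    """Hypothetical: zero error only while both fragments stay inside 3..6."""
    inside = i >= 3 and j >= 3 and i + l - 1 <= 6 and j + l - 1 <= 6
    return 0.0 if inside else 10.0

fragment = extend_seed(4, 4, 10, 10, 1.0, toy_rmsd)
```

Starting from the seed at position 4, the toy run grows to the fragment pair starting at 3 with length 4 and then stops on both sides.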


Heuristic Improvement of Step I

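[Figure: a seed pair $(v_a, u_b)$ extended one pair at a time, through $(v_{a-1}, u_{b-1})$ and $(v_{a+1}, u_{b+1})$, until it spans $(v_i, u_j)$ to $(v_{i+l-1}, u_{j+l-1})$]

After the extension process, we have a match-list

$(v_i, u_j), \dots, (v_a, u_b), \dots, (v_{i+l-1}, u_{j+l-1})$

$\{v_i, \dots, v_{i+l-1}\}$ is almost congruent to $\{u_j, \dots, u_{j+l-1}\}$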

The next alignment is initiated at $(v_{i+l-1}, u_{j+l-1})$ and not $(v_{a+1}, u_{b+1})$


Complexity

Updating the RMSD at each step is $O(1)$

Thus, finding a particular $F^1_i F^2_j(l)$ is linear in the length of the fragments - $O(l)$

The time complexity, $T_1^*$, is:

$T_1^* = \sum_{F^1_i F^2_j(l)} \text{Time to compute } F^1_i F^2_j(l)$


Complexity

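$T_1^* = \sum_{F^1_i F^2_j(l)} O(l) = \sum_{i,j} \#(v_i, u_j) = O(n) \cdot O(n^2) = O(n^3)$

where $\#(v_i, u_j)$ is the number of fragment pairs in which the atom pair $(v_i, u_j)$ participates.

Theoretically, some atom pairs $(v_i, u_j)$ might participate in as many as n fragment pairs.

In practice, a pair of atoms $(v_i, u_j)$ participates in at most 2 fragment pairs, so

$T_1^* = O(n^2)$

There are $O(n^2)$ rigid fragment pairs.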


Clustering

• This stage can be viewed as an extension to the FPSA problem.

• The algorithm clusters consecutive fragment pairs that have a similar 3D transformation, even if they are not directly linked.


Clustering

Example: Two β-strands (A and B) are connected by loops of different lengths.

Stage I of the FlexProt algorithm aligns each separately.

A and B have almost the same 3D rigid transformation and in the clustering stage, they are joined into one structure.


The Clustering Algorithm

• We take each path detected in stage II. Remember, vertices = congruent fragment pairs.

• The first vertex is a singleton cluster.

• Take the second vertex. Check if there is a rigid transformation which superimposes both fragments.

If successful – do the same for the next vertex.

Else – start a new cluster with the vertex that failed to join the previous cluster.
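The pass over a path can be sketched as below. The superposition test is abstracted into a hypothetical callback `fits_cluster(cluster, vertex)`. The toy example uses plain numbers as stand-ins for fragment pairs and treats two of them as compatible when they differ by at most 1.

```python
def cluster_path(path, fits_cluster):
    """Greedily group consecutive vertices of one stage-II path into clusters."""
    clusters = []
    for vertex in path:
        if clusters and fits_cluster(clusters[-1], vertex):
            clusters[-1].append(vertex)  # joins the current cluster
        else:
            clusters.append([vertex])    # failed to join: start a new cluster
    return clusters

# Toy stand-in: a vertex joins if it is within 1 of the cluster's last member.
toy_clusters = cluster_path(
    [1, 1, 2, 5, 5, 9],
    lambda cluster, vertex: abs(vertex - cluster[-1]) <= 1,
)
```

Each vertex is examined once, which is why the number of iterations equals the number of rigid fragment pairs on the path.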


Clustering Complexity

• The number of iterations equals the number of rigid fragment pairs in the flexible alignment solution.

• Time complexity: $T_3 = O(\text{MaxFlexNum} \cdot N_{flex})$

• $N_{flex}$ is the number of flexible alignments. It is bounded by $O(\text{MaxFlexNum} \cdot n^2)$, since the graph has $n^2$ vertices.

$T_3 = O(\text{MaxFlexNum}^2 \cdot n^2)$

$T_3 = O(n^2)$


Running FlexProt

• http://bioinfo3d.math.tau.ac.il/FlexProt

• http://www.umass.edu/microbio/chime/explorer/pe.htm


Outline

Detailed Description:

• FPSA problem description

• FlexProt algorithm for the FPSA problem

• Experimental results

• Heuristic Algorithms for FPSA

• Clustering

Conclusions & Discussion:

• Summary of algorithm

• Major results

• Discussion


Summary of Algorithm

Exact solution of FPSA

• Step I (detection of fragment pairs): $O(n^3)$

• Step II (detection of optimal configuration): $O(n^6)$

Heuristic solution

• Step I: $O(n^2)$

• Step II: $O(n^4)$

Clustering: $O(n^2)$


Major Results

• Unlike other algorithms, FlexProt does not require a priori knowledge of the locations of the flexible, hinge-bending sites

• The pairs of rigid matching fragments and the flexible regions are detected simultaneously

• The speed of the method allows extensive database comparison.


Significance of Results

• Proteins are flexible. They may appear in different conformations. FlexProt incorporates flexibility in structure comparison.

• Proteins function through binding. Flexibility is one of the characteristics of binding sites, so it is important to detect hinge-bending sites.


Significance of Results

• Comparing proteins despite the motion that they have undergone is helpful for protein classification

• These comparisons are also useful in drug design, detecting binding sites, and the range of motions that proteins display


FlexProt – Discussion

• Differs from other flexible alignment algorithms (and of course, rigid)

• Does not violate the protein sequence order

• Given two alignments, each giving better results in different measures, which is better?

• Clustering is optional.

• Which proteins are compared?


Bibliography

• Cormen, “Introduction to Algorithms”, chapter 24, Single-Source Shortest Paths

• Gerstein, Database of Molecular Movements

• Shatsky, Nussinov and Wolfson, “Flexible Protein Alignment and Hinge Detection”, Proteins: Structure, Function, and Genetics 48, 242-256 (2002)

• Wolfson, “Structural Bioinformatics – 2003” presentation