pharm 201 lecture 10, 20091 reductionism and classification require detailed comparison consider 3d...

41
Pharm 201 Lecture 10, 2009 1 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne Department of Pharmacology, UCSD Reading Chapter 16, Structural Bioinformatics

Upload: sheila-owen

Post on 05-Jan-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 1

Reductionism and Classification Require Detailed Comparison

Consider 3D Comparison

Pharm 201/Bioinformatics I

Philip E. BourneDepartment of Pharmacology, UCSD

Reading Chapter 16, Structural Bioinformatics

Page 2: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 2

Consider this Course a Workflow

Data In Understand the scopeand complexity of the

data

Understand theexperiment to understand the

errors

Understand howto best represent(model) the data

Understand the methods to

physically instantiatethe model

From initialanalysis understand

how to controldata in

Recognize redundancyIn the data

Classify the data Visualize the data

Analyze the data

Page 3: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 3

From Last Time

• We established the complex relationship between:– Sequence and Structure– Structure and Structure– Structure and Function

• Today we analyze how the relationships between structure and structure are established

Page 4: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 4

Agenda

• Understand why structure comparison is important

• Understand why it is not a solved problem• Understand the basics of the methods used to

address the problem• Understand one method (CE) in more detail• Review an example where structure comparison

has revealed new biological insights

Page 5: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 5

Why Structure Comparison is Important

• Reductionism – needed to classify protein structures

• Functional assignment and hopefully new biology

• Alignment of predicted structure against structural templates

• Establish improved sequence relationships not possible from sequence alone

• Protein engineering

Page 6: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 6

Distinctions - Structure Superposition and Structure Comparison and

Alignment are Different

• Structure superposition assumes you already know which atoms to superimpose – it merely optimizes for the atoms chosen (relatively simple)

• Structure alignment must first determine what atoms to align (difficult). We are concerned with alignment

Page 7: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 7

Distinctions – Pair-wise Alignments are Different from Multiple Structure

Alignments• Multiple structure alignment algorithms are

rare and of questionable quality (see for example Nucleic Acids Research (2004), 32 W100-W103

• Multiple structure alignments should not be confused with multiple pair-wise alignments

• Here we focus on single pair-wise comparison and alignment

Page 8: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 8

Why is it Not a Solved Problem?

Page 9: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 9

Current State of the Art

• There are many papers published on this, but relatively few have code to download or Web sites from which to perform comparisons

• All methods can identify obvious similarities between two structures

• Remote similarities are detected by a subset of methods – different remote similarities are recognized by different methods

• Good alignments are much harder to come by• Speed is a serious issue with some algorithms

Page 10: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 10

Desirables

• Biologically meaningful alignments not just geometrically meaningful

• Complete database of all alignments

• Ability to apply to structures not in the PDB

Page 11: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 11

Maintain 9 of 10 interactionsRMSD 1.5 Å

Maintain 5 of 10 interactionsRMSD 0.5 Å

Biological vs Geometric Alignments Plastocyanin versus Azurin (from Godzik 1996)

Page 12: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 12

Literature Alignments - Flavodoxin vs Che Y ProteinFrom Godzik (1996) Protein Science, 5, 1325-1338.

Page 13: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 13

Understand the basics of the methods used to address the

problem

Page 14: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 200914

See also http://en.wikipedia.org/wiki/Structural_alignment_software

Page 15: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 15

How to Compare Structures?

Structure 1 Structure 2

Feature extraction

Structure description 1 Structure description 2

Comparison algorithm

Scores

Statistical significance

Similarity, classification

1.

2.

3.

Page 16: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 16

Components of Structure Alignment1. Structure Description

• Local geometry

• Side chain contacts

• Geometric hashing

• Distance matrix (Dali, 1993)

• Properties (secondary structure, hydrophobic clusters (Comparer, 1990)

• Secondary structure elements (VAST, 1996)

• Distances of inter & intra aligned fragment pairs (CE, 1998)

• Contact map (Celera, 2004)

• Geometry invariants (Jia et al, 2004)

Page 17: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 17

Components of Structure Alignment

2. Alignment algorithms– Monte Carlo (Dali, VAST)– Heuristics (CE)– Dynamic Programming (CE)– Probabilistic

3. Statistical significance

Page 18: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Components of Structure Alignment2. Alignment algorithms

Dynamic programming, Integer programming, Monte Carlo…

3. Statistical significanceLevitt and Gerstein, PNAS, 1998

Random Model and CE scoring function (Jia et al, 2004)

Input & output of alignment algorithm

Input: two proteins: and

Output: An alignment

and scores

Constraints:

min rmsd:

max L

min Gaps:

},,{ 1 maaA },,{ 1 nbbB

LL

jiji

jjjiii

babaBALLL

2121 ,

)},,(,),,{(),(11

L

Tbarmsd

L

kji

T

kk

1

2)(min

1

111 11

L

ttttt jjiiGaps

18

Page 19: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 19

Understand one method (CE) in more detail

I.N. Shindyalov and P.E. Bourne Protein Engineering 1998, 11(9) 739-747. Protein

Structure Alignment by Incremental Combinatorial Extension of the Optimum

Path. [PDF File] 793 citations!

Page 20: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 20

Basic Approach

• Compare octameric fragments – an aligned fragment pair (AFP) (local alignments)

• Stitch together AFPs• Find the optimal path through the AFPs• Optimize the alignment through dynamic

programming• Measure the statistical significance of the

alignment

Page 21: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 21

Why This Approach?Alignment Space is Very Large and Must be Constrained Without Loosing

Meaningful Alignments

Similarity Matrix S where:

S=(nA-m).(nB-m)

m = Length of AFP

nA = Length of protein A

This is very large to compute – constraints are needed

Page 22: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 22

Calculation of distance: (a) Dij for alignment represented by two AFPs i and j from the

path; (b) Dii for single AFP i from the path.

Page 23: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 23

Definition of the Alignment Path

pAi = AFPs starting residue position in protein A at the ith position

of the alignment pathm = longest continual path – set as 8One of the conditions (1)-(3) should be satisfied for 2 consecutive AFPs i and i+1 in the path (1) = 2 consecutive AFPs aligned without gaps(2) = Two consecutive AFPs with a gap in protein A(3) = Two consecutive AFPs with a gap in protein B

Page 24: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 24

Extension of the Alignment Path

Gap sizes are limited to G – heuristically set as 30 residues

Page 25: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 25

1. Distance calculated from independent set of inter-residue distances where each distance is used only once - used for combinations of 2 AFPs

2. Full set of inter-residue distances - used for a single AFP

3. RMSD from least squares superposition - used to select few best fragments

Evaluation based upon the following three distance similarity measures

Page 26: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 26

1. Distance calculated from independent set of inter-residue distances where each distance is used only once

2. Full set of inter-residue distances

3. RMSD from least squares superposition

Evaluation Based Upon the Following Three Distance Similarity Measures

Page 27: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 27

How to Extend the Path?

1. Consider all possible AFPs that extend the path

2. Consider only the best AFP

3. Use some intermediate strategy

Page 28: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 28

How to Extend the Path?

1. Consider all possible AFPs that extend the path Computationally expensive

2. Consider only the best AFP Works well with the right heuristics

3. Use some intermediate strategy

Page 29: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 29

What Heuristics?

Candidate AFPs are based upon (9) D0 = 3ÅThe best AFP is based upon (10) D1 = 4ÅThe decision to extend or terminate the path is based upon (11)

Page 30: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 30

Z-Score

• Evaluate the probability of finding an alignment path of the same length or smaller gaps and distance from a random set of non-redundant structures

Page 31: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 31

The 20 best alignments with a Z score above 3.5 are assessed based on RMSD and the best kept. This produces approx. one error in 1000 structures

Each gap in this alignment is assessed for relocation up to m/2

Iterative optimization using dynamic programming is performedusing residues for the superimposed structures

Optimization of the Final Path

Page 32: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 32

Test Case: Phycocyanin versus Colicin A

Page 33: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 33

Cyclin-dependent kinasesOpen (purple) Closed (blue)Pavelitch et al. (1997)

Page 34: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 34

Limitations

• Will not find non-topological alignments (outside the bounds of the dotted lines)

• What are the correct “units” to be comparing?

• CE works on chains – as we shall see in future weeks domains are the correct units, but definition of the domains is not straightforward

Page 35: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 35

• Took 11,748 chain in the PDB (1/98)

• Computed for 1868 representatives

• 24,000 Cray T3E processor hours

• Loaded pairwise alignments into database

Computation of All x All

Page 36: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 36

1-2 Years Ago

• 40,000 proteins ~ 70,000 chains• 70,0002/2 * 30 seconds = 2330 yrs• Options:

– Use a redundant set of chains– Use parallel architectures

D. Pekurovsky, I.N. Shindyalov, P.E. Bourne 2004 High Throughput Biological Data Processing on Massively Parallel Computers. A Case Study of Pairwise Structure Comparison and Alignment Using the Combinatorial Extension (CE) Algorithm.

Bioinformatics, 20(12) 1940-1947 [PDF].

Page 37: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Now

• Using egrid to compute all by all for CE and FatCat

Pharm 201 Lecture 10, 2009 37

Page 38: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 38

One Criteria for Redundancy

• Remove highly homologous chains;• The  RMSD between two chains is less than 2Å;• The length difference between two chains is less

than 10%; • The number of gap positions in alignment

between two chains is less than 20% of aligned residue positions;

• At least 2/3 of the residue positions in the represented chain are aligned with the representing chain.

Page 39: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 39

Review example where structure comparison has revealed new

biological insights

Page 40: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 40

Example

• CE revealed putative Ca++ binding domain in acetylcholinesterase

• Sequence similarity to neuroligins predicts Ca++ binding too – confirmed experimentally

• Members of the a/b hydrolase family bind Ca++ which may be important for heterologous cell associations

Structural similarity between Acetylcholinesterase and Calmodulin found using CE (Tsigelny et al, Prot Sci, 2000, 9:180)

Page 41: Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne

Pharm 201 Lecture 10, 2009 41

The Future(also a general rule)

• Gold standards are important

• For structure comparison a human generated alignment standard is important

• Algorithms are then challenged to meet the standard

• Eventually those algorithms highlight problems with the standard

• The cycle continues