Protein Folds and Protein Structure Superposition
Amit P. Singh
Biochemistry 218/MIS 231
November 24, 1999

PDB Size
• More than 11,000 released atomic coordinate entries
• About 4.5 new entries per day
[Figure: Number of Entries (0 to 9,000) plotted against Year (1970 to 2000)]

Tasks
• Compare all known structures to each other
• Classify and organize all structures in a meaningful way
• Find common folding patterns and structural motifs
• Compute evolutionary distances between protein structures
• Study interactions between structures and other molecules (Protein Docking)
• Use known structures to predict structure from sequence (Protein Threading)
• Many more ...

Overview
• Protein folds
  • Classification of protein structures
  • Current databases
• Structure Superposition
  • Existing methods
  • LOCK - hierarchical protein structure superposition
  • 3dSearch - fast secondary structure superposition
• Demonstrations

Classification of Protein Structures
• Class
  • Similar secondary structure content
  • All α, all β, α/β, etc.
• Fold (Architecture)
  • Major structural similarity
  • Secondary structure elements (SSEs) in similar arrangement
• Superfamily (Topology)
  • Probable common ancestry
• Family
  • Clear evolutionary relationship
  • Sequence similarity usually > 25%

Classes of Protein Structures
• Mainly α
• Mainly β
• α/β
  • Parallel β sheets, β-α-β units
• α+β
  • Anti-parallel β sheets, segregated α and β regions
  • Helices mostly on one side of sheet
• Others
  • Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins

Folds / Architectures
• Mainly α
  • Bundle
  • Non-Bundle
• Mainly β
  • Single sheet
  • Roll
  • Barrel
  • Clam
  • Sandwich
  • Prism
  • 4/6/7/8 Propeller
  • Solenoid
• α/β and α+β
  • Closed
    » Barrel
    » Roll, ...
  • Open
    » Sandwich
    » Clam, ...

e.g. The TIM Barrel Fold

A Conceptual Problem ...

For example:
Another example: Globin vs. Colicin

Databases of Folds
• SCOP
  • Structural Classification of Proteins
  • Manual assembly by inspection
  • All nodes are annotated (e.g. All-alpha, alpha/beta)
  • Structural similarity search using 3dSearch (Singh and Brutlag)
• CATH
  • Class - Architecture - Topology - Homologous Superfamily
  • Manual classification at Architecture level
  • Automated topology classification using the SSAP algorithm (Orengo and Taylor)
  • No structural similarity search
• FSSP
  • Fully automated using the DALI algorithm (Holm and Sander)
  • No internal node annotations
  • Structural similarity search using DALI
• PClass
  • Fully automated using the LOCK and 3dSearch algorithms (Singh and Brutlag)
  • All internal nodes automatically annotated with common terms
  • JAVA based classification browser
  • Structural similarity search using 3dSearch

Automating Structure Classification and Fold Detection
• Growth of PDB demands automated techniques for classification and fold detection
• Protein Structure Comparison
  • Computing structure-based evolutionary distances
  • Identifying protein function
  • Understanding functional mechanism
  • Identifying structurally conserved regions in the protein
  • Finding binding sites or other functionally important regions of the protein

Structure Superposition
• Find the transformation matrix that best overlaps the table and the chair
• i.e. find the transformation matrix that minimizes the root mean square deviation between corresponding points of the table and the chair
• Correspondences:
  • Top of chair to top of table
  • Front of chair to front of table, etc.

Absolute Orientation Algorithm
• "Closed-form solution of absolute orientation using unit quaternions", Berthold K. P. Horn, J. Opt. Soc. Am., April 1987, Vol. 4, No. 4
• The key is finding corresponding points between the two structures
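
Once corresponding points are known, the best rigid transformation has a closed form. The sketch below follows the unit-quaternion construction described in Horn's paper, written with NumPy. The function name `horn_superpose` and the (n, 3) array layout are assumptions made for illustration and are not taken from the lecture.

```python
import numpy as np

def horn_superpose(P, Q):
    """Rigidly fit points P onto corresponding points Q (both n x 3).

    Returns (R, t, rmsd) such that P @ R.T + t is the best overlap with Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)

    # Cross-covariance of the centered point sets: S[a, b] = sum(P_a * Q_b).
    S = (P - cp).T @ (Q - cq)
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S

    # Symmetric 4 x 4 matrix whose top eigenvector is the unit quaternion.
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ])
    _, vecs = np.linalg.eigh(N)  # eigenvalues are returned in ascending order
    q0, qx, qy, qz = vecs[:, -1]

    # Rotation matrix equivalent to the unit quaternion (q0, qx, qy, qz).
    R = np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ])
    t = cq - R @ cp
    rmsd = np.sqrt(((P @ R.T + t - Q) ** 2).sum(axis=1).mean())
    return R, t, rmsd
```

The routine assumes row k of `P` corresponds to row k of `Q`; choosing those correspondences is the hard part that the alignment algorithms address.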

Algorithms for Structure Superposition
• Distance based methods:
  • DALI (Holm and Sander): Aligning scalar distance plots
  • STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances
  • SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances
  • MINAREA (Falicov and Cohen): Minimizing soap-bubble surface area
• Vector based methods:
  • VAST (Bryant): Graph theory based secondary structure alignment
  • 3dSearch (Singh and Brutlag) & 3D Lookup (Holm and Sander): Fast secondary structure index lookup
• Both
  • LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances

DALI
[Figure: an intra-molecular distance plot for myoglobin]

DALI
• Based on aligning 2-D intra-molecular distance matrices
• Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized
• Searches through all possible alignments of residues using Monte-Carlo and Branch-and-Bound algorithms

Score(i, j) = 1.5 - |distanceA(i, j) - distanceB(i, j)|
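
To make the distance-matrix idea concrete, the sketch below builds an intra-molecular distance matrix and evaluates the score term from this slide for a given residue correspondence. It is only an illustration of the slide's formula: the function names are hypothetical, and summing the term over all ordered residue pairs with i ≠ j is an assumption, since the slide gives only the per-pair term.

```python
import numpy as np

def distance_matrix(ca):
    """Intra-molecular distance matrix for an (n, 3) array of C-alpha coordinates."""
    ca = np.asarray(ca, dtype=float)
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def alignment_score(dist_a, dist_b, pairs):
    """Sum 1.5 - |dA(i, j) - dB(i', j')| over pairs of aligned residues.

    pairs is a list of (index in A, index in B) correspondences.
    Assumption: the sum runs over ordered pairs with i != j.
    """
    ia = [a for a, _ in pairs]
    ib = [b for _, b in pairs]
    sub_a = dist_a[np.ix_(ia, ia)]   # distances among aligned residues of A
    sub_b = dist_b[np.ix_(ib, ib)]   # distances among their partners in B
    term = 1.5 - np.abs(sub_a - sub_b)
    off_diagonal = ~np.eye(len(pairs), dtype=bool)
    return float(term[off_diagonal].sum())
```

Because the matrices hold only internal distances, the score does not change when either protein is rotated or translated, which is why no superposition is needed before scoring.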

STRUCTAL
• Based on Iterative Dynamic Programming to align inter-molecular distances
• Pair-wise alignment score in each square of the matrix is inversely proportional to distance between the two atoms
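
One dynamic-programming pass of this kind can be sketched as follows. This is a minimal illustration, not the STRUCTAL implementation: the score form M / (1 + (d/d0)^2), the values M = 20 and d0 = 2.24, and the linear gap penalty are all assumptions chosen so that the score falls as the inter-molecular distance grows. The iterative method would re-superpose the structures on the aligned pairs and repeat until the alignment stops changing.

```python
import numpy as np

def dp_align(A, B, M=20.0, d0=2.24, gap=10.0):
    """Align two already-superposed C-alpha traces A (n x 3) and B (m x 3).

    Returns (total score, list of aligned index pairs).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    S = M / (1.0 + (d / d0) ** 2)      # high score for atoms that lie close

    n, m = S.shape
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = -gap * np.arange(n + 1)
    F[0, :] = -gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i, j] = max(F[i - 1, j - 1] + S[i - 1, j - 1],
                          F[i - 1, j] - gap,
                          F[i, j - 1] - gap)

    # Trace back from the corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i, j] == F[i - 1, j - 1] + S[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i, j] == F[i - 1, j] - gap:
            i -= 1
        else:
            j -= 1
    return float(F[n, m]), pairs[::-1]
```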

VAST - Vector Alignment Search Tool
• Aligns only secondary structure elements (SSE)
• Represents each SSE as a vector
• Finds all possible pairs of vectors from the two structures that are similar
• Uses a graph theory algorithm to find the maximal subset of similar vectors
• Overall alignment score is based on the number of similar pairs of vectors between the two structures

LOCK - Creating Secondary Structure Vectors

Comparing Secondary Structure Vectors
[Figure: secondary structure vectors i, j, k of one protein and p, q, r of the other; plot of S(d) against d with M, -M, and d0 marked on the axes]

$$S(d) = \frac{2M}{1 + (d/d_0)^2} - M$$

Orientation Independent Scores:
• S = S(|angle(i,k) - angle(p,r)|)
• S = S(|distance(i,k) - distance(p,r)|)
• S = S(|length(k) - length(r)|)

Orientation Dependent Scores:
• S = S(angle(k,r))
• S = S(distance(k,r))
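
The scoring function can be sketched as below, assuming the form S(d) = 2M / (1 + (d/d0)^2) - M, which equals M at d = 0, crosses zero at d = d0, and tends to -M for large d. The helper names and the 30-degree value used for d0 in the angle score are hypothetical and chosen only for illustration.

```python
import numpy as np

def lock_score(d, d0, M=1.0):
    """S(d): M for a perfect match, 0 at d = d0, approaching -M as d grows."""
    return 2.0 * M / (1.0 + (d / d0) ** 2) - M

def angle(u, v):
    """Angle in radians between two vectors."""
    cos_uv = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_uv, -1.0, 1.0))

def angle_pair_score(i, k, p, r, d0=np.radians(30.0), M=1.0):
    """Orientation-independent score S(|angle(i,k) - angle(p,r)|).

    i, k are vectors from one protein and p, r from the other; the score
    compares the internal angle of each pair, so no superposition is needed.
    The default d0 is an assumed value.
    """
    return lock_score(abs(angle(i, k) - angle(p, r)), d0, M)
```

The orientation-dependent scores use the same `lock_score` but take the angle or distance between a query vector and a target vector directly, so they are meaningful only after the two proteins have been superposed.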

Aligning Secondary Structure Vectors
Best local alignment: HHSSSHSSH
[Figure: alignment of the secondary structure element strings of two proteins (H and S labels)]

Three Step Algorithm
• Local Secondary Structure Superposition
  • Find an initial superposition of the two proteins by using dynamic programming to align the secondary structure vectors
• Atomic Superposition
  • Apply a greedy nearest neighbor method to minimize the RMSD between the C-α atoms from query and the target (i.e. find the nearest local minimum in the alignment space)
• Core Superposition
  • Find the best sequential core of aligned C-α atoms and minimize the RMSD between them
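
The atomic superposition step can be sketched as an iterate-and-refit loop: pair each query C-α with its nearest target C-α, refit on those pairs, and stop when the RMSD no longer improves. This is a minimal sketch under stated assumptions, not the LOCK code: it uses the SVD-based Kabsch fit in place of the quaternion method, and the 3.0 distance cutoff, iteration limit, and function names are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD of P @ R.T + t to Q."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def greedy_refine(query, target, cutoff=3.0, max_iter=50, tol=1e-6):
    """Refine an initial superposition of query onto target (both n x 3)."""
    moved = np.array(query, dtype=float)
    target = np.asarray(target, dtype=float)
    rmsd = previous = np.inf
    for _ in range(max_iter):
        # Greedy correspondence: nearest target atom for every query atom.
        dist = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=-1)
        nearest = dist.argmin(axis=1)
        close = dist[np.arange(len(moved)), nearest] < cutoff
        if close.sum() < 3:
            break                             # too few pairs to fit
        R, t = kabsch(moved[close], target[nearest[close]])
        moved = moved @ R.T + t
        delta = moved[close] - target[nearest[close]]
        rmsd = np.sqrt((delta ** 2).sum(axis=1).mean())
        if previous - rmsd < tol:
            break                             # reached a local minimum
        previous = rmsd
    return moved, rmsd
```

Because each pass only follows the nearest neighbors of the current pose, the loop converges to the local minimum closest to the starting superposition, which is why a good initial alignment from the secondary structure step matters.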

Step 1: Local Secondary Structure Superposition
[Figure: secondary structure vectors H1, S2, H3, S4 of the two proteins]

Step 1: Local Secondary Structure Superposition
[Figure: vectors A1 to A4 of one protein and B1 to B4 of the other]

| Pair | # of aligned vectors | Total alignment score |
|---|---|---|
| A1,A2 / B2,B3 | 3 | 71 |
| A3,A4 / B3,B4 | 2 | 32 |

Step 1: Local Secondary Structure Superposition

Step 2: Atomic Superposition

Step 3: Core Superposition

Features
• Able to detect both global structural similarity and sub-domain similarity
• Fast: O(n^3), where n = number of secondary structure elements
• ~4 min to compare myoglobin (153 residues) to 800 representative proteins from the PDB (on a 195 MHz MIPS R10000 processor)
• Can align flexible proteins that contain multiple sub-domains

LOCK PDB Search: Myoglobin

| PDB entry | RMSD | Aligned residues | Description |
|---|---|---|---|
| 1myh-A | 0.562531 | 153 | MYOGLOBIN (AQUOMET, PH 7.1) MUTANT |
| 2dhb-A | 1.335804 | 132 | HEMOGLOBIN (HORSE,DEOXY) |
| 1eca | 1.497830 | 130 | HEMOGLOBIN (ERYTHROCRUORIN, AQUO MET) |
| 1flp | 1.349387 | 129 | HEMOGLOBIN I (MONOMERIC) (FERRIC) |
| 2lhb | 1.084818 | 128 | HEMOGLOBIN V (CYANO,MET) |
| 1hds-B | 1.235697 | 128 | HEMOGLOBIN (SICKLE CELL) |
| 1ith-A | 1.302358 | 126 | HEMOGLOBIN (CYANOMET) |
| 1mba | 1.501337 | 126 | MYOGLOBIN (MET) ($P*H 7.0) |
| 1hbg | 1.412812 | 125 | HEMOGLOBIN (CARBON MONOXY) |
| 1ash | 1.431599 | 122 | HEMOGLOBIN (DOMAIN ONE) |
| 1hlb | 1.475264 | 121 | HEMOGLOBIN (SEA CUCUMBER) |
| 1hbi-A | 1.408008 | 118 | HEMOGLOBIN I (OXYGENATED, HOMODIMER) |
| 1gdi | 1.476536 | 118 | LEGHEMOGLOBIN (CARBON MONOXY) |
| 1cpc-A | 1.787059 | 80 | C-PHYCOCYANIN |
| 1cpc-L | 1.682854 | 74 | C-PHYCOCYANIN |
| 1tox-A | 1.640087 | 63 | DIPHTHERIA TOXIN DIMER COMPLEXED WITH NAD |
| 1pbg-A | 1.701624 | 60 | MOL_ID: 1; |
| 1col-A | 1.798676 | 58 | COLICIN *A (C-TERMINAL DOMAIN) |
| 2sbl-B | 1.519786 | 53 | LIPOXYGENASE-1 (SOYBEAN) |
| 1oxa | 1.851156 | 52 | CYTOCHROME P450 (DONOR:O2 OXIDOREDUCTASE) |
| 1krb-C | 1.680887 | 49 | MOL_ID: 1; |
| 1irk | 1.409630 | 48 | INSULIN RECEPTOR (TYROSINE KINASE DOMAIN) |
| 2hpd-A | 1.619509 | 48 | CYTOCHROME P450 (BM-3) |

LOCK PDB Search: Helix-Turn-Helix

| PDB entry | RMSD | Aligned residues | Description |
|---|---|---|---|
| 1lmb-4 | 0.207748 | 22 | DNA-BINDING REGULATORY PROTEIN |
| 1pra | 0.451552 | 22 | GENE REGULATING PROTEIN |
| 1adr | 0.472226 | 22 | TRANSCRIPTION REGULATION |
| 1yrn-A | 0.492206 | 22 | COMPLEX (TWO DNA-BINDING PROTEINS/DNA) |
| 1dik | 0.517499 | 22 | PHOSPHOTRANSFERASE |
| 1pnr-A | 0.613499 | 22 | COMPLEX (DNA-BINDING REGULATION/DNA) |
| 4fis-A | 0.677586 | 22 | DNA-BINDING PROTEIN |
| 1oct-C | 0.697683 | 22 | DNA-BINDING PROTEIN |
| 1ftt | 0.742871 | 22 | DNA BINDING PROTEIN |
| 1cop-E | 0.768256 | 22 | GENE REGULATING PROTEIN |
| 1dtr | 0.780090 | 22 | DNA BINDING PROTEIN |
| 1gdt-A | 0.795171 | 22 | COMPLEX (SITE-SPECIFIC RECOMBINASE/DNA) |
| 3gap-B | 0.817812 | 22 | GENE REGULATORY PROTEIN |
| 1lfb | 0.820489 | 22 | TRANSCRIPTION REGULATION |
| 1ads | 0.920977 | 22 | OXIDOREDUCTASE |
| 1pdn-C | 0.949104 | 22 | COMPLEX (GENE REGULATING PROTEIN/DNA) |
| 1hom | 0.958631 | 22 | DNA-BINDING PROTEIN |
| 1mse-C | 0.965817 | 22 | COMPLEX (BINDING PROTEIN/DNA) |
| 1cma-A | 1.023169 | 22 | DNA-BINDING REGULATORY PROTEIN |
| 1trr-A | 1.048990 | 22 | DNA-BINDING REGULATORY PROTEIN |
| 1tpl-B | 1.467803 | 22 | LYASE(CARBON-CARBON) |
| 1hcr-A | 0.930009 | 21 | DNA-BINDING |
| 1neq | 0.949148 | 21 | DNA-BINDING PROTEIN |
| 1ade-A | 0.979226 | 21 | LIGASE (SYNTHETASE) |
| 153l | 1.093427 | 21 | HYDROLASE(O-GLYCOSYL) |

3dSearch
• Place vectors of all target proteins into an index table
• Structure comparison = Index lookup
• Building the table:
  • For each pair of vectors in the target protein, generate a unique coordinate system
  • Record the position and orientation of all remaining vectors in this coordinate system in an index table
• Searching the table:
  • For each pair of vectors in the query protein, generate a unique coordinate system
  • Search the table of target vectors to find the query coordinate system that results in the maximum number of aligned vectors

Generating a Coordinate System
• For each pair of vectors (Vi, Vj)
  • Transform the coordinates such that Vi lies on the Z-axis and the projection of Vj onto the X-Y plane is parallel to the Y-axis

Building the Index Table
• For each coordinate system
  • Compute the position and orientation of the remaining vectors in this new frame of reference
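
The frame construction and table lookup can be sketched as a small geometric-hashing example. Everything beyond the slide text is an assumption made for illustration: each vector is a (start, end) pair of NumPy points, the frame origin is placed at the start of Vi, the table key is the binned midpoint and direction of each remaining vector, and the bin sizes and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
import numpy as np

def frame(vi, vj):
    """Basis (rows x, y, z) and origin with vi along Z and vj's X-Y projection along Y."""
    (si, ei), (sj, ej) = vi, vj
    z = (ei - si) / np.linalg.norm(ei - si)
    perp = (ej - sj) - np.dot(ej - sj, z) * z   # part of vj perpendicular to vi
    length = np.linalg.norm(perp)
    if length < 1e-6:
        return None                             # parallel vectors: no unique frame
    y = perp / length
    return np.array([np.cross(y, z), y, z]), si

def bin_key(vk, basis, origin, pos_bin=4.0, dir_bin=0.5):
    """Quantized position and orientation of vk in the given frame."""
    s, e = vk
    mid = basis @ ((s + e) / 2.0 - origin)
    direction = basis @ ((e - s) / np.linalg.norm(e - s))
    cells = np.concatenate([np.floor(mid / pos_bin), np.floor(direction / dir_bin)])
    return tuple(int(c) for c in cells)

def build_table(targets):
    """targets maps a protein name to its list of vectors."""
    table = defaultdict(list)
    for name, vecs in targets.items():
        for i, j in permutations(range(len(vecs)), 2):
            f = frame(vecs[i], vecs[j])
            if f is None:
                continue
            for k, vk in enumerate(vecs):
                if k not in (i, j):
                    table[bin_key(vk, *f)].append((name, i, j))
    return table

def search(table, query):
    """Count, per (target, query frame, target frame), how many vectors line up."""
    votes = defaultdict(int)
    for i, j in permutations(range(len(query)), 2):
        f = frame(query[i], query[j])
        if f is None:
            continue
        for k, vk in enumerate(query):
            if k not in (i, j):
                for name, ti, tj in table.get(bin_key(vk, *f), ()):
                    votes[(name, (i, j), (ti, tj))] += 1
    return dict(votes)
```

The entry with the most votes identifies the target and the pair of frames that align the largest number of vectors. Hard bin edges can split near-identical vectors into neighboring cells, so a practical version would also probe adjacent bins.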

Comparing Structural Alignment Algorithms
[Figure: two panels (B and C), labeled TIM Barrel and Immunoglobulin, each plotting Number of True Positives against Number of False Positives (0 to 60) for DALI, STRUCTAL, VAST, MINAREA, LOCK, and 3dSEARCH]

Subset of PClass
[Figure: subset of the PClass tree with regions labeled α, α/β & α+β, α+β, and β]

Comparing PClass to SCOP: NumClusters
[Figure: example tree with nodes A, B, and C]
• Clusters(S) = number of PClass "Clusters" required to cover all leaves of S
• Given a set of leaves L, a PClass node, P, is a "Cluster" if:
  • all leaves of P are in the set L (i.e. P is a subset of L)
  • no ancestor of P is also a "Cluster"
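
The definition above amounts to counting the maximal PClass subtrees that lie entirely inside L. A minimal sketch follows, assuming the PClass tree is represented as nested lists with string leaf labels; that representation, the function name, and the example tree in the usage note are hypothetical.

```python
def num_clusters(tree, leaves_of_s):
    """Number of PClass clusters needed to cover the leaves in leaves_of_s.

    tree is either a leaf label (str) or a list of subtrees.
    """
    def visit(node):
        # Returns (clusters found under node, whether node lies wholly inside L).
        if isinstance(node, str):
            return (1, True) if node in leaves_of_s else (0, False)
        results = [visit(child) for child in node]
        if all(inside for _, inside in results):
            return 1, True            # whole subtree is in L: one cluster
        return sum(count for count, _ in results), False

    return visit(tree)[0]
```

For the made-up tree `[["a", "b"], [["c", "d"], "e"]]`, the leaf set {a, b, c} needs two clusters (the node holding a and b, plus the single leaf c), while {c, d, e} needs only one.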

Comparing PClass to SCOP: NumClusters
[Figure: example tree with nodes A, B, and C]
• Clusters(A) = 3
• Clusters(B) = 2
• Clusters(C) = 1

Comparing PClass to SCOP: Spread
• PClassDistance(i, j, S):
  • Find the lowest common ancestor, P, of i and j
  • Compute the number of leaves of P that are not leaves of S (i.e. False Positives)
[Figure: example tree with nodes A, B, and C]

$$\mathrm{Spread}(S) = \frac{\sum_{\text{all pairs of leaves } (i,j) \text{ under } S} \mathrm{PClassDistance}(i,j,S)}{\text{number of pairs } (i,j) \text{ under } S}$$

Comparing PClass to SCOP: Spread
[Figure: example tree with nodes A, B, and C]
• FP(A) = 1
• FP(B) = 2
• FP(C) = 1
• Spread(A) = (2*4*1)/(6P2) = 8/(6*5) ≈ 0.27
• Spread(B) = (1*4*2)/(5P2) = 8/(5*4) = 0.4
• Spread(C) = 0
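
The Spread measure can be sketched in the same style. The nested-list tree, the function names, and the example tree in the usage note are hypothetical, and counting each unordered pair once is an assumption; counting ordered pairs instead doubles both the numerator and the denominator, so the ratio is unchanged.

```python
from itertools import combinations

def leaves(node):
    """Set of leaf labels under a node (a str leaf or a list of subtrees)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def lca_leaves(node, i, j):
    """Leaves under the lowest common ancestor of leaves i and j."""
    if not isinstance(node, str):
        for child in node:
            under = leaves(child)
            if i in under and j in under:
                return lca_leaves(child, i, j)   # a deeper common ancestor exists
    return leaves(node)

def spread(tree, leaves_of_s):
    """Mean PClassDistance over all pairs of leaves in leaves_of_s."""
    s = set(leaves_of_s)
    pairs = list(combinations(sorted(s), 2))
    if not pairs:
        return 0.0
    # PClassDistance: leaves of the common ancestor that are not in S.
    total = sum(len(lca_leaves(tree, i, j) - s) for i, j in pairs)
    return total / len(pairs)
```

With the made-up tree `[["a", "b", "x"], ["c", "d"]]`, the set {c, d} has a spread of 0 because its leaves fill one PClass node exactly, while {a, b} has a spread of 1 because the node holding a and b also contains x.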

[Figure: four panels titled Family Sizes, SuperFamily Sizes, Fold Sizes, and Class Sizes]

[Figure: four panels titled Class NumClusters, Fold NumClusters, SuperFamily NumClusters, and Family NumClusters]

[Figure: four panels titled Family ClusterCoverage, SuperFamily ClusterCoverage, Fold ClusterCoverage, and Class ClusterCoverage]

[Figure: four panels titled Class Spread, Fold Spread, SuperFamily Spread, and Family Spread]

Demonstrations
• Fold Databases
  • SCOP (http://scop.stanford.edu/scop)
  • FSSP (http://www2.ebi.ac.uk/dali/fssp/fssp.html)
  • PClass (http://gene.stanford.edu/PClass)
• Structural Alignment Tools
  • LOCK (http://gene.stanford.edu/lock)
  • 3dSearch (http://gene.stanford.edu/3dSearch/)
  • DALI (http://www2.ebi.ac.uk/dali)