protein tertiary structure prediction
![Page 1: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/1.jpg)
Protein Tertiary Structure Prediction
Structural Bioinformatics
![Page 2: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/2.jpg)
The Different Levels of Protein Structure
Primary: amino acid linear sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded polypeptide chain.
![Page 3: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/3.jpg)
How can we view the protein structure?
• Download the coordinates of the structure from the PDB http://www.rcsb.org/pdb/
• Launch a 3D viewer program. For example, we will use PyMOL, which can be downloaded freely from the PyMOL homepage http://pymol.sourceforge.net/
• Load the coordinates into the viewer
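The coordinate file downloaded from the PDB is plain text with fixed-column ATOM records. As a minimal sketch (the function name `parse_ca_atoms` and the two sample records are illustrative, not taken from a real entry), the Cα coordinates that a viewer draws as the main chain can be read like this:

```python
def parse_ca_atoms(pdb_text):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each C-alpha atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # ATOM records use fixed columns; the atom name sits in columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()   # columns 18-20
            chain = line[21]                 # column 22
            res_num = int(line[22:26])       # columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, res_num, res_name, xyz))
    return atoms


# Two made-up records laid out in PDB columns (not real data).
SAMPLE = (
    "ATOM      1  N   GLU A   1      10.000  20.000  30.000\n"
    "ATOM      2  CA  GLU A   1      11.104   6.134  -6.504\n"
)

if __name__ == "__main__":
    for atom in parse_ca_atoms(SAMPLE):
        print(atom)
```

Only the ATOM fields needed here are read; a full parser would also handle HETATM records, alternate locations and multiple models.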
![Page 4: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/4.jpg)
PyMOL example
• Launch PyMOL
• Open file “1aqb” (PDB coordinate file)
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss
• Color red
• Color green, resi 1:40
Help http://pymol.sourceforge.net/newman/user/toc.html
![Page 5: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/5.jpg)
Predicting 3D Structure
– Comparative modeling (homology)
– Fold recognition (threading)
An outstandingly difficult problem
![Page 6: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/6.jpg)
Comparative Modeling
Similar sequences suggest similar structures
![Page 7: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/7.jpg)
Sequence and structure alignments of two retinol-binding proteins
![Page 8: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/8.jpg)
Structure Alignments
The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal root-mean-square deviation (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another.
Low RMSD values mean similar structures.
There are many different algorithms for structural alignment.
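For N paired atoms separated by distances d_i, RMSD = sqrt((1/N) · Σ d_i²). A minimal sketch of that formula, assuming the two coordinate lists are already superposed and paired atom by atom (finding the optimal superposition is a separate step that is not shown here):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]  # every atom displaced by 5 units
    print(rmsd(a, a), rmsd(a, b))
```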
![Page 9: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/9.jpg)
Dali (Distance mAtrix aLIgnment)
DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues.
See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.
SALIGN http://salilab.org/DBALI/?page=tools
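The distance matrices that DALI starts from can be sketched in a few lines. The example below builds the matrix of pairwise distances within one structure and compares two such matrices with a plain mean absolute difference. That comparison is a simplification chosen for illustration, not DALI's actual scoring, and both function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Matrix of pairwise distances between the points of one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_matrix_difference(m1, m2):
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(m1)
    if n == 0 or n != len(m2):
        raise ValueError("matrices must be non-empty and the same size")
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


if __name__ == "__main__":
    shape = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    moved = [(x + 10.0, y, z) for x, y, z in shape]  # same shape, translated
    print(distance_matrix(shape))
    print(mean_matrix_difference(distance_matrix(shape), distance_matrix(moved)))
```

Because internal distances do not change when a structure is moved or rotated, the two matrices can be compared without superposing the structures first.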
![Page 10: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/10.jpg)
Fold classification based on structure-structure alignment of proteins (FSSP)
FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing > 25% amino acid identity.
http://www.ebi.ac.uk/dali/fssp
![Page 11: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/11.jpg)
![Page 12: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/12.jpg)
Comparative Modeling
Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database
Similar sequence suggests similar structure
![Page 13: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/13.jpg)
Comparative Modeling
• Accuracy of the comparative model is related to the sequence identity on which it is based
>50% sequence identity = high accuracy
30%-50% sequence identity = 90% modeled
<30% sequence identity = low accuracy (many errors)
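The bands above can be applied to a pairwise alignment as follows. This is a minimal sketch: identity is assumed here to be the number of identical columns divided by the number of columns in which neither sequence has a gap (other definitions exist), and the boundary values 30% and 50% are placed in the middle band.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def expected_accuracy(identity):
    """Map a percent identity to the accuracy band listed on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "90% modeled"
    return "low accuracy (many errors)"


if __name__ == "__main__":
    identity = percent_identity("ACDE-G", "ACDFHG")
    print(identity, expected_accuracy(identity))
```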
![Page 14: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/14.jpg)
Homology Threshold for Different Alignment Lengths
[Plot: homology threshold (t) versus alignment length (L)]
A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L.
The threshold values t(L) are derived from the PDB.
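The slide does not give a formula for t(L). One published form is the curve of Sander and Schneider (1991), t(L) = 290.15 · L^(-0.562) percent for alignment lengths of 10 to 80 residues, with a flat value of about 24.8% for longer alignments. The sketch below assumes that this is the curve plotted here; the function names are illustrative.

```python
def homology_threshold(length):
    """Threshold identity (%) for an alignment of the given length.

    Assumes the Sander and Schneider (1991) curve; the curve is not
    defined for alignments shorter than 10 residues.
    """
    if length < 10:
        raise ValueError("threshold curve is not defined below L = 10")
    if length > 80:
        return 24.8  # flat region for long alignments
    return 290.15 * length ** -0.562


def implies_structural_homology(identity_percent, length):
    """True if the identity is equal to or above the threshold t(L)."""
    return identity_percent >= homology_threshold(length)


if __name__ == "__main__":
    for n in (10, 20, 50, 80, 100):
        print(n, round(homology_threshold(n), 1))
```

Short alignments need a much higher identity than long ones before they are taken to imply a shared structure.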
![Page 15: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/15.jpg)
Comparative Modeling
• Similarity particularly high in core
– Alpha helices and beta sheets preserved
– Even near-identical sequences vary in loops
![Page 16: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/16.jpg)
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html
![Page 17: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/17.jpg)
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding a known structure(s) related to the sequence to be modeled (template), using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
![Page 18: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/18.jpg)
Building and assessing a model
• Superimposing the related 3D structures.
• Rebuilding the missing loops.
• Completing and correcting the backbone.
• Correcting and rebuilding side chains.
• Verifying the structural quality of the model.
• Refining the structure by energy minimization and molecular dynamics (finding the most stable conformation).
![Page 19: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/19.jpg)
Fold Recognition
![Page 20: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/20.jpg)
Protein Folds
• A combination of secondary structural units
– Forms basic level of classification
• Each protein family belongs to a fold
• Different sequences can share similar folds
![Page 21: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/21.jpg)
Protein Folds: sequential and spatial arrangement of secondary structures
[Figure panels: Hemoglobin; TIM]
![Page 22: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/22.jpg)
![Page 23: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/23.jpg)
Similar folds usually mean similar function
Homeodomain transcription factors
![Page 24: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/24.jpg)
![Page 25: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/25.jpg)
The same fold can have multiple functions
Rossmann
TIM barrel
12 functions
31 functions
![Page 26: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/26.jpg)
Fold classification:
SCOP (Structural Classification of Proteins)
• Class: all alpha, all beta, alpha/beta, alpha+beta
• Fold
• Superfamily
• Family
![Page 27: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/27.jpg)
Retinol Binding Protein
![Page 28: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/28.jpg)
Fold Recognition
• Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.
• There are many approaches, but the unifying theme is to try to find folds that are compatible with a particular sequence.
• Unlike sequence-based comparison, these methods take advantage of the extra information made available by 3D structure.
• They “turn the protein folding problem on its head”: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.
![Page 29: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/29.jpg)
Basic steps in Fold Recognition :
Compare sequence against a Library of all known Protein Folds (finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find to what folding template the sequence fits best
There are different ways to evaluate sequence-structure fit
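One very simple way to evaluate sequence-structure fit is to ask whether hydrophobic residues land in buried positions of the fold. The sketch below is a toy under stated assumptions, not the method of any program named in these slides: each fold is reduced to a hypothetical string of B (buried) and E (exposed) positions, the sequence is laid onto it without gaps, and lower scores mean a better fit, like an energy.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set for this toy


def fit_score(sequence, environment):
    """Score a gapless placement of a sequence on a fold; lower is better."""
    if len(sequence) != len(environment):
        raise ValueError("this toy only handles equal lengths, no gaps")
    score = 0
    for residue, env in zip(sequence, environment):
        buried = env == "B"
        hydrophobic = residue in HYDROPHOBIC
        # Reward hydrophobic-buried and polar-exposed, penalise the rest.
        score += -1 if buried == hydrophobic else 1
    return score


def rank_folds(sequence, fold_library):
    """Return (score, fold_name) pairs for every fold, best fit first."""
    return sorted((fit_score(sequence, env), name)
                  for name, env in fold_library.items())


if __name__ == "__main__":
    library = {"fold_a": "BBEEBBEE", "fold_b": "EEBBEEBB"}  # made-up folds
    print(rank_folds("LLKKLLKK", library))
```

Real fold recognition programs allow gaps and use richer scoring terms, but the outline is the same: score the query against every fold in the library and report the best-scoring template.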
![Page 30: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/30.jpg)
MAHFPGFGQSLLFGYPVYVFGD...
[Figure: potential folds 1) ... 56) ... n), each with a fit score for the sequence: -10 ... -123 ... 20.5]
There are different ways to evaluate sequence-structure fit
![Page 31: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/31.jpg)
Programs for fold recognition
• TOPITS (Rost 1995)
• GenTHREADER (Jones 1999)
• SAM-T02 (UCSC HMM)
• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
![Page 32: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/32.jpg)
Ab Initio Modeling
• Compute molecular structure from the laws of physics and chemistry alone
Theoretically: ideal solution
Practically: nearly impossible
Why?
– Exceptionally complex calculations
– Biophysics understanding incomplete
![Page 33: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/33.jpg)
Ab Initio Methods
• Rosetta (Baker lab, Seattle)
• Undertaker (Karplus, UCSC)
![Page 34: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/34.jpg)
CASP - Critical Assessment of Structure Prediction
• Competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally.
• Current state - only fragments are “solved”:
– Ab initio - the worst, but greatly improved in recent years.
– Modeling - performs very well when homologous sequences with known structures exist.
– Fold recognition - performs well.
![Page 35: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/35.jpg)
What’s Next
Predicting function from structure
![Page 36: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/36.jpg)
Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures
Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)
ATP binding domain of protein MJ0577
![Page 37: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/37.jpg)
~300 unique folds in PDB
Currently ~800 unique folds
![Page 38: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/38.jpg)
Estimated ~1000-3000 unique folds in “structure space”
![Page 39: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/39.jpg)
Structural Genomics expectations
~5 proteins to characterize the sequence space corresponding to 1 fold
~10000-15000 new structures expected
![Page 40: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/40.jpg)
As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved.
Wanted! Automated methods to predict function from the protein structures resulting from the structural genomics project.
![Page 41: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/41.jpg)
Approaches for predicting function from structure
ConSurf - Mapping the evolutionary conservation on the protein structure http://consurf.tau.ac.il/
![Page 42: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/42.jpg)
Approaches for predicting function from structure
PFplus – Identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/
![Page 43: Protein Tertiary Structure Prediction](https://reader035.vdocuments.mx/reader035/viewer/2022062720/568134b4550346895d9bd18a/html5/thumbnails/43.jpg)
Approaches for predicting function from structure
SHARP2 – Predicting protein-protein interaction sites on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2