Prediction of protein structure
aim
Structure prediction tries to build models of 3D structures of proteins that could be useful for understanding structure-function relationships.
Database sizes (number of entries):
GenBank/EMBL: 105,000,000
UniProt: 5,200,000
PDB: 47,000
The protein folding problem
The information for 3D structures is coded in the protein sequence
Proteins fold into their native structure in seconds
Native structures are both thermodynamically stable and kinetically accessible
AVVTW...GTTWVR
ab-initio prediction
Prediction from sequence using first principles
Ab-initio prediction
“In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “Ab-initio prediction of structure”
1 μs simulation of the folding of a model protein (Duan-Kollman: Science, 277, 1793, 1998).
Simulations of reversible peptide folding (20-200 ns) (Daura et al., Angew. Chem., 38, 236, 1999).
Distributed folding simulations of villin (36 residues) (Zagrovic et al., JMB, 323, 927, 2002).
... the bad news ...
It is not possible to extend simulations to the “seconds” range
Simulations are limited to small systems and to fast folding/unfolding events in known structures (steered dynamics, biased molecular dynamics)
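A rough calculation shows why the “seconds” range is out of reach. This is only a sketch: the 2 fs integration step and the throughput of 100 ns per day are assumed, illustrative values, not figures from the slides.

```python
# Back-of-the-envelope cost of reaching folding timescales with molecular dynamics.
# ASSUMPTIONS (not from the slides): a 2 fs integration step and a throughput
# of 100 ns of simulated time per day of computing.

TIMESTEP_S = 2e-15    # assumed integration step: 2 femtoseconds
NS_PER_DAY = 100.0    # assumed throughput: 100 ns simulated per day


def steps_needed(simulated_seconds: float) -> float:
    """Number of integration steps needed to cover the given simulated time."""
    return simulated_seconds / TIMESTEP_S


def days_needed(simulated_seconds: float) -> float:
    """Wall-clock days needed at the assumed throughput."""
    simulated_ns = simulated_seconds * 1e9
    return simulated_ns / NS_PER_DAY


if __name__ == "__main__":
    for label, t in [("1 microsecond", 1e-6), ("1 second", 1.0)]:
        print(f"{label}: {steps_needed(t):.1e} steps, {days_needed(t):.1e} days")
```

Under these assumptions a microsecond costs days of computing, while a full second would cost millions of days.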
Simplified systems
typical shortcuts
Reduce conformational space: 1 or 2 atoms per residue, fixed lattices
Statistical force fields obtained from known structures: average distances between residues, interactions
Use building blocks: 3-9 residues from PDB structures
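To make the “fixed lattice” shortcut concrete, here is a minimal sketch of a 2D lattice model with one bead per residue. The two-letter H/P (hydrophobic/polar) alphabet and the energy of -1 per contact are assumptions chosen for illustration; the slides do not specify a particular lattice model.

```python
# Toy 2D lattice model: each residue is a single bead on a square grid.
# ASSUMPTION: residues are reduced to H (hydrophobic) or P (polar), and each
# non-bonded H-H contact contributes -1 to the energy.
from typing import List, Tuple

Coord = Tuple[int, int]


def is_valid_walk(coords: List[Coord]) -> bool:
    """Consecutive beads must be lattice neighbours and no site is used twice."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Return -1 for every H-H pair that touches on the lattice but is not bonded."""
    if len(sequence) != len(coords) or not is_valid_walk(coords):
        raise ValueError("sequence and conformation do not match")
    energy = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip direct chain neighbours
            (x1, y1), (x2, y2) = coords[i], coords[j]
            touching = abs(x1 - x2) + abs(y1 - y2) == 1
            if touching and sequence[i] == "H" and sequence[j] == "H":
                energy -= 1
    return energy
```

For example, the sequence `HPPH` bent into a square scores lower (better) than the same sequence laid out in a straight line, which is the kind of discrimination a simplified search relies on.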
Some protein from E. coli predicted at 7.6 Å (CASP3, H. Scheraga)
Results from ab-initio
Average error 5 Å - 10 Å
Function cannot be predicted
Long simulations
comparative modelling
The most efficient way to predict protein structure is to compare with known 3D structures
Protein folds
Basic concept
In a given protein, 3D structure is a more conserved characteristic than sequence
Some amino acids are “equivalent” to each other
Evolutionary pressure allows only amino acid substitutions that keep the 3D structure largely unaltered
Two proteins of “similar” sequences must have the “same” 3D structure
Possible scenarios
1. Homology can be recognized using sequence comparison tools or protein family databases (blast, clustal, pfam,...).
Structural and functional predictions are feasible.
2. Homology exists but cannot be recognized easily (psi-blast, threading).
Low-resolution fold predictions are possible. No functional information.
3. No homology.
1D predictions. Sequence motifs. Limited functional prediction. Ab-initio prediction.
fold prediction
3D struc. prediction
1D prediction
Prediction is based on averaging amino acid properties
AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS
Average over a window
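Window averaging can be sketched as follows. The Kyte-Doolittle hydropathy scale is used here as one common example of a per-residue property; the slides do not name a scale, and the window length of 7 is likewise an assumption.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values (ASSUMPTION: chosen as an example scale).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average a per-residue property over every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    seq = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"
    for start, value in enumerate(window_average(seq, KYTE_DOOLITTLE), start=1):
        print(f"window starting at residue {start}: {value:+.2f}")
```

Each output value belongs to one window position; stretches of high averages suggest buried or membrane-spanning segments.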
1D prediction. Properties
Secondary structure propensities
Hydrophobicity (transmembrane)
Accessibility
...
Chou-Fasman propensities (Biochemistry 17, 4277, 1978)
Amino acid  P(α)  P(β)  P(turn)
Ala         1.29  0.90  0.78
Cys         1.11  0.74  0.80
Leu         1.30  1.02  0.59
Met         1.47  0.97  0.39
Glu         1.44  0.75  1.00
Gln         1.27  0.80  0.97
His         1.22  1.08  0.69
Lys         1.23  0.77  0.96
Val         0.91  1.49  0.47
Ile         0.97  1.45  0.51
Phe         1.07  1.32  0.58
Tyr         0.72  1.25  1.05
Trp         0.99  1.14  0.75
Thr         0.82  1.21  1.03
Gly         0.56  0.92  1.64
Ser         0.82  0.95  1.33
Asp         1.04  0.72  1.41
Asn         0.90  0.76  1.23
Pro         0.52  0.64  1.91
Arg         0.96  0.99  0.88
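The propensities listed above can be turned into a crude three-state prediction by averaging them over a window and taking the highest value. This is a deliberately simplified sketch: it is not the full Chou-Fasman procedure (which uses nucleation and extension rules), and the window length of 5 is an assumption.

```python
from typing import Dict, Tuple

# (P(alpha), P(beta), P(turn)) per residue, copied from the table above.
PROPENSITY: Dict[str, Tuple[float, float, float]] = {
    "A": (1.29, 0.90, 0.78), "C": (1.11, 0.74, 0.80), "L": (1.30, 1.02, 0.59),
    "M": (1.47, 0.97, 0.39), "E": (1.44, 0.75, 1.00), "Q": (1.27, 0.80, 0.97),
    "H": (1.22, 1.08, 0.69), "K": (1.23, 0.77, 0.96), "V": (0.91, 1.49, 0.47),
    "I": (0.97, 1.45, 0.51), "F": (1.07, 1.32, 0.58), "Y": (0.72, 1.25, 1.05),
    "W": (0.99, 1.14, 0.75), "T": (0.82, 1.21, 1.03), "G": (0.56, 0.92, 1.64),
    "S": (0.82, 0.95, 1.33), "D": (1.04, 0.72, 1.41), "N": (0.90, 0.76, 1.23),
    "P": (0.52, 0.64, 1.91), "R": (0.96, 0.99, 0.88),
}

STATES = "HET"  # H = helix, E = strand, T = turn


def predict_states(sequence: str, window: int = 5) -> str:
    """Assign each residue the state with the highest window-averaged propensity."""
    half = window // 2
    states = []
    for i in range(len(sequence)):
        # Centred window, truncated at the sequence ends.
        segment = sequence[max(0, i - half): i + half + 1]
        means = [
            sum(PROPENSITY[aa][k] for aa in segment) / len(segment)
            for k in range(3)
        ]
        states.append(STATES[means.index(max(means))])
    return "".join(states)
```

Calling `predict_states` on a sequence returns a string of the same length, one state letter per residue.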
Some programs (www.expasy.org)
BCM PSSP - Baylor College of Medicine
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
GOR I (Garnier et al, 1978) [At PBIL or at SBDS]
GOR II (Gibrat et al, 1987)
GOR IV (Garnier et al, 1996)
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel University
SOPM (Geourjon and Deléage, 1994)
SOPMA (Geourjon and Deléage, 1995)
AGADIR - An algorithm to predict the helical content of peptides
1D Prediction
Original methods: 1 sequence and uniform parameters (25-30%)
Early improvements: parameters specific to protein classes
Present methods use sequence profiles obtained from multiple alignments, with neural networks to extract parameters (70-75%, 98% for transmembrane helices)
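A sequence profile is, at its simplest, the amino acid frequency in each column of a multiple alignment. The sketch below computes only that; the function name `column_profile` is hypothetical, gaps are assumed to be written as `-`, and real methods add sequence weighting and pseudocounts before feeding the profile to a neural network.

```python
from collections import Counter
from typing import Dict, List


def column_profile(alignment: List[str]) -> List[Dict[str, float]]:
    """Return, for each alignment column, the frequency of each residue (gaps ignored)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        counts = Counter(residues)
        total = len(residues)
        # A column made only of gaps yields an empty dictionary.
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


if __name__ == "__main__":
    toy_alignment = ["AVL", "AIL", "A-L"]  # made-up alignment for illustration
    for position, freqs in enumerate(column_profile(toy_alignment), start=1):
        print(position, freqs)
```

Conserved columns give a single residue with frequency 1.0, while variable columns spread the frequency over several residues; that extra information is what a single sequence cannot provide.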
Methods for remote homology
Homology can be recognized using PSI-Blast
Fold prediction is possible using threading methods
Accurate 3D prediction is not possible: no structure-function relationship can be inferred from the models
Threading
The unknown sequence is “folded” onto a number of known structures
Scoring functions evaluate the fit between sequence and structure according to statistical functions and sequence comparison
[Figure: threading scheme. The query (sequence ATTWV....PRKSCT, predicted secondary structure HHHHH....CCBBBB, predicted accessibility eeebb....eeebeb) is compared with library structures (sequences GGTV....ATTW ... ATTVL....FFRK, observed secondary structure BBBB....CCHH ... HHHB.....CBCB, observed accessibility EEBE.....BBEB ... BBEBB....EBBE). Each comparison receives a score (10.5, 5.2, ...) and one structure is the selected hit.]
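The scheme above can be sketched as a toy scoring function that compares the predicted secondary structure and accessibility of the query with the observed strings of each library structure. This is only an illustration under strong assumptions: placements are ungapped, the weights (1.0 for a secondary structure match, 0.5 for an accessibility match) are arbitrary, and all names are hypothetical. Real threading programs use statistical potentials and gapped alignment.

```python
from typing import Dict, List, Tuple


def position_score(pred_ss: str, pred_acc: str, obs_ss: str, obs_acc: str) -> float:
    """Score one query position against one template position (arbitrary weights)."""
    score = 1.0 if pred_ss == obs_ss else 0.0
    # Predicted accessibility is lower case (e/b), observed is upper case (E/B).
    score += 0.5 if pred_acc.lower() == obs_acc.lower() else 0.0
    return score


def thread(query_ss: str, query_acc: str,
           template_ss: str, template_acc: str) -> Tuple[float, int]:
    """Return (best score, offset) over all ungapped placements on the template."""
    n = len(query_ss)
    best = (float("-inf"), -1)
    for offset in range(len(template_ss) - n + 1):
        total = sum(
            position_score(query_ss[i], query_acc[i],
                           template_ss[offset + i], template_acc[offset + i])
            for i in range(n)
        )
        if total > best[0]:
            best = (total, offset)
    return best


def rank_templates(query_ss: str, query_acc: str,
                   library: Dict[str, Tuple[str, str]]) -> List[Tuple[float, str]]:
    """Score the query against every template long enough to hold it, best first."""
    hits = [
        (thread(query_ss, query_acc, ss, acc)[0], name)
        for name, (ss, acc) in library.items()
        if len(ss) >= len(query_ss)
    ]
    return sorted(hits, reverse=True)
```

The first entry returned by `rank_templates` plays the role of the selected hit.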
Threading accuracy
[Plot: % correct predictions (axis from 0 to 0.35) versus % sequence identity (axis from 5 to 25)]
Comparative modelling
Good for homology >30%
Accuracy is very high for homology > 60%
Reminder: the model must be USEFUL. Only the “interesting” regions of the protein need to be modelled
Expected accuracy
Strongly dependent on the quality of the sequence alignment
Strongly dependent on the identity with “template” structures. Very good structures if identity > 60-70%.
Quality of the model is better in the backbone than side chains
Quality of the model is better in conserved regions
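Since the expected accuracy depends on the identity with the template, a quick check of an alignment might look like the sketch below. The 30% and 60% cut-offs are the ones quoted in these slides; the function names are hypothetical, and identity is computed over aligned (non-gap) positions, which is only one of several conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def expected_quality(identity: float) -> str:
    """Map identity to the regimes quoted in the slides (30% and 60% cut-offs)."""
    if identity > 60:
        return "very high accuracy expected"
    if identity > 30:
        return "comparative modelling feasible"
    return "remote homology: fold prediction at best"


if __name__ == "__main__":
    identity = percent_identity("ACDE", "ACDF")  # made-up aligned pair
    print(f"{identity:.0f}% identity: {expected_quality(identity)}")
```

The result is only as reliable as the alignment it is computed from, which is why the quality of the alignment comes first in the list above.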
Quality test
No energy differences between a correct or wrong model
The structure must be “chemically correct” to use it in quantitative predictions
Analysis software
PROCHECK
WHATCHECK
Suite Biotech
PROSA
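As an illustration of what “chemically correct” means, the sketch below applies one simple geometric test: consecutive Cα atoms joined by a trans peptide bond lie about 3.8 Å apart. It is not a substitute for the programs listed above, which check many more properties; the function name and the 0.2 Å tolerance are assumptions.

```python
import math
from typing import List, Sequence


def ca_distance_outliers(ca_coords: Sequence[Sequence[float]],
                         ideal: float = 3.8,
                         tolerance: float = 0.2) -> List[int]:
    """Return indices i where the CA(i)-CA(i+1) distance deviates from the ideal value.

    ASSUMPTION: a tolerance of 0.2 angstrom. Cis peptide bonds give shorter
    distances (about 3 angstrom) and will be flagged for inspection.
    """
    outliers = []
    for i, (p, q) in enumerate(zip(ca_coords, ca_coords[1:])):
        if abs(math.dist(p, q) - ideal) > tolerance:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Made-up coordinates: the last pair is far too close together.
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(ca_distance_outliers(trace))
```

A model with many flagged positions should be repaired before it is used for quantitative predictions.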
Prediction software
SwissModel (automatic) http://www.expasy.org/swissmod/
SwissModel Repository http://swissmodel.expasy.org/repository/
3D-JIGSAW (M. Sternberg) http://www.bmm.icnet.uk/servers/3djigsaw/
Modeller (A. Sali) http://salilab.org/modeller/modeller.html
MODBASE (A. Sali) http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi
Final test
The model must justify experimental data (i.e. differences between unknown sequence and templates) and be useful to understand function.