Protein Structure Modeling (2)

Post on 19-Dec-2015

Page 1: Protein Structure Modeling (2). Prediction

Protein Structure Modeling (2)

Page 2: Protein Structure Modeling (2). Prediction

Prediction

http://www.bmm.icnet.uk/people/rob/CCP11BBS/

Page 3: Protein Structure Modeling (2). Prediction

Template-Based Prediction

Structure is better conserved than sequence.

A structure can accommodate a wide range of mutations.

Physical forces favor certain structures.

The number of folds is limited. Currently known: ~700. Estimated total: 1,000 to 10,000. Example fold: TIM barrel.

Page 4: Protein Structure Modeling (2). Prediction

Evolutionary Comparison

• Sequence-sequence comparison: homology modeling (similar sequence – similar structure)

• Sequence-structure comparison: threading / fold recognition (sequences fold into a limited number of folds)

Page 5: Protein Structure Modeling (2). Prediction

Scope of the Problem

• ~90% of new globular proteins share similar folds with known structures, which implies that comparative modeling methods are generally applicable to structure prediction

• Template-based modeling methods currently apply to 60-70% of new proteins, and this number is growing as more structures are solved

• The NIH Structural Genomics Initiative plans to experimentally solve ~10,000 “unique” structures and predict the rest using computational methods

Page 6: Protein Structure Modeling (2). Prediction

Why do we need structural models?

1. only 20% of all proteins have a homologue in PDB

2. for ~ 70% of the proteins a suitable structure from which to build a 3D model is available.

3. predict functions of proteins that share low degrees of sequence similarity

4. identify proteins that may have new folds

Page 7: Protein Structure Modeling (2). Prediction

How many structures are there ?

Protein Data Bank (PDB) Status: March 12, 2002

                                      Molecule Type
Method                                Proteins, Peptides,  Protein/Nucleic  Nucleic  Carbohydrates  Total
                                      and Viruses          Acid Complexes   Acids
Exp.   X-ray Diffraction and other    13259                636              602      14             14511
Exp.   NMR                            2174                 82               424      4              2684
Theor. Theoretical Modeling           321                  24               28       0              373
Total                                 15754                742              1054     18             17568

Source: http://www.rcsb.org/pdb/holdings.html

Page 8: Protein Structure Modeling (2). Prediction

How many folds are there ?

Structural Classification of Proteins (SCOP): Status (1 Mar 2002) based on 13220 PDB entries

Class                                Number of folds  Number of superfamilies  Number of families
All alpha proteins                   138              224                      337
All beta proteins                    93               171                      276
Alpha and beta proteins (a/b)        97               167                      374
Alpha and beta proteins (a+b)        184              263                      391
Multi-domain proteins                28               28                       35
Membrane and cell surface proteins   11               17                       28
Small proteins                       54               77                       116
Total                                605              947                      1557

Source: http://scop.berkeley.edu/count.html

Page 9: Protein Structure Modeling (2). Prediction

Identification of new folds

Source: http://www.rcsb.org/pdb/holdings.html

Page 10: Protein Structure Modeling (2). Prediction

Old fold vs. new fold

• The fold of a chain is considered old if it is similar to one of the selected chains according to the following criteria:

• RMSD < 3.0Å

• number of aligned positions >= 70% of the length of this chain.
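The two criteria above can be sketched as a small check. This is only an illustration: the function names and the input format (one RMSD value and one aligned-position count per selected chain) are assumptions, and the structural alignment that would produce those numbers is not shown.

```python
def is_old_fold(rmsd_angstrom, aligned_positions, chain_length,
                max_rmsd=3.0, min_coverage=0.70):
    """Apply the two criteria: RMSD < 3.0 A and aligned positions >= 70% of the chain."""
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    coverage = aligned_positions / chain_length
    return rmsd_angstrom < max_rmsd and coverage >= min_coverage


def classify_chain(comparisons, chain_length):
    """Return "old" if any selected chain satisfies both criteria, otherwise "new".

    comparisons: iterable of (rmsd_angstrom, aligned_positions), one entry per
    selected chain the query chain was compared against (hypothetical input).
    """
    for rmsd_angstrom, aligned_positions in comparisons:
        if is_old_fold(rmsd_angstrom, aligned_positions, chain_length):
            return "old"
    return "new"


# Hypothetical comparison results for a 100-residue chain.
label = classify_chain([(4.0, 90), (2.0, 75)], chain_length=100)
```

Here the first comparison fails on RMSD and the second passes both tests, so the chain is labelled as an old fold.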

Page 11: Protein Structure Modeling (2). Prediction

How many more folds are there ?

Estimation:

• number of possible folds ~ 4,000

• database of 930 folds covers 90% of protein families

Source: Govindarajan S., Recabarren R., & Goldstein R.A. 1999. Proteins: Structure, Function, and Genetics 35:408-414

Page 12: Protein Structure Modeling (2). Prediction

Homology Modeling

• also called “comparative protein modeling”, “modeling by homology”, “knowledge-based modeling”

• the most successful tool for prediction of protein structure from sequence 

Page 13: Protein Structure Modeling (2). Prediction

Homology Modeling

• The target sequence is aligned with the sequence of a known structure; the two usually share a sequence identity of 30% or more.

• The sequence is then superimposed onto the template, replacing equivalent side chain atoms where necessary.

• The structure is refined to bring the model closer to the actual structure than the template is.

Page 14: Protein Structure Modeling (2). Prediction

Homology Modeling

• Given a sequence, what is the best way of mounting it onto a known structure?

GHIKLSYTVNEQNLKPERFFYTSAVAIL

Page 15: Protein Structure Modeling (2). Prediction

What is the basis for homology modeling?

• The RMSD between the α-carbon coordinates of two proteins is ~ 1 Å if their cores share 50% sequence identity.

• Protein sequences with > 70% similarity allow construction of models with < 3 Å RMSD

• Reduction to:

- Loop structure modeling (connections α-α, α-β, β-α, β-β)

- Side-chain modeling (energy refinement)
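The RMSD figures quoted on this slide follow from the usual definition, RMSD = sqrt((1/N) Σ dᵢ²), where dᵢ is the distance between the i-th pair of equivalent α-carbons. A minimal sketch, assuming the two coordinate sets are already superimposed and listed in the same residue order (the optimal superposition step is not shown, and the coordinates are invented for illustration):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical alpha-carbon traces: the model is the template shifted by 1 A in y.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
deviation = rmsd(template_ca, model_ca)
```

Because every atom is displaced by exactly 1 Å, the RMSD of this toy pair is 1 Å.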

Page 16: Protein Structure Modeling (2). Prediction

Input requirements for Homology Modeling

1. TARGET SEQUENCE (primary protein sequence with unknown structure)

2. TEMPLATE (protein whose 3D structure has already been determined)

3. SEQUENCE ALIGNMENT between template and target sequence (e.g. using Clustal W)

Page 17: Protein Structure Modeling (2). Prediction

Find the appropriate template

Please enter your sequence in FASTA format.

   SWISS-MODEL Blast

Find the Appropriate Modelling Template(s)

MTKNVLMLHGLAQSGDYFASKTKGFRAEMEKLGYKLYYPTAPNEFPPADVPDFLGAPGDGENTGVLAWLENDPSTGGYFIPQTTIDYLHNYVLENGPFAGIVGFSQGAGVTDFNGLLGLTTEEQPPLEFFMAVSGFRFQPQQYQEQYDLHPISVPSLHVQGELDTKVQGLYNSCTEDSRTLLMHSGGHFVPNSRGFVRKVAQWLQQLT*

Submit Request Clear Sequence

Source: http://www.expasy.org/swissmod/SM_Blast.html

Page 18: Protein Structure Modeling (2). Prediction

Choose a template

Page 19: Protein Structure Modeling (2). Prediction

Template search results

4CD2A
LIGAND INDUCED CONFORMATIONAL CHANGES IN THE CRYSTAL STRUCTURES OF PNEUMOCYSTIS CARINII DIHYDROFOLATE REDUCTASE COMPLEXES WITH FOLATE AND NADP+
MOL_ID: 1; MOLECULE: DIHYDROFOLATE REDUCTASE; CHAIN: A; SYNONYM: PCDHFR; EC: 1.5.1.3; ENGINEERED: YES
MOL_ID: 1; ORGANISM_SCIENTIFIC: PNEUMOCYSTIS CARINII; ORGANISM_COMMON: BACTERIA; EXPRESSION_SYSTEM: ESCHERICHIA COLI; EXPRESSION_SYSTEM_COMMON: BACTERIA; EXPRESSION_SYSTEM_PLASMID: PT7-7; EXPRESSION_SYSTEM_GENE: C-DNA P.CARINII DHFR
V.CODY,N.GALITSKY,D.RAK,J.R.LUFT,W.PANGBORN,S.F.QUEENER

Length = 202
Score = 157 bits (393), Expect = 9e-39
Identities = 82/220 (37%), Positives = 138/220 (62%), Gaps = 22/220 (10%)

Query: 232 RDLTMIVAVSSPNLGIGKKNSMPWHIKQEMAYFANVTSSTESSGQLEEGKSKIMNVVIMG 291
             LT IVA      GIG  NS PW  K E  YF  VTS        E      MNVV MG
Sbjct: 1   KSLTLIVALTT-SYGIGRSNSLPWKLKKEISYFKRVTSFVPTFDSFES-----MNVVLMG 54
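The identity and gap counts that BLAST reports can be recomputed directly from an alignment. The sketch below does this for the 60-column fragment shown above (Query 232-291 against Sbjct 1-54), so its numbers describe the fragment only, not the full 220-column alignment behind the 82/220 figure. The function name is an assumption, and "Positives" is not computed because it needs a substitution matrix.

```python
QUERY = "RDLTMIVAVSSPNLGIGKKNSMPWHIKQEMAYFANVTSSTESSGQLEEGKSKIMNVVIMG"
SBJCT = "KSLTLIVALTT-SYGIGRSNSLPWKLKKEISYFKRVTSFVPTFDSFES-----MNVVLMG"


def alignment_stats(seq_a, seq_b, gap="-"):
    """Count identical columns, gapped columns and total columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    gaps = sum(1 for a, b in zip(seq_a, seq_b) if a == gap or b == gap)
    return identities, gaps, len(seq_a)


identities, gaps, columns = alignment_stats(QUERY, SBJCT)
percent_identity = 100.0 * identities / columns
print(f"Identities = {identities}/{columns} ({percent_identity:.0f}%), "
      f"Gaps = {gaps}/{columns}")
```

BLAST divides by the number of alignment columns, gaps included, which is why the same convention is used here.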

Page 20: Protein Structure Modeling (2). Prediction

Mounting the sequence onto the structure

template

Target

Page 21: Protein Structure Modeling (2). Prediction

Mounted sequence

Yellow = adrenergic receptor sequence

Blue = template structure, bovine rhodopsin (PDB 1F88)

Page 22: Protein Structure Modeling (2). Prediction

Modeled structure

Gaps

Page 23: Protein Structure Modeling (2). Prediction

Corrected Model

Page 24: Protein Structure Modeling (2). Prediction

Refinement

• Bond angle energy

• Dihedral angle energy

• van der Waals energy

• Electrostatic interactions

• Hydrogen bonds

• Geometrical constraints

• Packing density
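To make two of the terms above concrete, here is a minimal sketch of the van der Waals and electrostatic contributions using textbook functional forms (a Lennard-Jones 12-6 potential and Coulomb's law). The functional forms, the parameter values and the single shared atom type are assumptions chosen for illustration; the slides do not say which force field the refinement uses, and real force fields also exclude bonded neighbours and use per-atom-type parameters.

```python
import math
from itertools import combinations

# Conversion factor so that charges in units of e and distances in Angstrom
# give energies in kcal/mol.
COULOMB_CONSTANT = 332.06


def lennard_jones(r, epsilon, sigma):
    """van der Waals energy of one atom pair at distance r (12-6 form)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic energy of two point charges at distance r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def nonbonded_energy(atoms, epsilon=0.1, sigma=3.4):
    """Sum both terms over all atom pairs.

    atoms: list of ((x, y, z), charge); epsilon and sigma are hypothetical
    values shared by every atom.
    """
    total = 0.0
    for (pos1, q1), (pos2, q2) in combinations(atoms, 2):
        r = math.dist(pos1, pos2)
        total += lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)
    return total


# Two neutral atoms placed at the Lennard-Jones minimum, r = 2^(1/6) * sigma.
r_min = 2 ** (1 / 6) * 3.4
energy = nonbonded_energy([((0.0, 0.0, 0.0), 0.0), ((r_min, 0.0, 0.0), 0.0)])
```

At the minimum-energy separation the pair energy equals -epsilon, so a refinement step that moves atoms toward such distances lowers the van der Waals term.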

Page 25: Protein Structure Modeling (2). Prediction

Evaluating your model

• A model is considered inaccurate if its atomic coordinates are not within 0.5 Å RMSD of the template control

Page 26: Protein Structure Modeling (2). Prediction
Page 27: Protein Structure Modeling (2). Prediction

Threading-Based Protein Structure Prediction

Page 28: Protein Structure Modeling (2). Prediction

Threading, Fold recognition, Protein fold assignments

Given:

• a database of protein structures / folds summarizing designs found in nature

• individual protein sequence

Goal:

• Find the structural backbone that best fits the protein sequence. This is the inverse of the protein folding problem.

Page 29: Protein Structure Modeling (2). Prediction

Concept of Threading

Structure prediction through recognizing a native-like fold

o Thread (align or place) a query protein sequence onto a template structure in an “optimal” way

o A good alignment gives an approximate backbone structure

Query sequence MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

Template set

Prediction accuracy: fold recognition / alignment

Page 30: Protein Structure Modeling (2). Prediction

Why is it called threading ?

• threading a specific sequence through all known folds

• for each fold estimate the probability that the sequence can have that fold
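The idea of threading one sequence through every known fold can be sketched as a loop over a template library. Everything concrete here is an assumption made for illustration: templates are reduced to strings of buried (B) and exposed (E) positions, the score simply rewards hydrophobic residues at buried positions and polar residues at exposed ones, alignments are ungapped, and the result is a raw score rather than a probability.

```python
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue, environment):
    """+1 when residue type suits the environment (B = buried, E = exposed), else -1."""
    return 1 if (residue in HYDROPHOBIC) == (environment == "B") else -1


def best_ungapped_score(sequence, pattern):
    """Slide the sequence along the template pattern and keep the best score."""
    if len(sequence) > len(pattern):
        return None  # sequence does not fit onto this template
    best = None
    for offset in range(len(pattern) - len(sequence) + 1):
        score = sum(position_score(res, pattern[offset + i])
                    for i, res in enumerate(sequence))
        if best is None or score > best:
            best = score
    return best


def rank_folds(sequence, templates):
    """Thread the sequence through every template and sort by descending score."""
    scored = []
    for name, pattern in templates.items():
        score = best_ungapped_score(sequence, pattern)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical three-fold library and query sequence.
templates = {"fold_A": "BBEEBBEE", "fold_B": "EEEEBBBB", "fold_C": "BEBEBEBE"}
ranking = rank_folds("LVKDAIRE", templates)
```

In this toy library the query matches the burial pattern of fold_A at every position, so fold_A is ranked first.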

Page 31: Protein Structure Modeling (2). Prediction

Fold Recognition and Threading

• Limited number of folds: 800-1000

• Known number of folds ~ 700

• Sequence-fold agreement ?

Page 32: Protein Structure Modeling (2). Prediction

Application of Threading

• Predict structure

• Identify distant homologues of protein families

• Predict function of protein with low degree of sequence similarity to other proteins

Page 33: Protein Structure Modeling (2). Prediction

Structure Families

SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/

(domains, good annotation)

CATH: http://www.biochem.ucl.ac.uk/bsm/cath/

CE: http://cl.sdsc.edu/ce.html

Dali Domain Dictionary: http://columba.ebi.ac.uk:8765/holm/ddd2.cgi

FSSP: http://www2.ebi.ac.uk/dali/fssp/

(chains, updated weekly)

HOMSTRAD:

http://www-cryst.bioc.cam.ac.uk/~homstrad/

HSSP: http://swift.embl-heidelberg.de/hssp/

Page 34: Protein Structure Modeling (2). Prediction

Hierarchy of Templates

Homologous family: evolutionarily related, with significant sequence identity (1827 in SCOP)

Superfamily: different families whose structural and functional features suggest a common evolutionary origin (1073 in SCOP; a good tradeoff between accuracy and computing cost)

Fold: different superfamilies having the same major secondary structures in the same arrangement and with the same topological connections (energetics favoring certain packing arrangements); 686 out of 39,893 in SCOP

Class: secondary structure composition.

Page 35: Protein Structure Modeling (2). Prediction

Template and Fold

Secondary structures and their arrangement

Non-redundant representatives through structure-structure comparison

Page 36: Protein Structure Modeling (2). Prediction

Core of a Template

Core secondary structures: α-helices and β-strands

Page 37: Protein Structure Modeling (2). Prediction

Representation of folds: Definition of Template

• Residue type / profile

• Secondary structure type

• Solvent accessibility

• Coordinates for Cα / Cβ

(Pairwise preferences between two residues)

Page 38: Protein Structure Modeling (2). Prediction

Threading is alignment squared.

• Environmental preferences of amino acids: 3D-PSSM
  – Environment classes (α-helix, β-sheet), solvent accessibility
  – Pair potentials: physical interactions
  – Substitution matrices

• Each possible alignment to the template is evaluated.

• The evaluation of each position depends on the rest of the alignment.

Page 39: Protein Structure Modeling (2). Prediction

Scoring Function

…YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW…

The scoring function describes how well the sequence fits the template:

How well a sequence residue aligns to a residue on the structure: E_m (mutation term)

How well a residue fits a structural environment: E_s (singleton term)

How preferable it is to put two particular residues nearby: E_p (pairwise term)

Alignment gap penalty: E_g

Total energy: E_m + E_s + E_p + E_g
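A minimal sketch of how the four terms combine for one given alignment. The numeric values of every term, the buried/exposed environment labels, the contact list and the per-column gap penalty are all hypothetical; real threading programs derive these from statistics over known structures.

```python
HYDROPHOBIC = set("AVLIMFWC")


def mutation_term(seq_res, tmpl_res):
    """E_m for one aligned residue pair (hypothetical values)."""
    return -1.0 if seq_res == tmpl_res else 0.5


def singleton_term(seq_res, environment):
    """E_s: hydrophobic residues prefer buried (B) sites, polar ones exposed (E) sites."""
    return -1.0 if (seq_res in HYDROPHOBIC) == (environment == "B") else 1.0


def pairwise_term(res_i, res_j):
    """E_p: reward two hydrophobic residues placed at contacting template positions."""
    return -0.5 if res_i in HYDROPHOBIC and res_j in HYDROPHOBIC else 0.0


def threading_energy(seq_aln, tmpl_aln, tmpl_env, contacts, gap_penalty=2.0):
    """Return E_m, E_s, E_p, E_g and their sum for a gapped alignment.

    seq_aln, tmpl_aln: aligned strings of equal length with '-' for gaps.
    tmpl_env: environment label per (ungapped) template residue.
    contacts: pairs (i, j) of template residue indices that are close in 3D.
    """
    if len(seq_aln) != len(tmpl_aln):
        raise ValueError("aligned strings must have the same length")
    e_m = e_s = e_g = 0.0
    placed = {}  # template index -> sequence residue mounted at that position
    t_index = 0
    for s, t in zip(seq_aln, tmpl_aln):
        if t != "-":
            if s != "-":
                e_m += mutation_term(s, t)
                e_s += singleton_term(s, tmpl_env[t_index])
                placed[t_index] = s
            else:
                e_g += gap_penalty  # template position left unoccupied
            t_index += 1
        elif s != "-":
            e_g += gap_penalty  # sequence residue with no template position
    e_p = sum(pairwise_term(placed[i], placed[j])
              for i, j in contacts if i in placed and j in placed)
    return {"E_m": e_m, "E_s": e_s, "E_p": e_p, "E_g": e_g,
            "total": e_m + e_s + e_p + e_g}


# Hypothetical 4-residue template "LKDV" with environments B, E, E, B and one contact.
energies = threading_energy("LK-VA", "LKDV-", "BEEB", contacts=[(0, 3)])
```

The pairwise term is computed only after the whole alignment has been read, because it depends on which residues end up at both ends of each contact. That dependence on the rest of the alignment is what separates threading from position-by-position sequence alignment.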

Page 40: Protein Structure Modeling (2). Prediction

What We Learned…

• Why threading?

• Evolutionary foundation of threading

• Template library and its generation

• Concept of scoring function

Page 41: Protein Structure Modeling (2). Prediction

CASP (Critical Assessment of Structure Prediction)

• the biennial community-wide competition in protein structure prediction.

http://predictioncenter.llnl.gov/casp5/Casp5.html

Page 42: Protein Structure Modeling (2). Prediction

CASP (Critical Assessment of Structure Prediction)

• Targets for:
  – comparative modelling (15)
  – fold recognition (22)
  – ab initio modelling (15)

http://predictioncenter.llnl.gov/casp5/Casp5.html

Page 43: Protein Structure Modeling (2). Prediction

CASP Experiment

• Experimentalists are asked to provide information about structures that are expected to be solved soon

• Predictors retrieve the sequence from prediction center (predictioncenter.llnl.gov)

• Deposit predictions throughout the season

• Meeting held to assess results

Page 44: Protein Structure Modeling (2). Prediction

Prediction Categories

• Comparative Modeling – modeling by homology

• Fold Recognition
  – Advanced Sequence Comparison Methods
  – Threading

• New Fold Methods / “ab initio”

• Categories are separated by distance from any known structure

Page 45: Protein Structure Modeling (2). Prediction

Expected Performance

[Figure: predicted model compared with the X-ray structure for target t0100]

PROSPECT (threading) prediction in CASP4: 12 out of 19 folds recognized

Page 46: Protein Structure Modeling (2). Prediction

Conclusions

• When a suitable template structure exists in the PDB, homology modeling is the best way to predict the structure of the target sequence

• Fold Recognition servers can help find a template when conventional sequence analysis methods fail

• Combining elements from several sources may allow you to construct reasonably accurate models

Page 47: Protein Structure Modeling (2). Prediction

Prediction

http://www.bmm.icnet.uk/people/rob/CCP11BBS/