PROTEOMICS 3D Structure Prediction

Upload: trevin-stearns

Post on 15-Jan-2016


Contents
• Protein 3D structure.
  – Basics
  – PDB
  – Prediction approaches
• Protein classification.

Protein Structures:
• Primary: amino acid sequence.
• Secondary: alpha helices, beta sheets, and loops.
• Tertiary: packing of secondary elements.
• Quaternary: packing of several polypeptide chains.

How Does a Protein Fold?
• The classical nucleation-propagation model:
  – The first event (fast) is hydrophobic collapse, accompanied by the formation of secondary structures. Domains are formed in this step.
  – The second step (slow) is the precise ordering of the secondary elements: packing of the hydrophobic core, domain arrangement, etc.
• The 3D structure is assumed to be the most stable structure, i.e. the one with minimal free energy.
  – Local minimum or global minimum?

Prions
• Proteins found in mammals.
• The abnormal form is responsible for mad cow disease.
• There is no difference in sequence between a normal prion and an abnormal prion.
• The difference lies in the 3D structure.
• The disease is assumed to propagate through the introduction of an abnormal prion, which is capable of changing the conformation of a normal prion into the abnormal one.
• Conclusion: a single protein can have several stable conformations.

PDB - Protein Data Bank
• http://www.rcsb.org/pdb/index.html
• Contains proteins whose structure has been solved.
• Number of solved proteins: 19,225.
• Ratio of solved structures to known protein sequences: 1/7 (SwissProt) to 1/40 (TrEMBL).
• The entry for each protein consists of the x, y, z coordinates of every atom.
• Tutorial: http://www.rcsb.org/pdb/query_tut.html
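In a PDB-format file, each ATOM record stores the coordinates in fixed columns (x in columns 31-38, y in 39-46, z in 47-54). The following is a minimal sketch of reading them. The sample records are invented for illustration and do not come from a real entry, and the helper names `format_atom` and `parse_atoms` are hypothetical.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a truncated fixed-width ATOM record (sample input only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Extract atom name, residue, chain and x, y, z from ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HEADER, END and other record types
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "x": float(line[30:38]),          # columns 31-38
            "y": float(line[38:46]),          # columns 39-46
            "z": float(line[46:54]),          # columns 47-54
        })
    return atoms


# Invented coordinates, for illustration only.
sample = "\n".join([
    "HEADER    INVENTED EXAMPLE, NOT A REAL ENTRY",
    format_atom(1, "N", "GLY", "A", 124, -4.512, 3.218, 9.871),
    format_atom(2, "CA", "GLY", "A", 124, -3.127, 2.904, 10.205),
    "END",
])

atoms = parse_atoms(sample)
for atom in atoms:
    print(atom["name"], atom["res_name"], atom["res_seq"],
          atom["x"], atom["y"], atom["z"])
```

For real work, an established parser is usually a safer choice than hand-written column slicing.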


Prion Protein Domain from Mouse – Entry 1AG2: Ribbons vs. Cylinders

Broad View of the Protein World I
• Estimation: ~1000-20,000 protein families, composed of members that share detectable sequence similarity.
  – A new sequence is expected to be similar to other sequences in the database, and can be expected to share structural features with these proteins.
• Structure prediction:
  – >50% sequence identity implies similar structure.
  – >30% sequence identity implies common structural elements.
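The two identity thresholds above can be applied to a pairwise alignment as in the following sketch. It assumes the sequences are already aligned and that identity is counted over columns without gaps (other definitions of the denominator exist). The function names and example sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def expected_structural_relation(identity):
    """Apply the rule-of-thumb thresholds from the slide."""
    if identity > 50:
        return "similar structure"
    if identity > 30:
        return "common structural elements"
    return "no prediction from sequence identity alone"


close = percent_identity("MKTAYIAKQR", "MKTAHIAKQR")    # 9 of 10 columns match
distant = percent_identity("MKTAYIAKQR", "MRSAFLGKER")  # 4 of 10 columns match
print(close, expected_structural_relation(close))
print(distant, expected_structural_relation(distant))
```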

Broad View of the Protein World II
• There is a limited number of different 3D structures.
  – When newly determined structures are compared with previously known ones, the new structure often folds into alpha and beta elements in the same order and the same spatial configuration as an already known structure.
• Often there is no sequence similarity.
• Totally different sequences can fold into similar structures.

Three Main Approaches for Structural Prediction:
• Ab-Initio.
• Comparative Modeling.
• Fold Recognition.

Example: A pathway for folding a 2-domain protein.

http://www.pdg.cnb.uam.es/cursos/FVi2001DIA1/

The Ab-Initio Method
• The Structural Prediction Problem: “Given a protein sequence, compute its structure”.
• Computation is based on energy calculations stemming from the position of each atom in space and its physico-chemical relations with other atoms.
• Theoretically possible.
• Astronomical, highly under-constrained search space.
• The biophysics is complex and incompletely understood.
• In practice, next to impossible.
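A rough sense of the "astronomical" search space can be had with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that each residue can adopt 3 backbone states and that 10^13 conformations can be evaluated per second. Neither number comes from the slides.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count(n_residues, states_per_residue=3):
    """Number of chain conformations if residues are treated independently."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Time to visit every conformation once at an assumed sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / samples_per_second
    return seconds / SECONDS_PER_YEAR


for n in (10, 50, 150):
    print(n, conformation_count(n), f"{years_to_enumerate(n):.3g} years")
```

Even with generous assumptions, exhaustive enumeration is out of reach for a domain of 150 residues, which is why ab-initio methods rely on heuristics rather than full search.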

Comparative (Homology) Modeling
• Evolutionarily related (homologous) proteins usually have similar structures.
• The similarity of structures is very high in core regions (helices & sheets). However, loops may vary even between homologous structures with a high degree of sequence similarity.

Figure: thick backbone - known structure; thin lines - modeled structure. Some side chains are not positioned correctly, but some look good.

Modeling Performance
Structure similarity predicted from sequence similarity:
• Sander & Schneider (1991) aligned all the sequences in PDB.
• Developed a formula for structure similarity based on sequence similarity.
• The sequence similarity required for structure similarity depends on the length of the protein.

Modeling Performance - Examples
• A protein of 10 amino acids requires 80% identity for a similar structure.
• A protein of length > 80 requires:
  – ~30% identity for common sub-structures.
  – ~50% identity for a similar structure.
  – ~80% identity for a similar structure at very good resolution.
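The length dependence can be sketched with the form of the Sander & Schneider threshold curve that is commonly quoted: t(L) = 290.15 · L^(-0.562) percent identity for alignment lengths between 10 and 80, and a flat 24.8% above 80. The constants here are given from memory of that commonly cited form and should be checked against the original paper before use. The function name is hypothetical.

```python
def identity_threshold(length):
    """Percent identity above which structural similarity is expected.

    Constants follow the commonly quoted form of the curve (an assumption,
    not taken from the slides).
    """
    if length < 10:
        raise ValueError("the curve is not defined for alignments shorter than 10")
    if length > 80:
        return 24.8
    return 290.15 * length ** -0.562


for length in (10, 20, 40, 80, 150):
    print(length, round(identity_threshold(length), 1))
```

For a length of 10 this gives roughly 80%, in line with the first example above. For long alignments it gives a somewhat lower value than the rounder ~30% figure on the slide.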


Fold Recognition Approaches

Fold - a combination of secondary structural units in the same configuration. Protein structural classification uses fold as a basic level of classification.

Fold<->Family Relations
• Estimation 1: There are 1500-20,000 protein families, based on homology. Each family contains ~ one fold.
• Estimation 2: There are 700-1500 protein folds.
• Conclusions:
  1. Many protein families share the same fold.
  2. Different sequences fold similarly.
• The common fold approach to structure prediction: use the collection of determined structures to predict the structure of a protein.

How Condensed is a Fold?

How many different sequences can result in the same fold for an average domain of 150 amino acids?
– There are 20^150 ~ 10^200 different sequences.
– About 10^38 are less than 20% identical.
– Assume that only 1 in a million has a stable fold: 10^32.
– Expected number of different folds is 1000.
– About 10^29 different sequences fold similarly.
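The chain of estimates above is easiest to follow in powers of ten. The sketch below redoes the arithmetic with base-10 exponents. The input figures (10^38 dissimilar sequences, 1 in a million stable, 1000 folds) are the slide's own, and the variable names are illustrative.

```python
import math

domain_length = 150

# Total number of sequences: 20^150, expressed as a power of ten.
log10_all_sequences = domain_length * math.log10(20)

# Figures taken from the slide, as base-10 exponents.
log10_dissimilar = 38        # sequences less than 20% identical to each other
log10_stable_fraction = -6   # 1 in a million has a stable fold
log10_folds = 3              # expected number of different folds: 1000

log10_stable = log10_dissimilar + log10_stable_fraction
log10_sequences_per_fold = log10_stable - log10_folds

print(f"20^150 is about 10^{log10_all_sequences:.1f}")
print(f"stable, mutually dissimilar sequences: 10^{log10_stable}")
print(f"sequences per fold: 10^{log10_sequences_per_fold}")
```

The exact exponent for 20^150 is about 195, so the slide's 10^200 is a rounded order-of-magnitude figure. The conclusion of about 10^29 sequences per fold does not depend on it.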

Fold Recognition
• A fold is shared by family members, both close and distant (distance is related to sequence similarity).
  – Example: the globin fold.
• For a query protein: if its family members are identified, and their fold is known, we can assign it the same fold.

Method 1: Which alignment algorithm detects close and distant relatives?

PSI-BLAST

Fold Recognition - Threading
• Threading allows identification of structural similarity without sequence similarity.
• The amino acid (aa) sequence of a query protein is examined for compatibility with the structural core of a known protein.

“Given a protein structure, what sequences fold into it?”

Threading
• The protein core is a very compact environment composed of alpha and beta secondary structures.
• It is very hydrophobic: there is no room for water molecules, extra aa, or aa with chemically different side chains.
• Side chains have many contacts with neighboring aa for stability.
• Threading matches the aa of the query with the aa of a known structure:
  – If threading gives a good score, then the core of the query is assumed to fold similarly.

Threading
• Two main methods:
  – Contact potential method.
  – Structural profile (environmental template).
• Contact potential method:
  – The number of contact points and the proximity between aa are analyzed for every known structure.
  – The query is checked against all the interactions in the core and their contribution to the stability of the structure.
  – The fold that results in the most energetically stable structure is chosen.
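The contact potential idea can be sketched as follows. This is a toy version under strong assumptions: the query is threaded onto each template without gaps, each template is reduced to a list of contacting position pairs, and the pair energies are invented (hydrophobic pairs favorable, mixed pairs unfavorable). Real contact potentials are derived statistically from known structures. All names and values here are hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Toy contact energy: lower is more stable. Values are invented."""
    a_hydrophobic = a in HYDROPHOBIC
    b_hydrophobic = b in HYDROPHOBIC
    if a_hydrophobic and b_hydrophobic:
        return -1.0   # hydrophobic residues packed together in the core
    if a_hydrophobic != b_hydrophobic:
        return 0.5    # hydrophobic residue in contact with a polar one
    return 0.0        # polar-polar contact treated as neutral


def threading_energy(query, contacts):
    """Sum pair energies over the template's contacts (gapless threading)."""
    return sum(pair_energy(query[i], query[j]) for i, j in contacts)


def best_fold(query, templates):
    """Return the template name with the lowest (most stable) energy."""
    return min(templates, key=lambda name: threading_energy(query, templates[name]))


# Hypothetical templates: each is a list of (i, j) positions in contact.
templates = {
    "fold_A": [(0, 3), (1, 4), (3, 6)],
    "fold_B": [(0, 2), (2, 5), (4, 7)],
}
query = "LKELVKAL"

for name, contacts in templates.items():
    print(name, threading_energy(query, contacts))
print("chosen:", best_fold(query, templates))
```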

Threading - Structural Profile
• The environment of every aa in known structures is determined, including:
  – the secondary structure, the area of the side chain that is buried by closeness to other atoms, the types of nearby side chains, etc.
• Each position is classified into one of 18 types:
  – 6 classes representing increasing levels of residue burial and fraction of surface covered by polar atoms,
  – combined with three classes of secondary structure.
• Each aa is assessed for its ability to fit into that type of site in the structure.
  – A buried position is matched well by a hydrophobic aa.
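The 18 environment types are simply the product of 6 burial/polarity classes and 3 secondary structure classes. A minimal sketch of that bookkeeping follows. The class labels are placeholders chosen for illustration, not names taken from the slides.

```python
from itertools import product

# Placeholder labels: three buried, two partially buried, one exposed class.
BURIAL_CLASSES = ["B1", "B2", "B3", "P1", "P2", "E"]
SECONDARY_CLASSES = ["helix", "sheet", "other"]

# All 6 x 3 = 18 environment types, in a fixed order.
ENVIRONMENTS = [f"{burial}/{secondary}"
                for burial, secondary in product(BURIAL_CLASSES, SECONDARY_CLASSES)]


def environment_index(burial, secondary):
    """Map a (burial class, secondary structure) pair to an index 0..17."""
    return (BURIAL_CLASSES.index(burial) * len(SECONDARY_CLASSES)
            + SECONDARY_CLASSES.index(secondary))


print(len(ENVIRONMENTS))
print(ENVIRONMENTS[environment_index("P1", "sheet")])
```

The index can then be used to look up a row of aa fitness scores for each position of a known structure.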

Structural Profile
• Profile rows are the residues (positions) in the structure, each assigned to one of the 18 environment types.
• Profile columns are the 20 aa + insertion + deletion.
  – If a residue is inside a loop, many substitutions are allowed, as well as insertions and deletions.
• The score for a given aa at a position estimates the fitness of the aa to the environment type of that position.
• How shall we find the best fitting region?

Structural Profile
• A dynamic programming algorithm finds the best match of a query sequence to a specific fold.
  – Statistical significance can be computed by doing the above for all sequences in the database.
• The same analysis is repeated for each fold.
• The fold with the best statistically significant score is chosen.
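The dynamic programming step can be sketched as a global alignment of the query against a profile. The sketch makes several simplifying assumptions: each profile position carries a table of aa scores and a single gap penalty used for both insertion and deletion, the alignment is global rather than the local search a real tool might use, and significance estimation is left out. The score tables and fold names are invented.

```python
def align_to_profile(query, profile, default=-1.0):
    """Best global alignment score of a query against a structural profile.

    profile is a list of (scores, gap) pairs, one per structure position:
    scores maps aa -> fitness score, gap is the penalty for a gap there.
    """
    n, m = len(profile), len(query)
    # dp[i][j]: best score for the first i profile positions
    # aligned with the first j query residues.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - profile[i - 1][1]   # leading deletions
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - profile[0][1]       # leading insertions
    for i in range(1, n + 1):
        scores, gap = profile[i - 1]
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + scores.get(query[j - 1], default)
            delete = dp[i - 1][j] - gap    # profile position left unmatched
            insert = dp[i][j - 1] - gap    # query residue left unmatched
            dp[i][j] = max(match, delete, insert)
    return dp[n][m]


# Invented score tables: buried core positions favor hydrophobic aa and
# penalize gaps heavily; exposed loop positions are more permissive.
BURIED = ({"L": 2, "V": 2, "I": 2, "K": -2, "E": -2}, 3.0)
EXPOSED = ({"K": 1, "E": 1, "G": 1, "L": -1, "V": -1, "I": -1}, 1.0)

folds = {
    "fold_A": [BURIED, EXPOSED, BURIED, EXPOSED],
    "fold_B": [EXPOSED, EXPOSED, EXPOSED, EXPOSED],
}

query = "LKVE"
scores_by_fold = {name: align_to_profile(query, profile)
                  for name, profile in folds.items()}
best = max(scores_by_fold, key=scores_by_fold.get)
print(scores_by_fold, "chosen:", best)
```

In a fuller version, each raw score would be compared with the distribution of scores obtained from unrelated database sequences before a fold is chosen.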

Threading - Pros and Cons:
• Good results.
• Environmental properties may be more accurate than amino acid similarity matrices.
• Can lead to effective and fast implementations.
• Able to discover structural similarities that are impossible to detect by sequence searching methods.
• Requires the existence of already known proteins with a similar structure.

CASP - Critical Assessment of Structure Prediction
• Competition among different groups to predict the 3D structure of proteins that are about to be solved experimentally.
• Current state - only fragments are “solved”:
  – Ab-initio - the worst, but greatly improved in recent years.
  – Modeling - performs very well when homologous sequences with known structures exist.
  – Fold recognition - PSI-BLAST is used for training the threading procedures. Performs well.

A Clickable Structure Prediction Flowchart:

http://www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html

Protein Classification
Proteins are classified to reflect both structural and evolutionary relatedness. The principal levels are:

1. Family: Clear evolutionary relationship. In general, > 30% pairwise residue identity between the proteins.

2. Superfamily: Probable common evolutionary origin. Combines families whose member proteins have low sequence identities, but whose structural and functional features suggest a common evolutionary origin. Structurally, superfamily members share a common fold.

SCOP - Structural Classification of Proteins
• http://scop.mrc-lmb.cam.ac.uk/scop/
• Hierarchical classification of all proteins with known structures.
• Classification, from the broadest level down:
  – Class - all alpha, all beta, alpha & beta (a/b), alpha + beta (a + b).
  – Fold - the major structural similarity unit.
  – Superfamily.
  – Family.
  – PDB entry for a protein.

CATH - Class Architecture Topology Homologous Superfamily
• http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
• Another protein structure classification database.
• Classification:
  – Class - all alpha, all beta, alpha & beta (a/b), alpha + beta (a + b).
  – Architecture - gross orientation of secondary structures, independent of connectivity.
  – Topology - clusters structures according to their topological connections and numbers of secondary structures.
  – Homologous superfamilies - clusters proteins with highly similar structures and functions.

PFAM - Protein Families
• http://www.sanger.ac.uk/Software/Pfam/
• Database that contains a large collection of multiple sequence alignments and profile hidden Markov models (profile HMMs).
• A profile HMM is a probabilistic model that describes a set of sequences.
• Widely used to describe related sequences.
• Defines domains - areas of homology that have a 3D structure independent of the rest of the protein.


Classification of all the proteins in the SWISSPROT and TrEMBL databases, into groups of related proteins.

http://protomap.cornell.edu/