Page 1: CS177 Lecture 7 Computational Aspects of Protein Structure II

CS177 Lecture 7

Computational Aspects of Protein Structure II

Tom Madej 10.25.04


Research news (Nature 10.21.04)

• Another milestone for the Human Genome Project.
  – Fills in approx. 99% of the “gene rich” portion of the genome (10% more than the 2001 drafts).
  – Only 341 remaining gaps, formerly hundreds of thousands.
  – New estimate of the number of genes: 20,000-25,000.

• Megabase deletions result in viable mice!
  – Researchers deleted 1.5 Mb and 0.8 Mb portions of the mouse genome (non-coding regions), and the mice seem to be fine!


Nature Oct. 21, 2004, 931-945


Example for last homework

• I searched “Structure” with the term “Leukemia”.

• The first structure was 1uc6A. I noticed a couple of VAST neighbors with low percent sequence identity but very similar folds: 1uemA (17.4%) and 1uenA (13.7%).

• I ran PSI-BLAST with query sequence 1uc6A. The CD Search got a hit to “Fibronectin type 3”. 1uemA and 1uenA are also assigned to FN3, but for some reason 1uc6 is not (???).

• I got lucky: 1uemA and 1uenA were found by PSI-BLAST but did not cross the significance threshold prior to convergence!
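The percent sequence identity figures quoted above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned (with "-" for gaps) and that identity is counted over columns where both sequences have a residue. The function name and this normalization are assumptions for illustration; VAST may normalize differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment with made-up sequences (not 1uc6A or its neighbors).
print(percent_identity("ACD-EFGH", "ACDKEYG-"))
```

In the toy alignment, 5 of the 6 gap-free columns match.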


Overview of lecture

• Protein structure
  – General principles
  – Structure hierarchy
  – Supersecondary structures
  – Superfolds and examples: TIM barrels, OB fold

• Protein structure comparison algorithms
  – VAST (Vector Alignment Search Tool)
  – CE (Combinatorial Extension)

• Protein fold classification databases
  – SCOP (Structural Classification of Proteins)
  – CATH (Class, Architecture, Topology, Homologous superfamily)


General principles

• Most protein structures are composed of two types of regular structural elements interconnected by less well-structured regions.

• Regular secondary structure elements (SSEs): α-helices and β-strands.

• Irregular regions: loops or coil.

• A pair of SSEs positioned next to each other in space may be parallel or anti-parallel.


General principles (cont.)

• Helices are stabilized by “internal” hydrogen bonds.

• Hydrogen bonds will form between an adjacent pair of strands.

• Strands will form larger structures such as β-sheets or β-barrels.

• Due to the residue side chains, there are favored packing angles between helices/helices, helices/sheets, and sheets/sheets.


Examples of protein architecture

β-sheet with all pairs of strands parallel

β-sheet with all pairs of strands anti-parallel

Architecture refers to the arrangement and orientation of SSEs, but not to the connectivity.


Examples of protein topology

Topology refers to the manner in which the SSEs are connected.

Two β-sheets (all parallel) with different topologies.
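The distinction between architecture and topology can be sketched with a toy encoding. Here a sheet is a list of strands in spatial order, each tagged with its position along the sequence and a direction. This encoding and the function names are hypothetical, chosen only for illustration; real classification schemes use richer descriptions.

```python
from typing import List, Tuple

# A sheet is listed in spatial order, edge to edge. Each strand is
# (sequence_index, direction), where direction is +1 or -1.
Sheet = List[Tuple[int, int]]


def architecture(sheet: Sheet) -> Tuple[int, Tuple[bool, ...]]:
    """Strand count plus, for each pair of spatial neighbors, whether it is parallel."""
    parallel = tuple(a[1] == b[1] for a, b in zip(sheet, sheet[1:]))
    # Reading the sheet from the other edge describes the same architecture.
    return len(sheet), min(parallel, parallel[::-1])


def topology(sheet: Sheet) -> Tuple[int, ...]:
    """Spatial order of the strands by sequence index (the connectivity)."""
    order = tuple(idx for idx, _ in sheet)
    return min(order, order[::-1])


# Two all-parallel four-stranded sheets whose strands are connected differently.
sheet_a = [(1, +1), (2, +1), (3, +1), (4, +1)]
sheet_b = [(2, +1), (1, +1), (3, +1), (4, +1)]

print(architecture(sheet_a) == architecture(sheet_b))  # same arrangement of SSEs
print(topology(sheet_a) == topology(sheet_b))          # different connectivity
```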


Exercise

• Take a look at 1r7sA in Cn3D.

• Draw a topology diagram showing the way the strands are connected.


Angles between SSEs in contact

• The data on the next 3 slides give the cosines of the angles between pairs of SSE vectors.

• The SSEs were required to be “in contact”, i.e. within 10 Å of each other.

• Note: The SSEs are not necessarily consecutive in the sequence!
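A minimal sketch of the quantity being plotted: the cosine of the angle between two SSE vectors, with a contact filter. The slide does not say how the 10 Å distance is measured, so the midpoint-to-midpoint test below is an assumption, as are all names in the code.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]
Segment = Tuple[Point, Point]  # (start, end) of one SSE vector


def direction(seg: Segment) -> Point:
    """Vector from the start to the end of the SSE."""
    start, end = seg
    return (end[0] - start[0], end[1] - start[1], end[2] - start[2])


def cosine(seg_a: Segment, seg_b: Segment) -> float:
    """Cosine of the angle between two SSE vectors: +1 parallel, -1 anti-parallel."""
    u, v = direction(seg_a), direction(seg_b)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("SSE vector has zero length")
    return dot / norm


def in_contact(seg_a: Segment, seg_b: Segment, cutoff: float = 10.0) -> bool:
    """ASSUMPTION: contact means the segment midpoints lie within `cutoff` angstroms."""
    mid_a = [(s + e) / 2 for s, e in zip(*seg_a)]
    mid_b = [(s + e) / 2 for s, e in zip(*seg_b)]
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mid_a, mid_b)))
    return dist <= cutoff


# Made-up coordinates: two neighboring strands running in opposite directions.
strand_1 = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
strand_2 = ((4.8, 0.0, 10.0), (4.8, 0.0, 0.0))
if in_contact(strand_1, strand_2):
    print(cosine(strand_1, strand_2))
```

A cosine near +1 indicates a parallel pair and a cosine near -1 an anti-parallel pair.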


Examples of structures formed by β-strands

• Triosephosphate isomerase 7timA

• Retinol binding protein 1rbp

• Porin 1oh2P


Higher level organization

• A single protein may consist of multiple domains. Examples: 1liy A, 1bgc A. The domains may or may not perform different functions.

• Proteins may form higher-level assemblies. Useful for complicated biochemical processes that require several steps, e.g. processing/synthesis of a molecule. Example: 1l1o chains A, B, C.


Example: Replication Protein A

E. Bochkareva et al. The EMBO Journal (2002) 21 1855-1863

RPA binds to ssDNA and is involved in recombination, replication, and repair. It is a heterotrimer, consisting of three subunit proteins that bind together. See structure 1l1o.


Supersecondary structures

• β-hairpin

• α-hairpin

• βαβ-unit

• β4 Greek key

• βα Greek key


Supersecondary structure: simple units

G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981


Supersecondary structure: Greek key motifs

G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981


Examples of β4 Greek key motif

• 1hk0 Human Gamma-D Crystallin; residues 32 thru 64 in domain 1.

• OB fold (we’ll see this fold later).


Examples of βα Greek key motif

• 1bgw Topoisomerase; residues 487 thru 540 in domain 5.

• 1ris Ribosomal protein S6.


Protein folds

• There is a continuum of similarity!

• Fold definition: two folds are similar if they have a similar arrangement of SSEs (architecture) and connectivity (topology). Sometimes a few SSEs may be missing.

• Fold classification: To get an idea of the variety of different folds, one must adjust for sequence redundancy and also try to correctly assign homologs that have low sequence identity (e.g. below 25%).
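The redundancy adjustment can be sketched as a greedy filter over pairwise sequence identities. This covers only the redundancy step, not the harder problem of recognizing low-identity homologs. The function name, the identity callback, the chain labels, and the 40% cutoff in the example are all assumptions for illustration.

```python
from typing import Callable, List


def nonredundant(
    chain_ids: List[str],
    identity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedy filter: keep a chain only if its identity to every kept chain is below threshold."""
    kept: List[str] = []
    for cid in chain_ids:
        if all(identity(cid, other) < threshold for other in kept):
            kept.append(cid)
    return kept


# Made-up pairwise percent identities for three hypothetical chains.
TABLE = {
    frozenset({"a", "b"}): 90.0,
    frozenset({"a", "c"}): 15.0,
    frozenset({"b", "c"}): 18.0,
}


def lookup(x: str, y: str) -> float:
    return TABLE[frozenset({x, y})]


# Chain "b" is dropped as redundant with "a"; "c" is kept.
print(nonredundant(["a", "b", "c"], lookup, threshold=40.0))
```

The result depends on input order, which is typical of greedy filters; sorting the input by some quality measure first is one common choice.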


Superfolds (Orengo, Jones, Thornton)

• Distribution of fold types is highly non-uniform.

• There are about 10 types of folds, the superfolds, to which about 30% of the other folds are similar. There are many examples of “isolated” fold types.

• Superfolds are characterized by wide sequence diversity and span a range of dissimilar functions.

• The evolutionary relationships of the superfolds are an open research question, i.e. do they arise by divergent or convergent evolution?


Superfolds and examples

• Globin 1hlm sea cucumber hemoglobin; 1cpcA phycocyanin; 1colA colicin

• α-up-down 2hmqA hemerythrin; 256bA cytochrome B562; 1lpe apolipoprotein E3

• Trefoil 1i1b interleukin-1β; 1aaiB ricin; 1tie erythrina trypsin inhibitor

• TIM barrel 1timA triosephosphate isomerase; 1ald aldolase; 5rubA rubisco

• OB fold 1quqA replication protein A 32kDa subunit; 1mjc major cold-shock protein; 1bcpD pertussis toxin S5 subunit

• α/β doubly-wound 5p21 Ras p21; 4fxn flavodoxin; 3chy CheY

• Immunoglobulin 2rhe Bence-Jones protein; 2cd4 CD4; 1ten tenascin

• UB αβ roll 1ubq ubiquitin; 1fxiA ferredoxin; 1pgx protein G

• Jelly roll 2stv tobacco necrosis virus; 1tnfA tumor necrosis factor; 2ltnA pea lectin

• Plaitfold (Split αβ sandwich) 1aps acylphosphatase; 1fxd ferredoxin; 2hpr histidine-containing phosphocarrier


TIM barrels

• Classified into 21 families in the CATH database.

• Mostly enzymes, but participate in a diverse collection of different biochemical reactions.

• There are intriguing common features across the families, e.g. the active site is always located at the C-terminal end of the barrel.


N. Nagano et al. J. Mol. Biol. (2002) 321 741-785


TIM barrel evolutionary relationships (Nagano, Orengo, Thornton)

• Sequence analysis with advanced programs such as PSI-BLAST and IMPALA has identified further relationships among the families.

• Further interesting similarities are observed from careful comparison of structures, e.g. a phosphate binding site commonly formed by loops 7, 8 and a small helix.

• In summary, there is evidence for evolutionary relationships between 17 of the 21 families.


OB (oligonucleotide/oligosaccharide-binding) fold

• 5-stranded β-barrel with Greek key topology.

• All OB folds have the same binding face that is involved in their biochemistry.


V. Arcus Curr. Opinion Struct. Biol. (2002) 12 794-801


OB evolutionary relationships

• SCOP lists 9 superfamilies.

• Bacterial enterotoxin superfamily consists of two families, almost certainly evolutionarily related.

• Nucleic acid-binding superfamily has 11 families; if they are evolutionarily related, the ancestral protein would come from the LUCA (Last Universal Common Ancestor).

• Evidence for common ancestry of all OB folds is probably weaker than for TIM barrels.


Protein structure comparison

• How to compare 3D protein structures?

• Analogous computational considerations to sequence comparison, e.g. accuracy, efficiency for database searches, statistical significance of results, etc.

• Additional complication: working with atomic coordinates in 3D space!
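To make "working with atomic coordinates" concrete, here is a minimal sketch of one widely used similarity measure, the root-mean-square deviation (RMSD) between paired atoms. RMSD is not named on the slide; it is swapped in here as a familiar example. The sketch assumes the two structures are already superposed and that the atom pairing (the alignment) is given, which are the hard parts that the comparison methods actually solve.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD over paired atoms; assumes the structures are already superposed."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates: the second set is the first shifted by 1 angstrom along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```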


Some protein structure comparison methods

• VAST (Vector Alignment Search Tool, NCBI)

• CE (Combinatorial Extension, RCSB/PDB)

• DALI (EBI)


VAST outline

1. Parse protein structures into SSEs (helices and strands).

2. Fit vectors to SSEs.

3. To compare a pair of proteins, attempt to superpose as many vectors as possible, subject to constraints.

4. Evaluate the vector alignment for statistical significance (compute an E-value).

5. If the vector alignment is significant, then proceed to a more detailed residue-to-residue alignment (“refined alignment”).
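Step 2 of the outline (fitting a vector to an SSE) can be sketched crudely as follows. The slide does not describe how VAST fits its vectors, so this is a stand-in: it takes the Cα coordinates of one SSE and joins the centroid of the first half to the centroid of the second half. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Average position of a set of points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def fit_sse_vector(ca_coords: List[Point]) -> Tuple[Point, Point]:
    """Return (start, end) of a crude axis for one SSE.

    ASSUMPTION: the axis joins the centroids of the first and second halves
    of the C-alpha trace. This gives a direction but underestimates length.
    """
    if len(ca_coords) < 2:
        raise ValueError("need at least two C-alpha atoms")
    half = len(ca_coords) // 2
    return centroid(ca_coords[:half]), centroid(ca_coords[-half:])


# Made-up zigzag trace running along z, loosely like an extended strand.
trace = [(0.5, 0.0, 0.0), (-0.5, 0.0, 3.3), (0.5, 0.0, 6.6), (-0.5, 0.0, 9.9)]
start, end = fit_sse_vector(trace)
print(start, end)
```

Averaging over half the residues smooths out the zigzag, so the fitted vector points along the strand rather than along any single residue-to-residue step.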


Two proteins with vectors assigned to SSEs: 3chy and 1ipfA


VAST comparison of 3chy and 1ipfA (panels: vector superposition, refined alignment)


SCOP (Structural Classification of Proteins)

• http://scop.mrc-lmb.cam.ac.uk/scop/

• Levels of the SCOP hierarchy:
  – Family: clear evolutionary relationship
  – Superfamily: probable common evolutionary origin
  – Fold: major structural similarity
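The three levels listed above nest (family within superfamily within fold), which can be sketched as a small lookup. The tuple encoding, the function name, and the placeholder labels are hypothetical; they are not SCOP's actual identifiers.

```python
from typing import Optional, Tuple

# One domain's classification, from broadest to most specific level.
Classification = Tuple[str, str, str]  # (fold, superfamily, family)
LEVELS = ("fold", "superfamily", "family")


def deepest_shared_level(a: Classification, b: Classification) -> Optional[str]:
    """Most specific level at which two domains are classified together."""
    shared = None
    for level, label_a, label_b in zip(LEVELS, a, b):
        if label_a != label_b:
            break
        shared = level
    return shared


# Placeholder labels only.
d1 = ("fold-1", "sf-1", "fam-1")
d2 = ("fold-1", "sf-1", "fam-2")
d3 = ("fold-1", "sf-2", "fam-3")
d4 = ("fold-2", "sf-9", "fam-9")

print(deepest_shared_level(d1, d2))  # probable common evolutionary origin
print(deepest_shared_level(d1, d3))  # structural similarity only
print(deepest_shared_level(d1, d4))  # nothing shared
```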


CATH (Class, Architecture, Topology, Homologous superfamily)

• http://www.biochem.ucl.ac.uk/bsm/cath/
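CATH labels each homologous superfamily with four dot-separated numbers, one per level in the acronym. A minimal sketch of splitting such a code into its levels follows; the function name is hypothetical, and the example string is used only to show the format.

```python
from typing import Dict

CATH_LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code: str) -> Dict[str, str]:
    """Map each CATH level to the code prefix that identifies it."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four dot-separated numbers, e.g. '3.20.20.70'")
    # Each level is identified by the prefix of the code up to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(CATH_LEVELS)}


for level, prefix in parse_cath_code("3.20.20.70").items():
    print(level, prefix)
```

Two domains share a level when the corresponding prefixes are equal, so the comparison reduces to string equality on prefixes.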
