Protein structures in the PDB

Post on 21-Dec-2015




Page 1: Protein structures in the PDB. Domains proteins can be modular single chain may be divisible into smaller independent units of tertiary structure called

Protein structures in the PDB

Domains

• proteins can be modular

• a single chain may be divisible into smaller independent units of tertiary structure called domains

• domains are the basic unit of structure classification

• different domains in a protein are also often associated with different functions carried out by the protein.

Definition of domain

• “A polypeptide or part of a polypeptide chain that can independently fold into a stable tertiary structure...”

from Introduction to Protein Structure, by Branden & Tooze

• “Compact units within the folding pattern of a single chain that look as if they should have independent stability.”

from Introduction to Protein Architecture, by Lesk

MBP Figure to go here

Motif (Supersecondary Structure)

• there are certain favored arrangements of multiple secondary structure elements that recur again and again in proteins; these are known as motifs or supersecondary structures

• a motif is usually smaller than a domain but can encompass an entire domain

[Figure: two example motifs, greek key and beta-alpha-beta]

Protein Taxonomy-The CATH Hierarchy

1. Divide PDB structure entries into domains using domain recognition algorithms; the domain is the fundamental unit of structure classification.

2. Classify each domain according to a five level hierarchy:

• Class
• Architecture
• Topology
• Homologous Superfamily
• Sequence Family

the top 3 levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships

the bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity

protein evolution is not well understood; there is to date no purely phyletic classification system
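
As a rough illustration of the five levels listed above, a domain's classification can be held as an ordered record and two domains compared level by level. This is only a minimal Python sketch, not CATH's own data model: the type name, field names, helper function, and example labels are all hypothetical.

```python
from typing import NamedTuple

# Level names follow the slide; the first three are the phenetic levels.
LEVELS = ("class", "architecture", "topology",
          "homologous superfamily", "sequence family")
PHENETIC_LEVELS = 3


class DomainClassification(NamedTuple):
    """Hypothetical record: one label per level, from top to bottom."""
    class_: str
    architecture: str
    topology: str
    homologous_superfamily: str
    sequence_family: str


def shared_depth(a: DomainClassification, b: DomainClassification) -> int:
    """Count how many levels, starting from the top, two domains share."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


# Made-up labels, for illustration only.
d1 = DomainClassification("alpha-beta", "arch-1", "topo-1", "sf-1", "fam-1")
d2 = DomainClassification("alpha-beta", "arch-1", "topo-1", "sf-2", "fam-9")

depth = shared_depth(d1, d2)
print(LEVELS[depth - 1])         # deepest level the two domains share
print(depth <= PHENETIC_LEVELS)  # whether they agree only on phenetic levels
```

Here the two domains agree down to topology and diverge at the homologous superfamily level, so their grouping rests on structure alone and not on inferred ancestry.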

Class

• In the CATH hierarchy, Class simply describes what type of secondary structure is present.

• There are only four classes:

mainly alpha
mainly beta
mixed alpha-beta
few secondary structures

• 90% of structures are trivial to assign at this level.
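
To make the idea of assigning Class from secondary structure content concrete, here is a small Python sketch. It is an assumption-laden toy, not the procedure CATH uses: the function name, the input fractions, and both cutoffs (`min_structured`, `dominance`) are hypothetical placeholders.

```python
def assign_class(helix_fraction: float, strand_fraction: float,
                 min_structured: float = 0.2, dominance: float = 3.0) -> str:
    """Assign one of the four classes from secondary-structure content.

    Both cutoffs are hypothetical placeholders, not values used by CATH.
    """
    # Too little regular secondary structure overall.
    if helix_fraction + strand_fraction < min_structured:
        return "few secondary structures"
    # One type of secondary structure clearly dominates the other.
    if helix_fraction >= dominance * strand_fraction:
        return "mainly alpha"
    if strand_fraction >= dominance * helix_fraction:
        return "mainly beta"
    # Otherwise both types are present in comparable amounts.
    return "mixed alpha-beta"


print(assign_class(0.70, 0.02))
print(assign_class(0.05, 0.55))
print(assign_class(0.35, 0.25))
print(assign_class(0.05, 0.05))
```

Most inputs fall clearly into one branch, which is in the spirit of the point that the large majority of structures are trivial to assign at this level; the borderline cases are the ones that depend on where the cutoffs are set.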

Architecture

• Architecture is hard to define precisely

• In CATH it is defined broadly as describing “general features of protein shape” such as arrangements of secondary structure in 3D space

• It does not define connectivities between secondary structural elements--that’s what the topology level does. It does not even explicitly define directionality of secondary structure, e.g. parallel or antiparallel beta-sheets.

• in CATH, architectures are presently assigned manually, by visual inspection.

• let’s look at some architectures!


Some mostly beta architectures


Some mixed alpha-beta architectures


Topology (Fold)

• if two proteins have the same topology, it means they have the same number and arrangement of secondary structures, and the connectivities between these elements are the same.

• this is also sometimes called the fold of a protein.

• in CATH, automated structure alignment is used to group proteins according to topology. We will discuss this later.

• we will now look at some examples which illustrate differences in topology.


Topology: differences in connectivity

[Figure: “greek key” and “up-and-down” topologies]

• example: a four-stranded antiparallel beta-sheet can have many different topologies based on the order in which the four beta-strands are connected.
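
The number of possible strand orders can be counted directly. The Python sketch below numbers the strands 1 to 4 by their position in the sequence and lists every left-to-right order in which they could sit in the sheet. It rests on simplifying assumptions that are not from the slides: an order and its reversal are treated as the same sheet, and strand direction and handedness are ignored. The function names are made up for the example.

```python
from itertools import permutations


def strand_orders(n=4):
    """List spatial orders of n sequence-numbered strands in one sheet.

    An order and its reversal describe the same sheet seen from the other
    side, so only one of each pair is kept (a simplifying assumption).
    """
    seen = set()
    for order in permutations(range(1, n + 1)):
        seen.add(min(order, order[::-1]))
    return sorted(seen)


def is_up_and_down(order):
    """True when every strand sits next to its neighbor in the sequence."""
    return all(abs(a - b) == 1 for a, b in zip(order, order[1:]))


orders = strand_orders(4)
print(len(orders))
print([o for o in orders if is_up_and_down(o)])
```

Under these assumptions there are 12 distinct orders for four strands, and only one of them, 1-2-3-4, is the simple up-and-down arrangement; every other order needs at least one connection that skips over a neighboring strand.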


Topology: differences in handedness

• example: in a beta-alpha-beta motif, if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands.


Visualizing protein topology--TOPS cartoons

• up triangles = up-facing beta strands
• down triangles = down-facing beta strands
• horizontal rows of triangles = beta sheets (a beta barrel would be a ring of triangles)
• circles = helices
• lines = loops
• if a loop enters from the top, the line is drawn to the center
• if a loop enters from the bottom, the line is drawn to the boundary

fold above is clearly an antiparallel beta-sandwich


Visual summary of top three levels of CATH hierarchy

CLASS

ARCHITECTURE

TOPOLOGY

Discovery of New Folds

• structural taxonomy reveals that although structures are being solved more rapidly than ever, fewer and fewer of them have new folds! Will we get them all soon?


Homologous superfamily/Sequence family

• The lowest two levels in the CATH hierarchy relate to common ancestry

• some, but not all proteins with the same fold show evidence of common ancestry

• the surest evidence of common ancestry is sequence identity above roughly 30% between two proteins (sequence family level)

• if protein sequences are not that similar, common ancestry may still be inferred on the basis of a combination of structural and functional similarity, and possibly weak sequence similarity (homologous superfamily level)
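
The sequence identity criterion above can be sketched as a short calculation. This Python example assumes the two sequences have already been aligned (gaps written as `-`) and counts identity over the columns where neither sequence has a gap; that denominator is one common convention and an assumption here, as are the function names and the toy sequences. The 30% cutoff is the rough figure from the slide, not a sharp boundary.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def same_sequence_family(aligned_a: str, aligned_b: str,
                         cutoff: float = 30.0) -> bool:
    """Apply the rough >30% identity rule of thumb from the slide."""
    return percent_identity(aligned_a, aligned_b) > cutoff


# Toy aligned sequences, made up for illustration.
a = "MKT-AYIAKQR"
b = "MKSLAYLA-QR"
print(round(percent_identity(a, b), 1))
print(same_sequence_family(a, b))
```

Pairs that fall below the cutoff are not thereby unrelated; as the last bullet notes, they may still be placed in the same homologous superfamily on structural and functional grounds.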


Multifunctional “Superfolds”

some folds have many homologous superfamilies, which means they are used for a variety of functions; these are called “superfolds”

some architectures have many folds (“superarchitecture”)


“Common core”

• structures need not share exactly the same number, type and connectivity of secondary structural elements to be grouped into a single fold type.

• in fact, evolutionarily related proteins often share a common core of structurally related elements but may differ in presence or absence of a secondary structure element or two.


Problems in Fold Classification

• “Structure space” has a continuous aspect, especially in certain types of folds, which makes clustering structures into fold families difficult. This is an inherent problem for any classification method based on hierarchical clustering.

• It seems reasonable to group as having the same fold proteins which share some common core but differ in addition/subtraction of a few secondary structure elements.

• But this can lead to unnaturally large and diverse fold families via the Russian doll effect and motif overlap.


Russian Doll Effect

• A continuous range of slight size differences will lead to clustering proteins of very different size: small --> medium --> large.
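
The chaining behind the Russian doll effect can be reproduced with a few lines of Python. In this sketch each protein is reduced to a single size (for example a residue count), and a protein joins a group when it is within a tolerance of the largest member so far. The function name, the 10% tolerance, and the sizes are all invented for the illustration.

```python
def chain_by_size(sizes, tolerance=0.10):
    """Group sizes so that each member is within `tolerance` of the previous one."""
    groups = []
    for size in sorted(sizes):
        # Join the current group if close enough to its largest member.
        if groups and size <= groups[-1][-1] * (1 + tolerance):
            groups[-1].append(size)
        else:
            groups.append([size])
    return groups


# Hypothetical domain sizes forming a near-continuous range.
sizes = [100, 108, 117, 126, 136, 147, 159, 172, 186, 200]
groups = chain_by_size(sizes)
print(len(groups))
print(groups[0][0], groups[0][-1])

# Without the intermediates, the two extremes are not grouped.
print(len(chain_by_size([100, 200])))
```

Every step in the chain is small, yet the single resulting group spans a twofold range in size; the same two extremes compared directly would never be grouped together.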

Motif Overlap

Motif overlap effects: Sometimes two proteins will share a common core but one of them will share a slightly different (but not necessarily larger) common core with a third protein. A continuous range of overlapping common cores (AB --> BC --> CD) will lead to grouping proteins that have no common core.
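
The AB, BC, CD chain can be written out as a toy single-linkage grouping. In this Python sketch each protein is a set of lettered core elements and two proteins are linked whenever their sets overlap; the protein names, the letters, and the function are hypothetical, and real fold comparison is of course not a set intersection.

```python
def group_by_shared_core(cores):
    """Single-linkage grouping: link two proteins if their core sets overlap."""
    names = list(cores)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cores[a] & cores[b]:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


# Hypothetical proteins with cores AB, BC and CD.
cores = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"C", "D"}}
groups = group_by_shared_core(cores)
print(groups)
print(cores["P1"] & cores["P3"])
```

All three proteins end up in one group even though the intersection of P1 and P3 is empty, which is exactly the situation described above.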

Comparison of SCOP and CATH Hierarchies

SCOP           CATH

class          class
-              architecture
fold           topology
superfamily    homologous superfamily
family         sequence family
domain         domain

CATH is more directed toward structural classification; SCOP pays more attention to evolutionary relationships

Another SCOP/CATH difference

• in CATH, there is one class to represent mixed alpha-beta

• in SCOP there are two:

alpha/beta: beta structure is largely parallel, made of beta-alpha-beta motifs

alpha+beta: alpha and beta structure segregated to different parts of structure

SCOP and CATH

• they have in common that they are hierarchical and based on abstractions

• they both include some manual aspects and are curated by experts in the field of protein structure

• are there automated methods for structure classification/comparison?