
3. Lecture WS 2008/09

Bioinformatics III 1

V3 From Protein Complexes to Networks and back

Protein networks could be defined in a number of ways

(1) Co-regulated expression of genes/proteins

(2) Proteins participating in the same metabolic pathways

(3) Proteins sharing substrates

(4) Proteins that are co-localized

(5) Proteins that form permanent supracomplexes = „protein machines“

(6) Proteins that bind each other transiently

(signal transduction, bioenergetics ... )

In V4 we will look at computational methods to predict protein-protein interactions.

Today, we will look at permanent and transient protein complexes.


X-ray diffraction

X-rays are electromagnetic waves in the very short-wavelength ('hard') regime, with wavelengths on the order of 0.1 nm. When they hit a sample, the X-rays interact weakly with the electron clouds around the atomic nuclei, which partially diffracts the incoming beam into different angles. As the interaction is quite weak, a noticeable diffraction intensity can only be detected in orientations where the diffracted beams from many molecules add up constructively. Here, we need to appreciate that electromagnetic waves are sinusoidal waves that may be described by an amplitude and a phase. Therefore, intensities are only detected in those orientations where the path difference of waves originating from different molecules equals an integer multiple of the wavelength, so that the waves arrive in phase. This requires a very ordered arrangement of all molecules, as in a three-dimensional crystal. Still, in almost all orientations, the overlap of the various waves will not be constructive. Images on the photographic plate (or CCD detector) are recorded for various rotational orientations of the crystal. Structure determination involves reconstructing the molecular structure of the target molecule that gives rise to the observed reflections. The numerical methods mostly involve Fourier transformation. A crystallographic structure determination ultimately reveals contours of the electron density. Atomic models are then refined using this electron density and information about typical bond lengths and bond angles of chemical bonds between atoms.
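The role of the path difference can be illustrated with a small numerical sketch. It is not part of the original slides: the function name, the number of scatterers and the chosen path differences are illustrative assumptions. Each scatterer contributes a wave of equal amplitude, and neighbouring waves differ by a constant path difference.

```python
import cmath
import math

def diffracted_intensity(n_scatterers, path_difference, wavelength):
    """Intensity of n equal-amplitude waves whose path lengths differ by a constant step."""
    phase_step = 2.0 * math.pi * path_difference / wavelength
    # Add the waves as complex phasors; the intensity is the squared modulus of the sum.
    amplitude = sum(cmath.exp(1j * k * phase_step) for k in range(n_scatterers))
    return abs(amplitude) ** 2

wavelength = 0.1  # nm, the order of magnitude quoted for hard X-rays

# Path difference of exactly two wavelengths: all waves arrive in phase.
in_phase = diffracted_intensity(100, 0.2, wavelength)
# Path difference of 2.5 wavelengths: neighbouring waves cancel.
out_of_phase = diffracted_intensity(100, 0.25, wavelength)
print(in_phase, out_of_phase)
```

With 100 scatterers, the in-phase case gives an intensity close to 100 squared, whereas the half-wavelength offset gives an intensity close to zero. This is why reflections are only seen in a few sharply defined orientations of a well-ordered crystal.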

X-ray crystallography is the most popular method for biomacromolecular structure determination. When successful, it yields very accurate structures of molecular complexes, from small molecules up to very large complexes such as the ribosome or viral capsids.


FRET: Förster resonance energy transfer

(Top) The chromophore of a cyan fluorescent protein (CFP) absorbs light at 436 nm and emits light at 480 nm. (Bottom) This scenario involves the same CFP and a second protein, for example a yellow fluorescent protein (YFP), that absorbs light around 480 nm and emits light at 535 nm. If these two proteins are closer than 5 nm, part of the excitation energy of CFP is transferred to YFP by fluorescence resonance energy transfer (FRET). In the upper scenario, illumination at 436 nm leads to emission at only one wavelength; in the bottom scenario, one obtains two emission lines, allowing one to conclude that the CFP and YFP molecules were closer than 5 nm. In the same way, additional proteins A and B may be fused to CFP and YFP to probe the interaction of A and B. The emission spectrum of the first dye must overlap with the absorption spectrum of the second dye.

Fluorescence resonance energy transfer describes an energy transfer mechanism between two fluorescent molecules. A fluorescent donor is excited at its specific fluorescence excitation wavelength. By a long-range dipole-dipole coupling mechanism, this excitation energy is then nonradiatively transferred to a second molecule, the acceptor, while the donor returns to the electronic ground state. As the efficiency of this energy transfer falls off with the inverse sixth power of the distance, the distance between donor and acceptor molecules can be deduced from observing the fluorescence of the acceptor and comparing it to a reference intensity. The described energy transfer mechanism is termed 'Förster resonance energy transfer' (FRET), after the German scientist Theodor Förster. When both molecules are fluorescent, the term 'fluorescence resonance energy transfer' is often used, although the energy is actually not transferred by fluorescence.
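The distance dependence can be made concrete with the standard Förster expression E = 1 / (1 + (r / R0)^6), where R0 is the Förster radius at which half of the excitation energy is transferred. The sketch below is an illustration, not part of the original slides. The value R0 = 5 nm is an assumption chosen to match the 5 nm distance quoted above; real values depend on the dye pair.

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Transfer efficiency E = 1 / (1 + (r / R0)^6); R0 = 5 nm is an assumed value."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)

def distance_from_efficiency(efficiency, forster_radius_nm=5.0):
    """Invert the relation above to estimate the donor-acceptor distance."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)

for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
```

Because of the sixth power, the efficiency drops from nearly complete transfer to almost none within a factor of four in distance around R0, which is what makes FRET usable as a proximity ruler.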


Y2H: yeast two-hybrid

An advantage of this method is that the interactions are probed in vivo. As yeast is cheap and robust, the method can be applied on a large scale. Disadvantages are that the interactions need to be probed in the nucleus, and some proteins, such as membrane proteins, may not be translocated easily into the nucleus. Also, two proteins may interact in Y2H experiments although they are never expressed simultaneously during the cell cycle or never present in the same compartment. Besides, a reported interaction may be mediated by a third protein (or even several proteins) that binds X and Y simultaneously. In this case, X and Y would be reported to interact directly (although they actually do not), and the mediating partners would remain undetected.

Yeast two-hybrid screening (Y2H) is a molecular biology technique used to discover protein-protein interactions by testing for physical interactions (such as binding) between two proteins. One protein is termed the bait and the other is a library protein (or prey). The idea behind the test is the activation of downstream reporter gene(s) by the binding of a transcription factor to an upstream activating sequence (UAS). For the purposes of two-hybrid screening, the transcription factor is split into two separate fragments, called Binding Domain (BD) and Activating Domain (AD). The BD is the domain responsible for binding to the UAS and the AD is responsible for activation of transcription. The key to the two-hybrid screen is that in most eukaryotic transcription factors, the activating and binding domains are modular and can function in close proximity to each other without direct binding. This means that even though the transcription factor is split into two fragments, it can still activate transcription even if the two fragments are only indirectly connected.

In the Y2H screen, the BD fragment is fused onto the bait protein X and the AD fragment onto a library protein Y. If X and Y bind to each other, then the AD and BD of the transcription factor would be indirectly connected and transcription of the reporter gene(s) could occur. If the two proteins do not interact, there would be no transcription of the reporter gene. Thus, a huge "library" of proteins can be tested for interaction with the bait. A common transcription factor used for yeast two-hybrid screening is GAL4 that binds specifically to the UAS sequence and initiates activation of a downstream target gene. For example, this can be the gene coding for green fluorescent protein.
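The reporter logic, including the false positives caused by a bridging protein, can be sketched as a toy model. This is an illustration only: the function name and the representation of interactions as name pairs are assumptions, and the example proteins are taken from the CKS/CDK2/cyclin A case discussed later in this lecture.

```python
def reporter_active(bait, prey, direct_pairs, endogenous=()):
    """Return True if BD-bait and AD-prey become connected, directly or via one bridging protein."""
    pairs = {frozenset(p) for p in direct_pairs}
    if frozenset((bait, prey)) in pairs:
        return True  # bait and prey bind each other directly
    # An endogenous yeast protein that binds both partners also brings AD and BD together.
    return any(
        frozenset((bait, m)) in pairs and frozenset((m, prey)) in pairs
        for m in endogenous
    )

known_direct = [("CKS", "CDK2"), ("CDK2", "cyclin A")]
print(reporter_active("CKS", "cyclin A", known_direct))                       # no bridge available
print(reporter_active("CKS", "cyclin A", known_direct, endogenous=["CDK2"]))  # bridged by CDK2
```

The second call reports an interaction although CKS and cyclin A never touch in this model, which is exactly the kind of indirect interaction a Y2H readout cannot distinguish from a direct one.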


Other methods for the structural characterization of macromolecular assemblies

Russell et al. Curr. Opin. Struct. Biol. 14, 313 (2004)

(a) Electron diffraction map and 3D X-ray protein structure. X-ray provides atomic-resolution structures.

(b) 3D protein structure and plot showing chemical shifts determined by NMR. NMR spectroscopy extracts distances between atoms by measuring transitions between different nuclear spin states within a magnetic field. These distances are then used as restraints to build 3D structures. NMR spectroscopy also provides atomic-resolution structures, but is generally limited to proteins of about 300 residues. It plays an increasingly important role in studying interaction interfaces between structures determined independently.

(c) EM micrograph and 3D reconstruction of a virus capsid. EM is based on the analysis of images of stained particles. Different views and conformations of the complexes are trapped and thus thousands of images have to be averaged to reconstruct the three-dimensional structure. Classical implementations were limited to a resolution of 20 Å. More recently, single-particle cryo techniques, whereby samples are fast frozen before study, have reached resolutions as high as approximately 6 Å. EM provides information about the overall shape and symmetry of macromolecules.

(d) Slice images and rendered surface of a ribosome-decorated portion of endoplasmic reticulum. In electron tomography, the specimen studied is progressively tilted upon an axis perpendicular to the electron beam. A set of projection images is then recorded and used to build a 3D model. This technique can tackle large organelles or even complete cells without perturbing their physiological environment. It provides shape information at resolutions of approximately 30 Å.

(e) Yeast two-hybrid array screen and small network of interacting proteins. Interaction discovery comprises many different methods whose objective is to determine spatial proximity between proteins. These include techniques such as the two-hybrid system, affinity purification, FRET, chemical cross-linking, footprinting and protein arrays. These methods provide very limited structural information and no molecular details. Their strength is that they often give a quasi-comprehensive list of protein interactions and the networks they form.


Overview of experimental techniques to detect protein-protein interactions

Key data that can be obtained by various experimental techniques relevant to studying structural properties of protein complexes. Electron tomography will be introduced at the end of this chapter. Y2H, TAP, and MS stand for yeast two-hybrid, tandem affinity purification, and mass spectrometry.

X-ray crystallography

NMR spectroscopy

EM Tomography

Immuno-EM

FRET Y2H TAP MS

structure ≤ 3 Å X X

structure > 3 Å X X X X

contacts X X X X X X X x

proximity X X X X X X X

stoichiometry X X X X X

complex symmetry X X X X X


Hybrid models: docking X-ray structures into EM maps

Russell et al. Curr. Opin. Struct. Biol. 14, 313 (2004)

Hybrid assembly of the 80S ribosome from yeast.

(a) Superposition of a comparative protein structure model (red) of a domain from ribosomal protein L2 from Bacillus stearothermophilus with the actual structure (blue) (PDB code 1RL2).

(b) A partial molecular model of the whole yeast ribosome calculated by fitting atomic rRNA (not shown) and comparative protein structure models (ribbon representation) into the electron density of the 80S ribosomal particle.


Putative structure through modeling and low-resolution EM

Russell et al. Curr. Opin. Struct. Biol. 14, 313 (2004)

(a) Exosome subunits. The top of the panel shows the domain organization of two subunits present in the complex, but lacking any detectable similarity to known 3D structures. The model for the nine other subunits (bottom) was constructed by predicting binary interactions using InterPReTS and building models based on a homologous complex structure using comparative modeling.

(b) EM density map (green mesh) with the best fit of the model shown as a gray surface and the predicted locations of the subunits labeled. The question marks indicate those subunits for which no structures could be modeled.


Potential errors in biochemical interaction discovery

Russell et al. Curr. Opin. Struct. Biol. 14, 313 (2004)

(a) Indirect interactions between cyclin-dependent kinase regulatory subunit (CKS) and cyclin A detected by the Y2H system. Several interactions between CKS domains and cyclins were reported in genome-scale two-hybrid studies. However, analysis of 3D structures suggests that the endogenous cyclin-dependent kinase 2 (CDK2) probably mediates the interaction, as combining the CDK2–CKS and CDK2–cyclin A structures places the CKS and cyclin domains 18 Å apart.

(b) An example of an interaction that is not detected by any screen, possibly because molecular labels (e.g. affinity purification tags, or two-hybrid DNA binding or activation domains) are interfering with the interaction. The X-ray structure of the actin–profilin complex reveals that the actin C terminus (C-t) lies at the interaction interface (the other N and C termini are also labeled).


1 Protein-Protein Complexes

It has been realized for quite some time that cells don't work by random diffusion of proteins, but require a delicate structural organization into large protein complexes.

Which complexes do we know?


RNA Polymerase II

RNA polymerase II is the central enzyme of gene expression and synthesizes all messenger RNA in eukaryotes.

Cramer et al., Science 288, 640 (2000)


RNA processing: spliceosome

Structure of a cellular editor that "cuts and pastes" the first draft of RNA straight after it is formed from its DNA template.

It has two distinct, unequal halves surrounding a tunnel.

Larger half: appears to contain proteins and the short segments of RNA.

Smaller half: is made up of proteins alone.

On one side, the tunnel opens up into a cavity, which is believed to function as a holding space for the fragile RNA waiting to be processed in the tunnel.

Profs. Ruth and Joseph Sperling, http://www.weizmann.ac.il/


Protein synthesis: ribosome

The ribosome is a complex subcellular particle composed of protein and RNA. It is the site of protein synthesis.

http://www.millerandlevine.com/chapter/12/cryo-em.html

Model of a ribosome with a newly manufactured protein (multicolored beads) exiting on the right.


Signal recognition particle

40S small ribosomal subunit (yellow), 60S large ribosomal subunit (blue), P-site tRNA (green), SRP (red).

Halic et al. Nature 427, 808 (2004)

Cotranslational translocation of proteins across or into membranes is a vital process in all kingdoms of life. It requires that the translating ribosome be targeted to the membrane by the signal recognition particle (SRP), an evolutionarily conserved ribonucleoprotein particle. SRP recognizes signal sequences of nascent protein chains emerging from the ribosome. Subsequent binding of SRP leads to a pause in peptide elongation and to the ribosome docking to the membrane-bound SRP receptor. SRP shows 3 main activities in the process of cotranslational targeting: first, it binds to signal sequences emerging from the translating ribosome; second, it pauses peptide elongation; and third, it promotes protein translocation by docking to the membrane-bound SRP receptor and transferring the ribosome nascent chain complex (RNC) to the protein-conducting channel.


Nuclear Pore Complex

A three-dimensional image of the nuclear pore complex (NPC), revealed by electron microscopy.

A-B: The NPC in yeast. Figure A shows the NPC seen from the cytoplasm while figure B displays a side view.

C-D: The NPC in vertebrate (Xenopus).

http://www.nobel.se/medicine/educational/dna/a/transport/ncp_em1.html

Three-Dimensional Architecture of the Isolated Yeast Nuclear Pore Complex: Functional and Evolutionary Implications, Qing Yang, Michael P. Rout and Christopher W. Akey. Molecular Cell, 1:223-234, 1998

The NPC is a 50-100 MDa protein assembly that regulates and controls trafficking of macromolecules through the nuclear envelope.


GroEL: a chaperone to assist misfolded proteins

Schematic diagram of GroEL functional states. (a) Nonnative polypeptide substrate (wavy black line) binds to an open GroEL ring. (b) ATP binding to GroEL alters its conformation, weakens the binding of substrate, and permits the binding of GroES to the ATP-bound ring. (c) The substrate is released from its binding sites and trapped inside the cavity formed by GroES binding. (d) Following encapsulation, the substrate folds in the cavity and ATP is hydrolysed. (e) After hydrolysis in the upper, GroES-bound ring, ATP and a second nonnative polypeptide bind to the lower ring, discharging ligands from the upper ring and initiating new GroES binding to the lower ring (f) to form a new folding active complex on the lower ring and complete the cycle.

http://people.cryst.bbk.ac.uk/~ubcg16z/chaperone.html

Ranson et al., Cell 107, 869 (2001)


Arp2/3 complex

The seven-subunit Arp2/3 complex choreographs the formation of branched actin networks at the leading edge of migrating cells.

(A) Model of actin filament branches mediated by Acanthamoeba Arp2/3 complex.

(D) Density representations of the models of actin-bound (green) and the free, WA-activated (as shown in Fig. 1D, gray) Arp2/3 complex.

Volkmann et al., Science 293, 2456 (2001)


Proteasome

The proteasome is the central enzyme of non-lysosomal protein degradation. It is involved in the degradation of misfolded proteins as well as in the degradation and processing of short-lived regulatory proteins. The 20S proteasome degrades completely unfolded proteins into peptides with a narrow length distribution of 7 to 13 amino acids.

http://www.biochem.mpg.de/xray/projects/hubome/images/rpr.gif

Löwe, J., Stock, D., Jap, B., Zwickl, P., Baumeister, W. and Huber, R. (1995). Crystal structure of the 20S proteasome from the archaeon T. acidophilum at 3.4 Å resolution. Science 268, 533-539.


icosahedral pyruvate dehydrogenase complex: a multifunctional catalytic machine

Model for active-site coupling in the E1E2 complex. 3 E1 tetramers (purple) are shown located above the corresponding trimer of E2 catalytic domains in the icosahedral core. Three full-length E2 molecules are shown, colored red, green and yellow. The lipoyl domain of each E2 molecule shuttles between the active sites of E1 and those of E2. The lipoyl domain of the red E2 is shown attached to an E1 active site. The yellow and green lipoyl domains of the other E2 molecules are shown in intermediate positions in the annular region between the core and the outer E1 layer. Selected E1 and E2 active sites are shown as white ovals, although the lipoyl domain can reach additional sites in the complex.

Milne et al., EMBO J. 21, 5587 (2002)


Apoptosome

Apoptosis is the dominant form of programmed cell death during embryonic development and normal tissue turnover. In addition, apoptosis is upregulated in diseases such as AIDS and neurodegenerative disorders, while it is downregulated in certain cancers. In apoptosis, death signals are transduced by biochemical pathways to activate caspases, a group of proteases that utilize cysteine at their active sites to cleave specific proteins at aspartate residues. The proteolysis of these critical proteins then initiates cellular events that include chromatin degradation into nucleosomes and organelle destruction. These steps prepare apoptotic cells for phagocytosis and result in the efficient recycling of biochemical resources.

In many cases, apoptotic signals are transmitted to mitochondria, which act as integrators of cell death because both effector and regulatory molecules converge at this organelle. Apoptosis mediated by mitochondria requires the release of cytochrome c into the cytosol through a process that may involve the formation of specific pores or rupture of the outer membrane. Cytochrome c binds to Apaf-1 and in the presence of dATP/ATP promotes assembly of the apoptosome. This large protein complex then binds and activates procaspase-9.


Classification of complexes

How can one best categorize these complexes?

Possible classifications group them by their function, by their size, or by their non-protein components (protein complexes may also contain nucleic acids, carbohydrates, or lipids).

One clearly functional classification scheme distinguishes transient complexes (enzyme-inhibitor, signal transduction) from stable/permanent complexes with lifetimes that are long compared to those of typical biochemical processes. Obviously, obtaining structural information on transient complexes is much harder than for permanent complexes.

One may also distinguish obligate and non-obligate complexes: the components of obligate complexes function only in the bound state, whereas those of non-obligate complexes can also exist as monomers. Examples of the latter class are antibodies, which exist in the free form in the cell until a suitable antigen target appears.


2 Information on protein-protein networks


2. Information on protein-protein networks: Yeast 2-Hybrid Screen

Data on protein-protein interactions from Yeast 2-Hybrid Screen.

One role of bioinformatics is to sort the data.


Protein cluster in yeast

Schwikowski, Uetz, Fields, Nature Biotech. 18, 1257 (2000)

A clustering algorithm generates one large cluster of proteins interacting with each other, based on binding data of yeast proteins.
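One simple way to group interacting proteins is to compute the connected components of the interaction graph. The sketch below makes that assumption explicitly; it is not necessarily the algorithm used by Schwikowski et al., and the protein names in the example are hypothetical.

```python
from collections import defaultdict, deque

def interaction_clusters(interactions):
    """Group proteins into clusters that are connected by chains of binary interactions."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from this protein.
        cluster, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            cluster.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)  # largest cluster first

pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]  # hypothetical Y2H hits
print(interaction_clusters(pairs))
```

In genome-scale data, a few promiscuous or bridging proteins are enough to merge most proteins into a single giant component, which is consistent with the one large cluster described here.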


Annotation of function

Schwikowski, Uetz, Fields, Nature Biotech. 18, 1257 (2000)

After functional annotation: connect clusters of interacting proteins.


Annotation of localization

Schwikowski, Uetz, Fields, Nature Biotech. 18, 1257 (2000)


3 Systematic identification of large protein complexes

The yeast two-hybrid method can only identify binary complexes.

In affinity purification, a protein of interest (bait) is tagged with a molecular label (dark route in the middle of the figure) to allow easy purification. The tagged protein is then co-purified together with its interacting partners (W–Z). This strategy can also be applied on a genome scale.

Gavin et al. Nature 415, 141 (2002)

Identify proteins by mass spectrometry (MALDI-TOF).


Analysis of protein complexes in yeast (S. cerevisiae)

Gavin et al. Nature 415, 141 (2002)

Identify proteins by scanning the yeast protein database for a protein composed of fragments of suitable mass.

Here, the identified proteins are listed according to their localization (a).

(b) lists the number of proteins per complex.
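The database scan can be sketched as a toy peptide mass fingerprint search. Everything specific here is an assumption for illustration: the two database sequences are made up, the digest uses the common trypsin rule (cleave after K or R unless followed by P), masses are neutral monoisotopic masses without the proton added in MALDI, and the score is a plain count of matched masses.

```python
import re

# Monoisotopic residue masses in Da.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water for the termini."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER

def rank_proteins(observed_masses, database, tolerance=0.01):
    """Score each database protein by the number of observed masses it can explain."""
    scores = {}
    for name, seq in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_peptides(seq)]
        scores[name] = sum(
            any(abs(m - t) <= tolerance for t in theoretical) for m in observed_masses
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

toy_database = {"protA": "MAGKSLLRGEWTK", "protB": "MTDKAVVRNNFK"}  # hypothetical sequences
observed = [405.205, 487.312, 619.297]  # pretend spectrum peaks (Da)
ranking = rank_proteins(observed, toy_database)
print(ranking)
```

Real search engines use probabilistic scores and allow for missed cleavages and modifications; the sketch only shows the core idea of matching measured fragment masses against masses predicted from database sequences.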


Validation of methodology

Gavin et al. Nature 415, 141 (2002)

Check of the method: can the same complex be obtained for a different choice of attachment point (tag protein attached to different components of the complex)? Yes (see gel).

The method identifies the components of a complex, not the binding interfaces.

Better for identification of interfaces: yeast two-hybrid screen (binary interactions).

3D models of complexes are important to develop inhibitors.

- theoretical methods (docking)
- electron tomography


Experiment

Start from 232 purified complexes from TAP strategy.

Select 102 that gave samples most promising for EM from analysis of gels and protein concentrations.

Take EM images.

Theory

Make list of components.

Assign known structures of individual proteins.

Assign templates of complexes:
- if complex structure available for this pair
- if complex structure available for homologous protein
- if complex structure available for structurally similar protein (SCOP)

4 Aim: generate structures of protein complexes

Bettina Böttcher (EM), Rob Russell (Bioinformatics)


define interface similarity by iRMSD

Aloy et al. Science, 303, 2026 (2004)

The plot illustrates the structural similarity of complexes A-B to A'-B' measured by their interface RMSD (iRMSD) as a function of the average sequence similarity of A with A' and of B with B'. The two lines drawn are the 80% and 90% percentile lines, meaning that 80% or 90% of the solutions are found below the lines.

The plot illustrates the computation of the interface RMSD (iRMSD) between the complex A-B and A'-B'. Around each center of mass of any of the four proteins, six points are added along the axes of the corresponding coordinate system. Then, either the seven points representing A' are optimally superimposed on those of A and the RMSD between the points of B' with those of B is measured, or vice versa for superimposing B' on B and measuring the RMSD of the A domains. The iRMSD is taken as the smaller one of both values.
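The computation described above can be sketched in a few lines. The sketch assumes that each protein is given as a local coordinate frame (a centre of mass and three orthonormal axes) and that the six extra points lie at a fixed distance `arm` along plus and minus each axis. The value of `arm` and all function names are illustrative assumptions. Because the seven points of A and A' then form congruent rigid frames, the optimal superposition is simply the exact frame-to-frame transform, so no general least-squares fit is needed.

```python
import math

def frame_points(center, axes, arm):
    """Seven points: the centre plus one point at +arm and -arm along each local axis."""
    points = [tuple(center)]
    for axis in axes:
        for sign in (1.0, -1.0):
            points.append(tuple(c + sign * arm * a for c, a in zip(center, axis)))
    return points

def frame_transform(src, dst):
    """Rigid transform that maps the frame src = (center, axes) exactly onto dst."""
    (src_c, src_axes), (dst_c, dst_axes) = src, dst
    # R = sum_k dst_axis_k (outer product) src_axis_k maps each src axis onto its dst partner.
    rot = [[sum(d[i] * s[j] for d, s in zip(dst_axes, src_axes)) for j in range(3)]
           for i in range(3)]
    def apply(p):
        rel = [p[j] - src_c[j] for j in range(3)]
        return tuple(dst_c[i] + sum(rot[i][j] * rel[j] for j in range(3)) for i in range(3))
    return apply

def rmsd(ps, qs):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(ps, qs)) / len(ps))

def irmsd(a, b, a2, b2, arm=5.0):
    """iRMSD between complex a-b and complex a2-b2; each protein is a (center, axes) frame."""
    move_a = frame_transform(a2, a)  # superimpose A' on A, then compare B' with B
    rmsd_b = rmsd([move_a(p) for p in frame_points(*b2, arm)], frame_points(*b, arm))
    move_b = frame_transform(b2, b)  # superimpose B' on B, then compare A' with A
    rmsd_a = rmsd([move_b(p) for p in frame_points(*a2, arm)], frame_points(*a, arm))
    return min(rmsd_a, rmsd_b)

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
a, b = ((0.0, 0.0, 0.0), identity), ((10.0, 0.0, 0.0), identity)
a2, b2 = ((0.0, 0.0, 0.0), identity), ((13.0, 0.0, 0.0), identity)  # partner displaced by 3 Å
print(irmsd(a, b, a2, b2))
```

A useful property of this measure is that it is unchanged when the whole complex A'-B' is rotated or translated. Only the relative placement of the two partners matters.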


How transferable are interactions?

Analyze interaction similarity (iRMSD) vs. % sequence identity for all the available pairs of interacting domains with known 3D structure.

Curve shows 80% percentile (i.e. 80% of the data lies below the curve).

Points below the line (iRMSD = 10 Å) are similar in interaction.

Aloy et al. Science, 303, 2026 (2004)

Conclusion: the mode of interaction is conserved among protein-protein complexes with > 30–40% sequence identity.


Bioinformatics Strategy

Illustration of the methods and concepts used. How predictions are made within complexes (circles) and between them (cross-talk). Bottom right shows two binary interactions combined into a three-component model.

Aloy et al. Science, 303, 2026 (2004)


3SOM algorithm: vector-based circumference superimposition

A 2D variant of the 3D vector-based surface superimposition that is central to the 3SOM algorithm. For each tested voxel a on the circumference of the target, a vector va is calculated that approximates the normal vector orthogonal to the tangent line in a and with origin in a. Vector va is superimposed on each vector vb that is associated with a voxel b on the circumference of the template. The goodness-of-fit of the transformation in question is assessed by measuring the circumference overlap, the fraction of target circumference voxels that is projected onto (or near) the template circumference (triangles). In 3D, a rotational degree of freedom is left around the superimposed vectors, which is sampled in rotational steps of 9°.
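The 2D variant described above can be sketched as a brute-force search. This is an illustration under stated assumptions, not the 3SOM implementation: contours are given as ordered, counter-clockwise point lists instead of voxels, normals are estimated from the two neighbouring points, "near" is defined by a hypothetical `tolerance`, and the egg-shaped test contour is made up.

```python
import math

def normals(contour):
    """Approximate outward normals of a closed, counter-clockwise ordered contour."""
    n = len(contour)
    result = []
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i - 1], contour[(i + 1) % n]
        tx, ty = x1 - x0, y1 - y0                   # tangent estimated from both neighbours
        length = math.hypot(tx, ty)
        result.append((ty / length, -tx / length))  # tangent rotated by -90 degrees
    return result

def best_superposition(target, template, tolerance=0.5):
    """Superimpose every target normal on every template normal; keep the best overlap."""
    t_normals, m_normals = normals(target), normals(template)
    best = (0.0, 0.0, -1, -1)  # (overlap, rotation angle, target index, template index)
    for ia, (a, va) in enumerate(zip(target, t_normals)):
        for ib, (b, vb) in enumerate(zip(template, m_normals)):
            angle = math.atan2(vb[1], vb[0]) - math.atan2(va[1], va[0])
            c, s = math.cos(angle), math.sin(angle)
            hits = 0
            for x, y in target:
                dx, dy = x - a[0], y - a[1]
                # Rotate around a, then move a onto b.
                px, py = b[0] + c * dx - s * dy, b[1] + s * dx + c * dy
                if any(math.hypot(px - qx, py - qy) <= tolerance for qx, qy in template):
                    hits += 1
            overlap = hits / len(target)  # fraction of target points landing near the template
            if overlap > best[0]:
                best = (overlap, angle, ia, ib)
    return best

def egg(n=36):
    """Asymmetric closed test contour (made up for this example)."""
    pts = []
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        r = 10.0 + 3.0 * math.cos(phi) + 2.0 * math.sin(2.0 * phi)
        pts.append((r * math.cos(phi), r * math.sin(phi)))
    return pts

template = egg()
theta = math.radians(40.0)
target = [(x * math.cos(theta) - y * math.sin(theta) + 5.0,
           x * math.sin(theta) + y * math.cos(theta) - 7.0) for x, y in template]
overlap, angle, ia, ib = best_superposition(target, template)
print(overlap, math.degrees(angle))
```

In 2D, aligning one normal vector fixes the transformation completely. In 3D, the rotation around the aligned vectors remains free, which is why the original method samples it in discrete steps.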

Ceulemans, Russell J. Mol. Biol., 338, 783 (2004)


Successful models of yeast complexes

(A) Exosome model on PNPase fit into EM map.

(B) RNA polymerase II with RPB4 (green)/RPB7 (red) built on Methanococcus jannaschii equivalents, and SPT5/pol II (cyan) built with IF5A.

(C and D) Views of CCT (gold) and phosducin 2/VID27 (red) fit into EM map.

(E) Micrograph of POP complex, with particle types highlighted.

(F) Ski complex built by combination of two complexes.

Aloy et al. Science, 303, 2026 (2004)


Cross talk between complexes

(Top) Triangles: components with at least one modelable structure and interaction; squares, structure only; circles, others. Lines show predicted interactions. Thick lines imply a conserved interaction interface; red, those supported by experiment.

(Bottom) Expanded view of cross-talk between transcription complexes built on by a combination of two complexes.

Aloy et al. Science, 303, 2026 (2004)


Summary

A combination of 3D structure and protein-interaction data can already provide a partial view of complex cellular structures.

The structure-based network derived from cross-talk between complexes provides a more realistic picture than those derived blindly from interaction data, because it suggests molecular details for how they are mediated.

Of course, the picture is still far from complete and there are numerous new challenges.

The structure-based network derived here provides a useful initial framework for further studies. Its beauty is that the whole is greater than the sum of its parts: Each new structure can help to understand multiple interactions.

The complex predictions and the associated network will thus improve exponentially as the numbers of structures and interactions increase, providing an ever more complete molecular anatomy of the cell.

Aloy et al. Science, 303, 2026 (2004)