protein-protein interactions networks

Upload: wardah

Post on 31-Jan-2016

Page 1: Protein-Protein Interactions Networks

Protein-Protein Interactions Networks

“A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae” P. Uetz et al., Nature 2000

“Functional organisation of the yeast proteome by systematic analysis of protein complexes” A.-C. Gavin et al., Nature 2002

“Global Mapping of the Yeast Genetic Interaction Network” Tong et al., Science 2004

“Global analysis of protein activities using proteome chips” Zhu, H. et al., Science 2001

“Conserved patterns of protein interaction in multiple species” R. Sharan et al., PNAS 2005

Page 2: Protein-Protein Interactions Networks

Genomics

Genomics: “The large-scale study of genomes and their functions”

Why protein networks?

Page 3: Protein-Protein Interactions Networks

Why protein network?

Assemblies represent more than the sum of their parts.

“Complexity” may partly rely on the contextual combination of the gene products.

[Figure: gene counts per organism: 24,000; 26,000; 50,000; 19,500; 14,000 genes]

Page 4: Protein-Protein Interactions Networks

Yeast as a model

Why yeast genomics?

A model eukaryote organism …

Saccharomyces cerevisiae

Page 5: Protein-Protein Interactions Networks
Page 6: Protein-Protein Interactions Networks

The best-studied organism

~5,500 genes.

16(!) chromosomes.

13 Mb of DNA (humans have ~3,000 Mb).

We know (?) the function of >1/2 of the yeast genes.

All the essential functions are conserved from yeast to humans.

Page 7: Protein-Protein Interactions Networks

Example: cell cycle

Lee Hartwell, Nobel Prize 2001

Page 8: Protein-Protein Interactions Networks

4 methodologies for high throughput research

Two hybrid systems

Analysis of protein complexes

Synthetic lethality

Protein Chips (?)

Page 9: Protein-Protein Interactions Networks

Two hybrid system

Aim: Identify pairs of physically interacting proteins.

Solution: Use the transcription mechanism of the cell.

Page 10: Protein-Protein Interactions Networks

The central dogma

DNA → (transcription) → RNA → (translation) → protein

Page 11: Protein-Protein Interactions Networks

Transcription factors

Movie – transcription (molecular model, real time) 7.2

Page 12: Protein-Protein Interactions Networks

Transcription – real time (video)

Page 13: Protein-Protein Interactions Networks

Eukaryotic mRNA

Reporter gene

Page 14: Protein-Protein Interactions Networks

Two hybrid system

Isolate double plasmids using reporter or selection methods.

Page 15: Protein-Protein Interactions Networks

All against All

Page 16: Protein-Protein Interactions Networks

Focus on the baits

Baits are analyzed separately: 192 baits vs. ~6000 prey yeast strains.

Example: for a component of RNA polymerases I and III, three new interacting proteins were identified.

Page 17: Protein-Protein Interactions Networks

Two hybrid system

Page 18: Protein-Protein Interactions Networks

Two hybrid system

“A comprehensive two-hybrid analysis to explore the yeast protein interactome” Ito T. et al., PNAS 2001.

Page 19: Protein-Protein Interactions Networks

Analysis of protein complexes

Aim: Identification of complexes and their subunits.

Solution: a two-step method:

1. Isolation of only the relevant complexes.

2. Identification of the complex units.

Page 20: Protein-Protein Interactions Networks

Double Isolation

Page 21: Protein-Protein Interactions Networks

Identification of the members

Divide and conquer:

• Denature the assembly

• Digest with protease

• Mass spectrometry

Page 22: Protein-Protein Interactions Networks

How does it work?

The deflection path of ionized molecules is used to determine each molecule’s mass (more precisely, its mass-to-charge ratio).

The output:

Page 23: Protein-Protein Interactions Networks

Analysis of protein complexes

Cross-reference the measured peptide masses with a protein database.

If the data are not sufficient, mass spectrometry can be applied again, this time to the peptides themselves.
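The database cross-referencing step can be illustrated with a minimal Python sketch. It assumes a simple matching rule (an observed mass "hits" a protein if it lies within a fixed tolerance of one of that protein's theoretical peptide masses); the function name, the tolerance, and the toy masses are all hypothetical and are not taken from the slides or from any real spectrum.

```python
def match_peptide_masses(observed, database, tolerance=0.5):
    """Rank database proteins by how many observed peptide masses they explain.

    observed:  list of measured peptide masses
    database:  dict mapping protein name -> list of theoretical peptide masses
    tolerance: maximum allowed mass difference for a hit (assumed value)
    """
    scores = {}
    for protein, theoretical in database.items():
        hits = sum(
            1
            for m in observed
            if any(abs(m - t) <= tolerance for t in theoretical)
        )
        scores[protein] = hits
    # Proteins explaining the most observed masses come first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy values for illustration only.
toy_database = {
    "protein_A": [800.4, 1200.6, 1500.8],
    "protein_B": [900.5, 1300.7],
}
observed_masses = [800.5, 1200.3, 1300.9]
ranking = match_peptide_masses(observed_masses, toy_database)
```

Real search engines use probabilistic scoring rather than raw hit counts; the count is used here only to keep the idea visible.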

Page 24: Protein-Protein Interactions Networks

Analysis of protein complexes

• Systematic (1): 1739 bait proteins; 232 complexes with 589 baits.

• Systematic (2): 725 bait proteins; 3,617 interactions with 493 baits.

Page 25: Protein-Protein Interactions Networks
Page 26: Protein-Protein Interactions Networks

Analysis of protein complexes

About 25% false-positive rate.

Covers 56/60% of known complexes (10/35% in Y2H).

Only 7% of the interactions were seen by Y2H assays.

But it can evaluate protein:

• Concentration.

• Localization.

• Post-translational modifications.

Page 27: Protein-Protein Interactions Networks

Synthetic lethality

First, a few words on essentiality.

• Create new strains, each with one gene deleted (96% coverage).

• Tag each strain with a unique sequence.

• Grow all the strains.

• Measure the amount of each sequence.

• Some 18.7% (1,105) are essential.

Page 28: Protein-Protein Interactions Networks

Synthetic lethality

High genetic redundancy hinders the discovery of many gene functions (30%).

Only the double mutation is lethal; either single mutation alone is viable.

Why?

• Single biochemical pathway.

• Two distinct pathways for one process.

• …
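The definition of a synthetic lethal pair can be written down as a small decision rule. The sketch below is only an illustration of that definition; the function name and the label strings are hypothetical.

```python
def classify_pair(viable_a, viable_b, viable_ab):
    """Classify a gene pair from the viability of three mutant strains.

    viable_a / viable_b: does each single-deletion strain grow?
    viable_ab:           does the double-deletion strain grow?
    """
    if not viable_a or not viable_b:
        # A single deletion is already lethal, so the pair cannot be
        # called synthetic lethal: at least one gene is essential.
        return "single mutant lethal"
    if not viable_ab:
        # Both singles grow, the double does not.
        return "synthetic lethal"
    return "no lethal interaction"
```

For example, `classify_pair(True, True, False)` describes the case on the slide: both single mutants are viable and only the double mutant dies.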

Page 29: Protein-Protein Interactions Networks

The naïve approach

But how do you scale it up to the whole genome? …

Page 30: Protein-Protein Interactions Networks

All vs. All

~5100 non-essential mutants.

Main tricks:

1. Haploid strains.

2. Resistance markers.

3. Extra marker for the library haploid.

Page 31: Protein-Protein Interactions Networks

Synthetic lethality …Making it genomics

Mass analysis: Crossing the query haploid with a library (synthetic genetic array)

Tetrad analysis: Validation and finding synthetic sick

Page 32: Protein-Protein Interactions Networks

The genetic interaction map

8 genes against all produced a network of synthetic lethal pairs.

Page 33: Protein-Protein Interactions Networks

Synthetic lethality …Making it genomics

• 132 query genes vs. 4700 mutants.

• False negatives: 17-42%.

• At least 4 times denser than the PPI network.

• Predicting ~100,000 interactions (?)

Page 34: Protein-Protein Interactions Networks

PPI Summary (2003)

Page 35: Protein-Protein Interactions Networks

PPI Summary

• S. cerevisiae (yeast): 4389 proteins, 14319 interactions

• C. elegans (worm): 2718 proteins, 3926 interactions

• D. melanogaster (fly): 7038 proteins, 20720 interactions

Sharan et al. PNAS 2005

Page 36: Protein-Protein Interactions Networks

We like Networks

Exploit graph theory methods.

Provide a general solution for data integration.

Page 37: Protein-Protein Interactions Networks

Network Structure and Function

Identify highly nonrandom network structural patterns that reflect function:

• Ideker et al.: Finding co-regulated sub-graphs.

• Lee et al.: The repeated instances of each motif are the result of evolutionary convergence.

• Barabási et al.: Network motifs are associated with specific cellular tasks.

Page 38: Protein-Protein Interactions Networks

Baker’s yeast (Saccharomyces cerevisiae): ~15000 interactions, ~5000 interacting genes

Bacterial pathogen (Helicobacter pylori): ~1500 interactions, ~700 interacting genes

Conserved patterns of PPI in multiple species

Kelley et al. PNAS 2003

Page 39: Protein-Protein Interactions Networks

Goals

• Separating true PPIs from false positives.

• Assigning functional roles to interactions.

• Predicting interactions.

• Organizing the data into models of cellular signaling and regulatory machinery.

How? Use an approach based on evolutionary cross-species comparisons.

Page 40: Protein-Protein Interactions Networks

Interaction graph (per species)

Vertices are the organism’s interacting proteins.

Edges are pairwise interactions between proteins.

Edges are weighted using a logistic regression model with three inputs:

A: Number of times an interaction was observed. For fly and worm, interactions were observed in one experiment only.

B: Correlation coefficient of the gene expression profiles, shown to be correlated with interaction.

C: The proteins’ small-world clustering coefficient: the sum of the neighbors’ log-hypergeometric (logHG) probabilities.
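As a rough illustration of the edge weighting, the sketch below evaluates a standard logistic function on the three inputs A, B and C. It assumes the usual logistic-regression form with an intercept and one coefficient per input; the function name and the coefficient values are hypothetical, since the fitted parameters are not given in the slides.

```python
import math


def interaction_probability(n_observations, expression_corr, clustering, betas):
    """Logistic model for the reliability of one interaction (edge weight).

    n_observations:  input A, number of times the interaction was observed
    expression_corr: input B, expression correlation of the two genes
    clustering:      input C, clustering coefficient term
    betas:           (b0, b1, b2, b3), intercept plus one coefficient per input
    """
    b0, b1, b2, b3 = betas
    z = b0 + b1 * n_observations + b2 * expression_corr + b3 * clustering
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to show the shape of the model.
example_betas = (-2.0, 1.0, 1.5, 0.5)
weak = interaction_probability(1, 0.1, 0.0, example_betas)
strong = interaction_probability(3, 0.8, 1.0, example_betas)
```

With positive coefficients, an interaction seen more often and between co-expressed genes receives a weight closer to 1.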

Page 41: Protein-Protein Interactions Networks

How do we find Sub-network conservation?

Interactions within each species should approximate the desired structure:

• Pathway: signal transduction.

• Cluster: protein complex.

Many-to-many correspondence between the sets of proteins.

Page 42: Protein-Protein Interactions Networks

Network alignment graph

Each node corresponds to k sequence-similar proteins, one per species.

• BLAST E-value < 10^-7, considering the 10 best matches only.

• The group cannot be split into two parts with no sequence similarity between them.

Each edge represents a conserved interaction, under one of three conditions:

• Match: one pair of proteins directly interacts, and all other pairs include proteins at distance ≤2 in the interaction maps.

• Gap: all protein pairs are at distance 2 in the interaction maps.

• Match-Gap: at least max{2, k − 1} protein pairs directly interact.

A subgraph corresponds to a conserved sub-network.
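The three edge conditions can be checked mechanically once the within-species distances are known. The sketch below is an illustration only: it takes one shortest-path distance per species for the two protein groups being connected, and it reads the Match condition's distance bound as "at most 2", which is an assumption about the slide's wording. The function name and labels are hypothetical.

```python
def conserved_edge_labels(distances):
    """Return which edge conditions hold between two alignment-graph nodes.

    distances: one entry per species (k entries); each entry is the
               shortest-path distance between the two proteins of that
               species in its own interaction map (1 = direct interaction).
    """
    k = len(distances)
    n_direct = sum(1 for d in distances if d == 1)
    labels = set()
    # Match: at least one direct interaction, all other pairs within distance 2.
    if n_direct >= 1 and all(d <= 2 for d in distances):
        labels.add("match")
    # Gap: every pair is at distance exactly 2.
    if all(d == 2 for d in distances):
        labels.add("gap")
    # Match-Gap: at least max{2, k - 1} pairs interact directly.
    if n_direct >= max(2, k - 1):
        labels.add("match-gap")
    return labels
```

An empty result means no edge is drawn between the two nodes; more than one label can hold at once, for example when every pair interacts directly.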

Page 43: Protein-Protein Interactions Networks

A probabilistic model

$S(P) = \sum_{e \in P} \log \frac{q(e)}{q_{random}}$

q(e): interaction similarity of edge e.
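A log-odds score of this kind sums, over the edges of a candidate sub-network, the log ratio between each edge's interaction similarity and a background value. The sketch below assumes natural logarithms and a single constant background value; both choices, and the function name, are assumptions made for illustration.

```python
import math


def log_odds_score(edge_similarities, q_random):
    """Sum of log(q(e) / q_random) over the edges of a sub-network.

    edge_similarities: q(e) for each edge e (each must be > 0)
    q_random:          background similarity expected at random (> 0)
    """
    return sum(math.log(q / q_random) for q in edge_similarities)
```

Edges with similarity above the background contribute positively and edges below it contribute negatively, so a sub-network scores high only when most of its edges are better supported than chance.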

Page 44: Protein-Protein Interactions Networks

Searching for conserved sub-networks

Identifying high-scoring subgraphs of the network alignment graph. … This problem is computationally hard.

• Exhaustively find seeds: paths with 4 nodes.

• Expand high-scoring seeds.

• Greedily add/remove nodes.

• Filter subgraphs with a high degree of overlap (>80%).
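The final filtering step can be sketched as follows. The overlap measure is not defined on the slide, so this example assumes it is the number of shared nodes divided by the size of the smaller subgraph, and it keeps the higher-scoring subgraph of any overlapping pair. The function name is hypothetical.

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Drop subgraphs that overlap too much with a higher-scoring one.

    subgraphs:   list of (score, set_of_nodes) pairs, node sets non-empty
    max_overlap: overlap fraction above which the lower-scoring one is dropped
    """
    kept = []
    # Visit candidates from best to worst score.
    for score, nodes in sorted(subgraphs, key=lambda s: s[0], reverse=True):
        redundant = any(
            len(nodes & other) / min(len(nodes), len(other)) > max_overlap
            for _, other in kept
        )
        if not redundant:
            kept.append((score, nodes))
    return kept


# Hypothetical candidates: the second is almost identical to the first.
candidates = [
    (5.0, {"a", "b", "c", "d", "e"}),
    (4.0, {"a", "b", "c", "d", "e", "f"}),
    (3.0, {"x", "y", "z"}),
]
filtered = filter_overlapping(candidates)
```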

Page 45: Protein-Protein Interactions Networks

Statistical evaluation of sub-networks

Randomized data is produced: Random shuffling of each of the interaction graphs. Randomizing the sequence-similarity relationships.

Find the highest-scoring sub-networks of a given size.

P-value is computed by the distribution of the top scores.
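An empirical p-value of this kind compares the observed score with the top scores found in the randomized data sets. The sketch below adds one to the numerator and denominator so that the estimate is never exactly zero; that correction and the function name are assumptions, not something stated on the slide.

```python
def empirical_p_value(observed_score, random_top_scores):
    """Fraction of randomized runs whose top score reaches the observed score.

    random_top_scores: highest sub-network score from each randomized data set
    """
    n_at_least = sum(1 for s in random_top_scores if s >= observed_score)
    # The +1 terms count the observed data as one more sample (assumed choice).
    return (n_at_least + 1) / (len(random_top_scores) + 1)
```

With 99 randomized data sets and none reaching the observed score, the estimate is 1/100, which is the smallest p-value this number of randomizations can support.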

Page 46: Protein-Protein Interactions Networks

The final product

[Figure axes: ← Protein sequence similarity →, ← Bacteria →, ← Yeast →]

Page 47: Protein-Protein Interactions Networks

3-way Comparison

• S. cerevisiae: 4389 proteins, 14319 interactions

• C. elegans: 2718 proteins, 3926 interactions

• D. melanogaster: 7038 proteins, 20720 interactions

Sharan et al. PNAS 2005

Page 48: Protein-Protein Interactions Networks

Multiple Network Alignment

Preprocessing

Interaction scores: logistic regression on number of observations, expression correlation, clustering coefficient.

Network alignment

Subnetwork search

Filtering & Visualizing

p-value < 0.01, 80% overlap

Conserved paths

Conserved clusters

Protein groups

Conserved interactions

Page 49: Protein-Protein Interactions Networks
Page 50: Protein-Protein Interactions Networks

Reduced false positives

Compared these conserved clusters to known complexes in yeast:

• Pure cluster: contains >2 annotated proteins, and >1/2 of these share the same annotation.

• 94% pure clusters (vs. 83% in single-species analysis).

Did “sticky” proteins bias the clusters?

• Of 39 proteins with >50 neighbors, only 10 were included in conserved clusters.

• And these were annotated as such.
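The purity criterion can be stated as a short check. The sketch below follows the thresholds as written on the slide (more than 2 annotated proteins, and more than half of them sharing one annotation); the function name and the use of `None` for unannotated proteins are assumptions made for illustration.

```python
from collections import Counter


def is_pure(annotations):
    """Decide whether a cluster is 'pure' under the slide's criterion.

    annotations: one complex annotation per protein in the cluster,
                 or None for proteins without an annotation
    """
    annotated = [a for a in annotations if a is not None]
    if len(annotated) <= 2:
        # Too few annotated proteins to judge purity.
        return False
    most_common_count = Counter(annotated).most_common(1)[0][1]
    return most_common_count > len(annotated) / 2
```

For example, a cluster annotated `["c1", "c1", "c2", None]` is pure: three proteins are annotated and two of them share the annotation `c1`.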

Page 51: Protein-Protein Interactions Networks

Cross Validation: Function

Species #Correct #Predictions Success rate (%)

Yeast 114 198 58

Worm 57 95 60

Fly 115 184 63

Outperforms the sequence-based approach, which achieves 37-53%.

Guilt by association:

• Enrichment of GO annotation (p<0.01).

• More than half of the annotated proteins had the annotation.

Page 52: Protein-Protein Interactions Networks

Cross Validation: Interaction

Species   Sensitivity (%)   Specificity (%)   P-value   Strategy
Yeast     50                77                1e-25     [1]
Worm      43                82                1e-13     [1]
Fly       23                84                5e-5      [1]
Yeast     9                 99                1e-6      [1]+[2]
Worm      10                100               6e-4      [1]+[2]
Fly       0.4               100               0.5       [1]+[2]

[1] Evidence that proteins with similar sequences interact within other species.

[2] Co-occurrence of these proteins in the same conserved cluster.

Page 53: Protein-Protein Interactions Networks

Wet Validation: Interaction

The tests were performed by using two-hybrid assays. Of the 65 yeast predicted interactions:

5 were self inducing. 31 tested positive.

Page 54: Protein-Protein Interactions Networks

Conclusions

Associate proteins that are not necessarily each other’s best sequence match:

• 177/679 conserved clusters.

• 31/129 conserved paths.

Inter-module interaction is reinforced by inter-species observations.

Prediction success of 40-52% >> 0.042% for random PPI prediction.

Many PPI circuits are conserved over evolution.

Page 55: Protein-Protein Interactions Networks

Thanks!!!

Recoverin, a calcium-activated myristoyl switch.

Page 56: Protein-Protein Interactions Networks

GO – Gene Ontology

all : all (171472)
  GO:0008150 : biological_process (109503)
    GO:0007582 : physiological process (70981)
      GO:0008152 : metabolism (41395)
        GO:0009058 : biosynthesis (10256)
          GO:0009059 : macromolecule biosynthesis (6876)
            GO:0006412 : protein biosynthesis (4611)
        GO:0043170 : macromolecule metabolism (17198)
          GO:0009059 : macromolecule biosynthesis (6876)
            GO:0006412 : protein biosynthesis (4611)
          GO:0019538 : protein metabolism (12856)
            GO:0006412 : protein biosynthesis (4611)
  GO:0005575 : cellular_component (98453)
  GO:0003674 : molecular_function (108120)

Page 57: Protein-Protein Interactions Networks

Interaction distribution

Page 58: Protein-Protein Interactions Networks

Expression data

• Yeast: 794 conditions.

• Fly: over 90 CC time points + 170 profiles.

• Worm: over 553 conditions.

Page 59: Protein-Protein Interactions Networks

Edge weight

where β0, ..., β3 are the parameters of the distribution.

Maximize the likelihood:

• Positive: MIPS interactions.

• Negative: random or false positives in the cross-validation test.

• Yeast: 1006 positive and negative examples.

• Fly: 96 positive and negative examples.

• Worm: 24 positive and 50 negative examples.

Page 60: Protein-Protein Interactions Networks

71 conserved regions: 183 significant clusters and 240 significant paths.

Page 61: Protein-Protein Interactions Networks

A probabilistic model

• Ms: the sub-network model.

• Mn: the null model.

• Ouv: the set of available observations on u-v.

• Puv: fraction of graphs in the order-preserving graph family that contain (u,v).

• Tuv / Fuv: true/false edge (u,v).

Page 62: Protein-Protein Interactions Networks

A probabilistic model

Each species’ interaction map was randomly constructed.

Randomizing assumptions:

• Each interaction should be present independently with high probability.

• The probability depends on the proteins’ total number of connections in the network.

Page 63: Protein-Protein Interactions Networks

Why Yeast?

“Comparative Genomics of the Eukaryotes” Rubin GM. et al., Science 2000

Page 64: Protein-Protein Interactions Networks

Analysis of protein complexes

1. Isolation:

A straightforward method, using affinity chromatography. A target protein is attached to polymer beads that are packed into a column. Cell proteins are washed through the column. Proteins that interact with the target protein adhere to the affinity matrix and are eluted later.

Page 65: Protein-Protein Interactions Networks

Analysis of protein complexes

2. Isolation:

Co-immunoprecipitation. An antibody that recognizes the target protein is used to isolate the protein. Usually there isn’t a highly specific antibody for the target protein, so a chimeric protein is formed from the target protein and an epitope tag. A common tag is the enzyme glutathione S-transferase (GST).

Page 66: Protein-Protein Interactions Networks

Analysis of protein complexes

2. Isolation:

Isolation of complex using the Chimera

Glutathione coated beads

Cell extract

Glutathione solution

Page 67: Protein-Protein Interactions Networks

MIPS

Munich Information Center for Protein Sequences (MIPS).

Hierarchical structure.

Only manually annotated complexes from DIP.

Left with 486 proteins spanning 57 categories at level 3.