Genome-wide Functional Linkage Maps


Upload: cora-cooper

Post on 04-Jan-2016


Page 1: Genome-wide Functional Linkage Maps Methods for inferring functional linkages: Complexes, Pathways Rosetta stone Phylogenetic profiles Gene neighbors Operon

Genome-wide Functional Linkage Maps

Methods for inferring functional linkages: Complexes, Pathways
Rosetta stone
Phylogenetic profiles
Gene neighbors
Operon method
(Microarray method)

The Genome-wide functional linkage Map in M. tb

Assessing accuracy of functional linkages

Functional linkages in structural genomics

Analyzing parallel pathways

The DIP and ProLinks databases

[Figure: genome-wide functional linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 4000]

Page 2

Diphtheria Toxin Dimer vs. Monomer

Bennett et al., PNAS, Vol. 91, 3127-3131 (1994)

Page 3

Page 4

Rosetta Stone Assumption: Fusion of functionally-linked domains

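[Diagram: proteins A and B and their homologs A' and B' in organism 1 and organism 2; in one organism the two domains are fused into a single protein, in the other they are separate proteins]

Implies proteins A and B may be functionally linked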

Marcotte et al. (1999) Science, 285, 751
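The fusion assumption can be illustrated with a minimal sketch. It assumes homology matches have already been computed by some alignment tool and are supplied as tuples; the input layout, the function name `rosetta_links`, and the `max_overlap` parameter are hypothetical choices made for illustration, not the published procedure.

```python
from itertools import combinations

def rosetta_links(hits, max_overlap=0):
    """Return pairs of query proteins that match non-overlapping regions
    of the same (candidate fused) protein in another genome.

    hits: list of (query_protein, target_protein, target_start, target_end)
          homology matches; this layout is a hypothetical convention.
    """
    by_target = {}
    for query, target, start, end in hits:
        by_target.setdefault(target, []).append((query, start, end))

    links = set()
    for target, regions in by_target.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Length of the overlap between the two matched regions.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                links.add((tuple(sorted((q1, q2))), target))
    return links

hits = [
    ("A", "fusedAB", 1, 200),    # A matches the N-terminal part
    ("B", "fusedAB", 220, 450),  # B matches the C-terminal part
    ("C", "fusedAB", 150, 300),  # C overlaps both regions, so no link
    ("C", "other", 1, 100),
]
links = rosetta_links(hits)
```

Here only A and B are linked, because they cover distinct regions of the same candidate fusion protein.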

Page 5

PHYLOGENETIC PROFILE METHOD

Pellegrini et al (1999) PNAS 96, 4285

Page 6

The Gene Neighbor Method for Inferring Functional Linkages

[Diagram: positions of genes A, B, and C in genome 1, genome 2, genome 3, genome 4, ...]

A statistically significant correlation is observed between the positions of proteins A and B across multiple genomes. A functional relationship is inferred between proteins A and B, but not between the other pairs of proteins.
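One way to quantify "consistently close across genomes" is sketched below. It assumes circular genomes, gene positions given in base pairs, and a separation scaled by half the genome length; the function names and coordinates are hypothetical, and the scaling is an assumption rather than something stated on the slide.

```python
def fractional_separation(pos_a, pos_b, genome_length):
    """Separation of two genes on a circular genome, scaled to [0, 1].

    The shortest way around the circle is at most half the genome length,
    so dividing by genome_length / 2 maps the distance onto 0..1.
    """
    d = abs(pos_a - pos_b)
    d = min(d, genome_length - d)
    return d / (genome_length / 2)

def product_of_separations(positions):
    """positions: list of (pos_a, pos_b, genome_length), one entry per genome
    that contains homologs of both genes. Returns the product of the
    per-genome fractional separations (small value = consistently close)."""
    x = 1.0
    for pos_a, pos_b, length in positions:
        x *= fractional_separation(pos_a, pos_b, length)
    return x

# Hypothetical coordinates (bp) in four circular genomes.
ab = [(1000, 3000, 4_000_000), (500_000, 502_000, 2_000_000),
      (10_000, 3_990_000, 4_000_000), (70_000, 71_000, 1_000_000)]
ac = [(1000, 1_500_000, 4_000_000), (500_000, 1_400_000, 2_000_000),
      (10_000, 900_000, 4_000_000), (70_000, 400_000, 1_000_000)]
x_ab = product_of_separations(ab)
x_ac = product_of_separations(ac)
```

In this toy data the A-B product is many orders of magnitude smaller than the A-C product, which is the pattern that would support linking A and B but not A and C.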

Page 7

[Diagram: neighboring genes A, B, and C along the chromosome]

OPERON or GENE CLUSTER method of inferring functional linkages in the genome of Mycobacterium tuberculosis

Distance threshold   Predicted operon groups   Genes with links   Functional linkages
0 bp                 542                       1279               2034
25 bp                792                       2071               4442
50 bp                879                       2420               5890
75 bp                919                       2665               7026
100 bp               933                       2870               8468

The 100 bp threshold is chosen because it gives the broadest coverage consistent with high accuracy
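The distance-threshold idea can be sketched as follows. This assumes genes are given as (name, start, end, strand) tuples sorted by start, that only same-strand neighbors are grouped, and that every pair within a group counts as one linkage; these conventions and the function names are assumptions for illustration, not necessarily how the counts in the table were produced.

```python
def operon_groups(genes, threshold_bp=100):
    """Group consecutive same-strand genes whose intergenic gap is at most
    threshold_bp. genes: list of (name, start, end, strand), sorted by start.
    Returns only groups of two or more genes."""
    groups, current = [], []
    for gene in genes:
        if current:
            _, _, prev_end, prev_strand = current[-1]
            gap = gene[1] - prev_end - 1
            if gene[3] == prev_strand and gap <= threshold_bp:
                current.append(gene)
                continue
            if len(current) > 1:
                groups.append(current)
        current = [gene]
    if len(current) > 1:
        groups.append(current)
    return groups

def count_linkages(groups):
    """Count every pair of genes within a group as one functional linkage."""
    return sum(len(g) * (len(g) - 1) // 2 for g in groups)

# Hypothetical gene coordinates.
genes = [("g1", 1, 900, "+"), ("g2", 950, 1800, "+"), ("g3", 1805, 2500, "+"),
         ("g4", 2900, 3500, "+"), ("g5", 3520, 4000, "-"), ("g6", 4010, 4700, "-")]
groups = operon_groups(genes, threshold_bp=100)
n_links = count_linkages(groups)
```

Raising `threshold_bp` can only merge or extend groups, which matches the trend in the table: coverage grows with the threshold, and accuracy is the limiting consideration.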

Research of Michael Strong

Page 8

Network Interaction Map vs. Genome-Wide Functional Linkage Map

Whole Genome Functional Linkage Map (RS, PP, GN, OP overlap)

[Figure: genome-wide functional linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 4000]

Functional linkage between Gene A and Gene B

Strong, Graeber et al. (2003) Nucleic Acids Research, 31, 7099

Page 9

Figure 7. M. Strong, T. Graeber et al.

Page 10

Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)

[Figure: genome-wide functional linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 4000]

Requiring 2 or more functional linkages: 1,865 genes make 9,766 linkages
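A minimal sketch of this kind of filter follows, under the assumption that the requirement means a gene pair must be supported by at least two of the four methods (RS, PP, GN, OP). The data layout, gene pairs, and function name are hypothetical.

```python
from collections import Counter

def links_with_support(predictions, min_methods=2):
    """predictions: dict mapping method name -> iterable of (gene_a, gene_b).
    Returns the set of unordered gene pairs predicted by at least
    min_methods different methods."""
    support = Counter()
    for method, pairs in predictions.items():
        # Normalize pair order and de-duplicate within a method.
        for pair in {tuple(sorted(p)) for p in pairs}:
            support[pair] += 1
    return {pair for pair, n in support.items() if n >= min_methods}

# Hypothetical predictions from the four methods.
predictions = {
    "RS": [("Rv0001", "Rv0002")],
    "PP": [("Rv0002", "Rv0001"), ("Rv0005", "Rv0006")],
    "GN": [("Rv0005", "Rv0006"), ("Rv0010", "Rv0020")],
    "OP": [("Rv0005", "Rv0006")],
}
kept = links_with_support(predictions)
linked_genes = {g for pair in kept for g in pair}
```

Counting `kept` and `linked_genes` gives the two figures a slide like this one reports: the number of linkages and the number of genes that take part in them.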

Page 11

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Page 12

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Cluster A: 6 genes (5 annotated); 4 linkages
5 genes coding for DNA replication or repair
The 6th gene is inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon

Page 13

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Cluster A: 6 genes (5 annotated); 5 linkages
5 genes coding for DNA replication or repair
The 6th gene is inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon
None of the genes is a homolog

Page 14

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Cluster B: 6 genes; 7 linkages
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynthesis
1 gene: unannotated

Gene 14, pknB (a Ser/Thr kinase), contains PASTA domains (penicillin-binding protein and serine/threonine kinase associated)

Page 15

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Cluster B: 6 genes; 7 linkages
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynthesis
1 gene: unannotated

Gene 19 is unannotated. It contains an FHA (forkhead-associated) domain, which binds phosphothreonine-containing proteins.

Page 16

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

Cluster D: Links gene 50 (a penicillin binding protein involved in cell wall synthesis) to gene 51 (an integral membrane protein).

Page 17

Whole Genome Functional Linkage Map, Zoom (Genes Rv0001-Rv0051)

[Figure: zoomed linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 50; clusters labeled A through F]

E is a functional link between gene 16 (pbpA, in cell wall biosynthesis) and gene 50 (the penicillin-binding protein involved in cell wall biosynthesis)

Page 18

Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)

[Figure: genome-wide functional linkage map; both axes, TB Gene A and TB Gene B, run from 0 to 4000]

Some columns show similar linkages, so cluster like columns, using the Eisen et al. (1998) procedure, CLUSTER
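The idea of placing similar columns side by side can be sketched with a small average-linkage clustering. This is a simplified stand-in, not the CLUSTER program itself: it uses a Jaccard distance on sets of linked partners (an assumption made here for binary linkage data), and the gene and partner names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of linked partners."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_order(columns):
    """Average-linkage agglomerative clustering of map columns.

    columns: dict gene -> set of genes it is linked to.
    Returns the genes reordered so that similar columns sit side by side.
    """
    clusters = [[g] for g in sorted(columns)]

    def avg_dist(c1, c2):
        total = sum(jaccard_distance(columns[x], columns[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

columns = {
    "g1": {"p", "q", "r"},
    "g2": {"x", "y"},
    "g3": {"p", "q"},
    "g4": {"x", "y", "z"},
}
order = cluster_order(columns)
```

After reordering, g1 sits next to g3 and g2 next to g4, so genes with similar linkage patterns form contiguous blocks in the map.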

Page 19

Hierarchical clustering of the TB Whole Genome Functional Linkage Map

Research of Michael Strong and Tom Graeber

Functional modules range in size from 2 to > 100 linkages

Dozens of off-diagonal functional linkages

Page 20

Detoxification

Polyketide and non-ribosomal peptide synthesis

Energy Metabolism, oxidoreductases

Polyketide and non-ribosomal, Degradation of Fatty acids, and Energy Metabolism

Degradation of Fatty acids

Research of Michael Strong and Tom Graeber

Page 21

[Figure: clustered linkage map with functional module labels]

Detoxification
Polyketide and non-ribosomal peptide synthesis
Energy Metabolism, oxidoreductase
Deg. of Fatty Acids
Virulence
Energy Metabolism, oxidoreductase
Amino acid Biosynthesis
Energy Metab. Respiration Aerobic
Lipid Biosynthesis
Degradation of Fatty Acids
Amino Acid Biosynthesis (Branched)
Synthesis and Modif. of Macromolecules, rpl, rpm, rps
Biosynthesis of Cofactors, Prosthetic groups
Purine, Pyrimidine nucleotide biosynthesis
Novel Group Sugar Metabolism
Aromatic Amino Acid Biosynthesis
Energy Metabolism, Anaerobic Respiration
Two component systems
Cell Envelope
Cytochrome P450
Chaperones
Biosynthesis of cofactors
Cell Envelope, Cell Division
Transport/Binding Proteins
Energy Metabolism TCA
Broad Regulatory, Serine Threonine Protein Kinase
Cell Envelope, Murein Sacculus and Peptidoglycan
Transport/Binding Proteins Cations
Energy Metabolism, ATP Proton Motive force

Fig 4. M. Strong, T. Graeber et al.

Page 22

[Figure: clustered linkage map with the same functional module labels as listed on Page 21]

One of 7 modules of unannotated linkages, perhaps undiscovered pathways or complexes

Page 23

Pathway Reconstruction from Functional Linkages

[Diagram: linked histidine biosynthesis enzymes: HisG, HisF, HisI / HisI2, HisA, HisH, HisB, HisC / HisC2, HisB, HisD]

All 9 enzymes of the histidine biosynthesis pathway are linked, and are clustered separately from other amino acid synthetic pathways

Page 24

Functional Linkages Among Cytochrome Oxidase Genes

[Diagram: linked genes CtaD, CtaE, CtaC, and CtaB]

Functional linkages relate all 3 components of the cytochrome oxidase complex and also CtaB, the cytochrome oxidase assembly factor

These genes are at four different chromosomal locations

Membrane proteins linked to soluble proteins

Page 25

Quantitative Assessment of Inferred Protein Complexes

Research of Edward Marcotte, Matteo Pellegrini, Michael Thompson and Todd Yeates

Page 26

Calculating Probabilities of Co-evolution

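Phylogenetic Profile and Rosetta Stone:

$$P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}$$

N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common

Gene Neighbor:

$$P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!}$$

X = fractional separation of genes

Operon:

$$P(n) = 1 - e^{-n}$$

n = intergenic separation (any constant multiplying n in the exponent did not survive extraction)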
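The phylogenetic-profile and gene-neighbor expressions on this slide can be evaluated directly. The sketch below assumes the hypergeometric form for the profile probability and the series X Σ (−ln X)^k / k! (k from 0 to m−1) for the gene-neighbor probability; the function names and the numeric inputs are illustrative only.

```python
from math import comb, log, factorial

def phylo_profile_prob(k, n, m, N):
    """Hypergeometric probability that two proteins with n and m homologs
    among N genomes co-occur in exactly k genomes by chance."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def gene_neighbor_prob(X, m):
    """P_m(<= X) = X * sum_{k=0}^{m-1} (-ln X)^k / k!  for 0 < X <= 1."""
    return X * sum((-log(X)) ** k / factorial(k) for k in range(m))

# Illustrative inputs, not taken from the slides.
p_profile = phylo_profile_prob(k=3, n=3, m=3, N=10)
p_neighbor = gene_neighbor_prob(X=0.01, m=2)
```

A small probability means the observed co-occurrence or proximity is unlikely by chance, which is what supports inferring a linkage.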

Page 27

Combining Inferences of Co-Evolution from 4 Methods

We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve:

$$O_{\mathrm{post}} = \frac{P(\mathrm{pos})}{P(\mathrm{neg})} \prod_{i=1}^{4} \frac{P(f_i \mid \mathrm{pos})}{P(f_i \mid \mathrm{neg})}$$

where positive pairs are proteins with common pathway annotation and negative pairs are proteins with different annotation
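A minimal sketch of this combination follows, assuming the four methods contribute independent likelihood ratios P(f_i|pos)/P(f_i|neg), where f_i is taken to mean the evidence from method i, and that the prior odds come from the fraction of positive pairs. The function names and all numbers are hypothetical.

```python
def posterior_odds(likelihood_ratios, p_pos, p_neg):
    """O_post = P(pos)/P(neg) * product_i P(f_i|pos)/P(f_i|neg)."""
    odds = p_pos / p_neg
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds to a probability in [0, 1)."""
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios for RS, PP, GN, OP.
ratios = [2.0, 3.0, 1.0, 1.5]
odds = posterior_odds(ratios, p_pos=0.1, p_neg=0.9)
prob = odds_to_probability(odds)
```

A ratio of 1.0 means that method carries no information for the pair, so it leaves the odds unchanged.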

Page 28

ProLinks Database www.dip.doe-mbi.ucla.edu/pronav

~ 10,000,000 Functional Linkages inferred from 83 fully sequenced genomes

Page 29

Benchmarking this Approach Against Known Complexes

Ecocyc: Karp et al. NAR, 30, 56 (2002)

True positive interactions are between subunits of known complexes and false positive ones are between subunits of different complexes.

[Figure: ROC plot; x-axis: Fraction of False Positives (0 to 0.009); y-axis: Fraction of True Positives (0 to 0.4); a reference curve labeled Random is shown for comparison]

For high confidence links, we find 1/3 of true interactions with only 1/1000 of the false positive ones

Research of Matteo Pellegrini
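The benchmark can be sketched as a threshold sweep over scored pairs. This assumes each predicted pair carries a confidence score and a label saying whether both proteins belong to the same known complex; the scores, labels, and function name are made up for illustration.

```python
def roc_points(scored_pairs, thresholds):
    """scored_pairs: list of (score, is_true), where is_true marks a pair of
    subunits from the same known complex.
    Returns a list of (threshold, false_positive_fraction, true_positive_fraction)."""
    n_true = sum(1 for _, t in scored_pairs if t)
    n_false = len(scored_pairs) - n_true
    points = []
    for th in thresholds:
        tp = sum(1 for s, t in scored_pairs if s >= th and t)
        fp = sum(1 for s, t in scored_pairs if s >= th and not t)
        points.append((th, fp / n_false, tp / n_true))
    return points

# Hypothetical scored pairs.
pairs = [(0.95, True), (0.90, True), (0.80, False), (0.60, True),
         (0.40, False), (0.30, False), (0.20, True), (0.10, False)]
points = roc_points(pairs, thresholds=[0.9, 0.5])
```

Lowering the threshold recovers more true pairs at the cost of more false ones; the slide's operating point corresponds to a strict threshold on the far left of the curve.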

Page 30

Example Complex: NADH Dehydrogenase I

11 of 13 subunits detected

Page 31

Example Complex: NADH Dehydrogenase I

11 of 13 subunits detected

3 false positives

Page 32

From Inferred Protein Linkages to Structures of Complexes

Research of Michael Strong, Shuishu Wang, Markus Kauffman

Page 33

PE, PE-PGRS, and PPE Proteins in M. tuberculosis

38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins

Together comprise about 5% of the genome

No function is known, but some appear to be membrane bound
No structure is known: always insoluble when expressed

Goal: use functional linkages to predict a complex between a PE and a PPE protein, express the complex, and determine its structure

Research of Shuishu Wang and Michael Strong

The Problem of PE and PPE Proteins in M. tb

Page 34

Construction of a co-expression vector to test for protein-protein interactions (Mike Strong)

[Diagram: pET 29b(+) co-expression construct. Labeled elements: T7 promoter, lac operator, two ribosome binding sites (RBS), gene A, gene B, thrombin site, His tag; restriction sites NdeI, HindIII, KpnI, NcoI. Transcription gives a polycistronic mRNA; translation gives protein A and protein B (with His tag). Two outcomes are drawn: the proteins interact (protein-protein interaction), or the proteins do not interact.]

Page 35

When co-expressed, the PE and PPE proteins, inferred to interact, do form a soluble complex.

Sedimentation equilibrium experiments: Mr = 35,200
Rv2430c + Rv2431c, fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentration: OD280 0.7, 0.45, 0.15

Expected Mr:
Rv2431c (PE): 10,687 (10,563.12 from mass spec)
Rv2430c + His tag (PPE): 24,072 (23,895.00 from mass spec)

Possibly suggests a 1:1 complex between these two proteins
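The stoichiometry argument is simple arithmetic and can be checked with a short sketch. It uses the expected masses quoted on this slide and compares the observed Mr against a few candidate subunit ratios; the function name and the choice to test up to two copies of each protein are assumptions for illustration.

```python
observed_mr = 35_200
pe_mr = 10_687    # Rv2431c (PE), expected Mr from the slide
ppe_mr = 24_072   # Rv2430c + His tag (PPE), expected Mr from the slide

def best_stoichiometry(observed, a, b, max_copies=2):
    """Return (copies_a, copies_b, expected_mass) closest to the observed mass."""
    candidates = [(i, j, i * a + j * b)
                  for i in range(1, max_copies + 1)
                  for j in range(1, max_copies + 1)]
    return min(candidates, key=lambda c: abs(c[2] - observed))

best = best_stoichiometry(observed_mr, pe_mr, ppe_mr)
relative_error = abs(best[2] - observed_mr) / observed_mr
```

The 1:1 sum, 10,687 + 24,072 = 34,759, lies within about 1.3% of the measured 35,200, while the next candidate (2:1, at 45,446) is far off.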

Page 36

Crystallization Trials of the Complex Between PE Protein Rv2431c and PPE Protein Rv2430c

Page 37

Database of Interacting Proteins www.dip.doe-mbi.ucla.edu

Experimentally detected interactions from the scientific literature

Currently ~ 44,000 interactions

Page 38

The DIP Database

DOE-MBI LSBMM, UCLA

Page 39

Page 40

Live DIP Gives the States of Proteins

Transitions Documented

Page 41

ProLinks Database and the Protein Navigator

• Contains some 10,000,000 inferred functional linkages from 83 genomes

• Available at www.doe-mbi.ucla.edu

• Soon to be expanded to 250 fully sequenced genomes

• Eventually to be reconciled with DIP

Page 42

Summary

[Diagram: network of linked proteins labeled A, B, C, V, X, Y, Z]

A protein’s function is defined by the cellular context of its linkages

Many functional linkages are revealed from genomic and microarray data (high coverage)

Validity of functional linkages can be assessed by comparison to known complexes, and to expression data, and by keyword recovery

Clustered genome-wide functional maps can reveal and organize information on complexes and pathways

Functional linkages can reveal protein complexes suitable for structural studies

Page 43

Protein Interactions Analysis of M. tb Genome
Michael Strong

Whole Genome Interaction Maps
Michael Strong & Tom Graeber

Methods of Inferring Interactions
Edward Marcotte, Matteo Pellegrini, Todd Yeates, Michael Thompson, Richard Llwellyn

Database of Interacting Proteins
Lukasz Salwinski, Joyce Duan, Ioannis Xenarios, Robert Riley, Christopher Miller

Parallel pathways
Huiying Li