Genome-wide Functional Linkage Maps
Methods for inferring functional linkages: Complexes, Pathways
Rosetta stone, Phylogenetic profiles, Gene neighbors, Operon method, (Microarray method)
The Genome-wide functional linkage Map in M. tb
Assessing accuracy of functional linkages
Functional linkages in structural genomics
Analyzing parallel pathways
The DIP and ProLinks databases
[Figure: genome-wide functional linkage map; axes TB Gene A and TB Gene B, each 0 to 4000]
Diphtheria Toxin Dimer vs. Monomer
Bennett et al., PNAS, Vol. 91, 3127-3131 (1994)
Rosetta Stone Assumption: Fusion of functionally-linked domains
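[Diagram: proteins A and B in organism 1 and their homologs A' and B' in organism 2; in one of the two organisms the pair is fused into a single chain]
Implies proteins A and B may be functionally linked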
Marcotte et al. (1999) Science, 285, 751
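The Rosetta Stone idea can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited work: it assumes that a homology search has already been run and summarized as a mapping from each candidate composite (fused) protein to the set of query proteins that match separate parts of it. The function name, the input format, and the toy identifiers are all hypothetical.

```python
from itertools import combinations

def rosetta_stone_links(fusion_hits):
    """Map each pair of query proteins to the composite proteins that contain both.

    fusion_hits: dict of composite protein id -> set of query protein ids
    that match non-overlapping parts of it (assumed precomputed).
    """
    links = {}
    for composite, members in fusion_hits.items():
        # Every pair of query proteins found in one composite is a candidate link.
        for a, b in combinations(sorted(members), 2):
            links.setdefault((a, b), set()).add(composite)
    return links

# Toy input (made up): two other organisms carry an A-B fusion protein.
fusion_hits = {
    "org2_AB": {"A", "B"},
    "org3_C": {"C"},
    "org4_AB": {"A", "B"},
}
links = rosetta_stone_links(fusion_hits)
print(links)
```

A pair supported by several independent fusion proteins, as A and B are here, would normally be trusted more than a pair seen only once.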
PHYLOGENETIC PROFILE METHOD
Pellegrini et al (1999) PNAS 96, 4285
The Gene Neighbor Method for Inferring Functional Linkages
[Figure: positions of genes A, B, and C in genome 1, genome 2, genome 3, genome 4, ...]
A statistically significant correlation is observed between the positions of proteins A and B across multiple genomes. A functional relationship is inferred between proteins A and B, but not between the other pairs of proteins:
[Diagram: gene A linked to gene B; gene C not linked]
OPERON or GENE CLUSTER method of inferring functional linkages in the genome of Mycobacterium tuberculosis
Distance threshold   Predicted operon groups   Genes with links   Functional linkages
0 bp                 542                       1279               2034
25 bp                792                       2071               4442
50 bp                879                       2420               5890
75 bp                919                       2665               7026
100 bp               933                       2870               8468
The 100 bp threshold is chosen because it gives the broadest coverage consistent with high accuracy
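The distance-threshold idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure behind the table: it assumes that neighboring genes are grouped only when they lie on the same strand, that the gap is measured from the end of one gene to the start of the next, and that every pair of genes within a group counts as one linkage. The function names and the toy coordinates are hypothetical.

```python
from itertools import combinations

def operon_groups(genes, max_gap=100):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp.

    genes: list of (name, start, end, strand), sorted by start coordinate.
    Returns a list of groups (lists of names) that contain at least two genes.
    """
    groups, current, prev = [], [], None
    for name, start, end, strand in genes:
        same_run = (prev is not None
                    and strand == prev[3]
                    and start - prev[2] <= max_gap)
        if same_run:
            current.append(name)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [name]
        prev = (name, start, end, strand)
    if len(current) > 1:
        groups.append(current)
    return groups

def operon_links(groups):
    """Assumption: all pairs of genes inside one group are linked."""
    return [pair for group in groups for pair in combinations(group, 2)]

# Toy genome (made-up coordinates).
genes = [
    ("g1", 1, 900, "+"),
    ("g2", 950, 1800, "+"),
    ("g3", 1850, 2500, "+"),
    ("g4", 2900, 3500, "+"),
    ("g5", 3520, 4000, "-"),
]
for threshold in (25, 100):
    groups = operon_groups(genes, max_gap=threshold)
    print(threshold, groups, len(operon_links(groups)))
```

Raising the threshold can only merge more genes into groups, which matches the trend in the table: coverage grows with the threshold, at some cost in accuracy.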
Research of Michael Strong
Network Interaction Map vs. Genome-Wide Functional Linkage Map
Whole Genome Functional Linkage Map (RS, PP, GN, OP overlap)
[Figure: linkage map; axes TB Gene A and TB Gene B, each 0 to 4000]
Functional linkage between Gene A and Gene B
Strong, Graeber et al. (2003) Nucleic Acids Research, 31, 7099
Figure 7. M. Strong, T. Graeber et al.
Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)
[Figure: linkage map; axes TB Gene A and TB Gene B, each 0 to 4000]
Requiring 2 or more functional linkages: 1,865 genes make 9,766 linkages
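The overlap filter can be sketched as below, reading "2 or more functional linkages" as "the same gene pair is linked by at least two of the four methods". That reading, the function name, the input format, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict

def overlap_links(links_by_method, min_methods=2):
    """Keep gene pairs supported by at least min_methods different methods.

    links_by_method: dict of method name -> iterable of (gene_a, gene_b).
    Returns (kept, genes): kept maps each unordered pair to its supporting
    methods; genes is the set of genes that take part in a kept pair.
    """
    support = defaultdict(set)
    for method, pairs in links_by_method.items():
        for a, b in pairs:
            # frozenset makes (a, b) and (b, a) the same key.
            support[frozenset((a, b))].add(method)
    kept = {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}
    genes = set().union(*kept) if kept else set()
    return kept, genes

# Toy data (made up); RS, PP, GN, OP are the four methods named on the slide.
links_by_method = {
    "RS": [("g1", "g2")],
    "PP": [("g2", "g1"), ("g3", "g5")],
    "GN": [("g3", "g5")],
    "OP": [("g5", "g6")],
}
kept, genes = overlap_links(links_by_method)
print(len(genes), "genes make", len(kept), "linkages")
```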
Whole Genome Functional Linkage Map: Zoom (Genes Rv0001-Rv0051)
[Figure: zoomed linkage map; axes TB Gene A and TB Gene B, each 0 to 50; clusters labeled A to F]
Cluster A: 6 genes; 5 annotated; 4 linkages
5 genes coding for DNA replication or repair
The 6th gene inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon
Cluster A: 6 genes; 5 annotated; 5 linkages
5 genes coding for DNA replication or repair
The 6th gene inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon
None of the genes is a homolog
Cluster B: 6 genes; 7 linkages
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynthesis
1 gene: unannotated
Gene 14, pknB (a Ser/Thr kinase), contains PASTA domains (penicillin-binding protein and serine/threonine kinase associated)
Gene 19 is unannotated. It contains an FHA (forkhead-associated) domain, which binds phosphothreonine-containing proteins.
Cluster D: Links gene 50 (a penicillin binding protein involved in cell wall synthesis) to gene 51 (an integral membrane protein).
E is a functional link between gene 16 (pbpA, in cell wall biosynthesis) and gene 50 (the penicillin-binding protein involved in cell wall biosynthesis)
Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)
[Figure: linkage map; axes TB Gene A and TB Gene B, each 0 to 4000]
Some columns show similar linkages, so cluster similar columns, using the procedure of Eisen et al. (1998), CLUSTER
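Grouping similar columns can be sketched with a small agglomerative clustering routine. This is not the CLUSTER program itself: it is a minimal pure-Python sketch that assumes each column is a 0/1 linkage vector, uses Pearson correlation as the similarity, and merges the two most similar groups at each step (average linkage). The function names and the toy columns are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def average_similarity(group_a, group_b, columns):
    """Mean pairwise correlation between the members of two groups."""
    total = sum(pearson(columns[a], columns[b]) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))

def cluster_columns(columns):
    """Return the merge history as (group_a, group_b, similarity) tuples."""
    clusters = [[name] for name in columns]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current groups with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]], columns))
        score = average_similarity(clusters[i], clusters[j], columns)
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

# Toy linkage columns (made up): c1 and c2 share identical linkage patterns.
columns = {
    "c1": [1, 1, 0, 0],
    "c2": [1, 1, 0, 0],
    "c3": [0, 0, 1, 1],
    "c4": [0, 1, 1, 1],
}
merges = cluster_columns(columns)
for group_a, group_b, score in merges:
    print(group_a, group_b, round(score, 3))
```

Reordering the map so that merged columns sit next to each other is what turns scattered off-diagonal dots into visible blocks.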
Hierarchical clustering of the TB Whole Genome Functional Linkage Map
Research of Michael Strong and Tom Graeber
Functional modules range in size from 2 to > 100 linkages
Dozens of off diagonal functional linkages
Detoxification
Polyketide and non-ribosomal Peptide synthesis
Energy Metabolism, oxidoreductases
Polyketide and non-ribosomal, Degradation of Fatty acids, and Energy Metabolism
Degradation of Fatty acids
Research of Michael Strong and Tom Graeber
Detoxification
Polyketide and non-ribosomal peptide synthesis
Energy Metabolism, oxidoreductase
Deg. of Fatty Acids
Virulence
Energy Metabolism, oxidoreductase
Amino acid Biosynthesis
Energy Metab. Respiration Aerobic
Lipid Biosynthesis
Degradation of Fatty Acids
Amino Acid Biosynthesis (Branched)
Synthesis and Modif. of Macromolecules, rpl, rpm, rps
Biosynthesis of Cofactors, Prosthetic groups
Purine, Pyrimidine nucleotide biosynthesis
Novel Group Sugar Metabolism
Aromatic Amino Acid Biosynthesis
Energy Metabolism, Anaerobic Respiration
Two component systems
Cell Envelope
Cytochrome P450
Chaperones
Biosynthesis of cofactors
Cell Envelope, Cell Division
Transport/Binding Proteins
Energy Metabolism TCA
Broad Regulatory, Serine Threonine Protein Kinase
Cell Envelope, Murein Sacculus and Peptidoglycan
Transport/Binding Proteins Cations
Energy Metabolism, ATP Proton Motive force
Fig. 4. M. Strong, T. Graeber et al.
One of 7 modules of unannotated linkages, perhaps undiscovered pathways or complexes
Pathway Reconstruction from Functional Linkages
[Figure: linked histidine biosynthesis genes, labeled HisG, HisF, HisI / HisI2, HisA, HisH, HisB, HisC / HisC2, HisB, HisD]
All 9 enzymes of the histidine biosynthesis pathway are linked, and are clustered separately from other amino acid synthetic pathways
Functional Linkages Among Cytochrome Oxidase Genes
[Figure: linked genes CtaD, CtaE, CtaC, CtaB]
Functional linkages relate all 3 components of the cytochrome oxidase complex and also CtaB, the cytochrome oxidase assembly factor
These genes are at four different chromosomal locations
Membrane proteins linked to soluble proteins
Quantitative Assessment of Inferred Protein Complexes
Research of Edward Marcotte, Matteo Pellegrini, Michael Thompson and Todd Yeates
Calculating Probabilities of Co-evolution
Phylogenetic Profile, Rosetta Stone:
$$P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}$$
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common

Gene Neighbor:
$$P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!}$$
X = fractional separation of genes

Operon:
$$P(n) = 1 - e^{-n}$$
n = intergenic separation
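Taking the phylogenetic profile probability to be the hypergeometric form P(k | n, m, N) = C(n, k) C(N - n, m - k) / C(N, m), with N, n, m, and k as defined on the slide, it can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the tail sum is an added convenience that the slides do not mention, and the example values of n, m, and k are made up (N = 83 is the genome count quoted for ProLinks).

```python
from math import comb

def profile_probability(k, n, m, N):
    """P(k | n, m, N): chance that exactly k genomes are shared by random profiles."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def profile_tail(k, n, m, N):
    """Chance of k or more shared genomes (added convenience, not from the slides)."""
    return sum(profile_probability(j, n, m, N) for j in range(k, min(n, m) + 1))

# Hypothetical example: proteins A and B each have homologs in 10 of 83 genomes
# and share 8 of them.
N, n, m, k = 83, 10, 10, 8
print(profile_probability(k, n, m, N))
print(profile_tail(k, n, m, N))
```

A very small probability means the overlap between the two profiles is unlikely to arise by chance, which is the evidence for co-evolution.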
Combining Inferences of Co-Evolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve:
$$O_{post} = \prod_{i=1}^{4} \frac{P(f_i \mid pos)}{P(f_i \mid neg)} \cdot \frac{P(pos)}{P(neg)}$$
where positive pairs are proteins with common pathway annotation and negative pairs are proteins with different annotation
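The combination step can be sketched numerically: the posterior odds are the prior odds P(pos)/P(neg) multiplied by one likelihood ratio P(f_i | pos)/P(f_i | neg) per method. The function names and every number below are hypothetical; in practice the ratios would be estimated from the annotated positive and negative pairs.

```python
def posterior_odds(likelihood_ratios, prior_pos):
    """Multiply the prior odds by the likelihood ratio of each method."""
    odds = prior_pos / (1.0 - prior_pos)  # P(pos) / P(neg)
    for ratio in likelihood_ratios:       # P(f_i | pos) / P(f_i | neg)
        odds *= ratio
    return odds

def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)

# Made-up likelihood ratios for the four methods and a made-up prior.
ratios = [4.0, 2.0, 1.0, 0.5]
odds = posterior_odds(ratios, prior_pos=0.2)
print(odds, odds_to_probability(odds))
```

Multiplying the ratios treats the four methods as independent sources of evidence, which is the simplification built into this form of the formula.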
ProLinks Database www.dip.doembi.ucla.edu/pronav
~ 10,000,000 Functional Linkages inferred from 83 fully sequenced genomes
Benchmarking this Approach Against Known Complexes
Ecocyc: Karp et al. NAR, 30, 56 (2002)
True positive interactions are between subunits of known complexes and false positive ones are between subunits of different complexes.
ROC plot
[Figure: fraction of true positives (y-axis, 0 to 0.4) against fraction of false positives (x-axis, 0 to 0.009), compared with a random baseline]
For high confidence links, we find 1/3 of true interactions with only 1/1000 of the false positive ones
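The points of such a ROC plot can be computed as sketched below. The sketch assumes each predicted link carries a confidence score and a benchmark label (true if both proteins belong to the same known complex); the function name and the toy scores are hypothetical, ties between scores are not treated specially, and both classes are assumed to be present.

```python
def roc_points(scored_pairs):
    """Return (score, false_positive_fraction, true_positive_fraction) per cutoff.

    scored_pairs: list of (score, is_true) for predicted links.
    """
    n_pos = sum(1 for _, is_true in scored_pairs if is_true)
    n_neg = len(scored_pairs) - n_pos
    tp = fp = 0
    points = []
    # Lower the cutoff one prediction at a time, from the most confident down.
    for score, is_true in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((score, fp / n_neg, tp / n_pos))
    return points

# Toy benchmark (made up).
scored_pairs = [(0.9, True), (0.8, True), (0.7, False),
                (0.6, True), (0.2, False), (0.1, False)]
points = roc_points(scored_pairs)
for score, fpr, tpr in points:
    print(score, round(fpr, 3), round(tpr, 3))
```

A useful predictor rises steeply at the left edge of the plot: many true interactions are recovered before any appreciable fraction of false ones.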
Research of Matteo Pellegrini
Example Complex: NADH Dehydrogenase I
11 of 13 subunits detected
3 false positives
From Inferred Protein Linkages to Structures of Complexes
Research of Michael Strong, Shuishu Wang, Markus Kauffman
PE, PE-PGRS, and PPE Proteins in M. tuberculosis
38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
Together comprise about 5% of the genome
No function is known, but some appear to be membrane bound
No structure is known: always insoluble when expressed
Goal: use functional linkages to predict a complex between a PE and a PPE protein: express complex, and determine its structure
Research of Shuishu Wang and Michael Strong
The Problem of PE and PPE Proteins in M. tb
Construction of a co-expression vector to test for protein-protein interactions (Mike Strong)
[Diagram: pET 29b(+) co-expression construct. Labeled elements: T7 promoter, lac operator, RBS; restriction sites Nde1, HindIII, Kpn1, NcoI; RBS, gene A, gene B; thrombin site; His tag. Transcription gives a polycistronic mRNA; translation gives protein A and protein B (with His tag). Two outcomes are shown: the proteins interact (protein-protein interaction), or the proteins do not interact.]
When co-expressed, the PE and PPE proteins, inferred to interact, do form a soluble complex, Mr = 35,200
Sedimentation equilibrium experiments: Rv2430c + Rv2431c fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentration OD280 0.7, 0.45, 0.15
Expected Mr:
Rv2431c (PE) 10,687 (10563.12 from Mass Spec)
Rv2430c + His tag (PPE) 24,072 (23895.00 from Mass Spec)
Possibly suggests a 1:1 complex between these two proteins
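The 1:1 reading can be checked with simple arithmetic: the two expected masses sum to 10,687 + 24,072 = 34,759, close to the measured Mr of 35,200. The sketch below compares that with other small stoichiometries; the function name and the limit of three copies per protein are assumptions made for illustration.

```python
from itertools import product

PE_MASS = 10687        # expected Mr of Rv2431c (PE), from the slide
PPE_MASS = 24072       # expected Mr of Rv2430c + His tag (PPE), from the slide
MEASURED_MR = 35200    # sedimentation equilibrium result, from the slide

def best_stoichiometry(max_copies=3):
    """Return the (PE copies, PPE copies) whose summed mass is closest to the measurement."""
    candidates = product(range(1, max_copies + 1), repeat=2)
    return min(candidates,
               key=lambda c: abs(c[0] * PE_MASS + c[1] * PPE_MASS - MEASURED_MR))

best = best_stoichiometry()
expected = best[0] * PE_MASS + best[1] * PPE_MASS
print(best, expected, MEASURED_MR - expected)
```

The next closest candidate, two PE copies with one PPE copy, would weigh 45,446, more than 10,000 above the measurement.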
Crystallization trials of the Complex Between PE Protein Rv2431c and PPE Protein Rv2430c
Database of Interacting Proteins www.dip.doe-mbi.ucla.edu
Experimentally detected interactions from the scientific literature
Currently ~ 44,000 interactions
The DIP Database
DOE-MBI LSBMM, UCLA
Live DIP Gives the States of Proteins
Transitions Documented
ProLinks Database and the Protein Navigator
• Contains some 10,000,000 inferred functional linkages from 83 genomes
• Available at www.doe-mbi.ucla.edu
• Soon to be expanded to 250 fully sequenced genomes
• Eventually to be reconciled with DIP
Summary
[Diagram: network of linked proteins A, X, Y, Z, B, V, C]
A protein’s function is defined by the cellular context of its linkages
Many functional linkages are revealed from genomic and microarray data (high coverage)
Validity of functional linkages can be assessed by comparison to known complexes, and to expression data, and by keyword recovery
Clustered genome-wide functional maps can reveal and organize information on complexes and pathways
Functional linkages can reveal protein complexes suitable for structural studies
Protein Interactions Analysis of M.tb. Genome
Michael Strong
Whole Genome Interaction Maps: Michael Strong & Tom Graeber
Methods of Inferring Interactions: Edward Marcotte, Matteo Pellegrini, Todd Yeates, Michael Thompson, Richard Llwellyn
Database of Interacting Proteins: Lukasz Salwinski, Joyce Duan, Ioannis Xenarios, Robert Riley, Christopher Miller
Parallel pathways: Huiying Li