1
(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
Do not reproduce without permission. Gerstein.info/talks (c) 2005
Permissions Statement
This Presentation is copyright Mark Gerstein, Yale University, 2006.
Feel free to use images in it with
PROPER acknowledgement
(via citation to relevant papers or link to gersteinlab.org).
2
Understanding Protein Function on a Genome-scale using Networks
Mark B Gerstein, Yale (Comp. Bio. & Bioinformatics)
Orfeome 2006
2006.11.16, 16:30-17:00
3
The problem: Grappling with Function on a Genome Scale?
• 250 of ~530 originally characterized on chr. 22 [Dunham et al.]
• >25K proteins in entire human genome (with alt. splicing)
4
Traditional single molecule way to integrate
evidence & describe function
Descriptive Name: Elongation Factor 2
Summary sentence describing function:
This protein promotes the GTP-dependent
translocation of the nascent protein chain from the A-site to the P-site of
the ribosome.
EF2_YEAST
Lots of references to papers
5
Toward Systematic Ontologies for Function,
using Networks
All of SCOP entries
ENZYME
  1 Oxidoreductases
    1.1 Acting on CH-OH
      1.1.1 NAD and NADP acceptor
        1.1.1.1 Alcohol dehydrogenase
        1.1.1.3 Homoserine dehydrogenase
    1.5 Acting on CH-NH
  3 Hydrolases
    3.1 Acting on ester bonds
      3.1.1 Carboxylic ester hydrolases
        3.1.1.1 Carboxylesterase
        3.1.1.8 Cholinesterase
    3.4 Acting on peptide bonds
NON-ENZYME
  1 Metabolism
    1.1 Carb. metab.
      1.1.1 Polysach. metab.
        1.1.1.1 Glycogen metab.
        1.1.1.2 Starch metab.
    1.2 Nucleotide metab.
  3 Cell structure
    3.1 Nucleus
    3.8 Extracel. matrix
      3.8.2 Extracel. matrix glycoprotein
        3.8.2.1 Fibronectin
        3.8.2.2 Tenascin
General similarity, Functional class similarity, Precise functional similarity
General Networks [Eisenberg et al.]
Hierarchies & DAGs [Enzyme, Bairoch; GO, Ashburner; MIPS, Mewes, Frishman]
Interaction Vectors [Lan et al, IEEE 90:1848]
8
Outline
• Background
– Why Study Networks?
– Interaction Networks and their properties
• 3-D Structural Analysis of Protein Interaction Networks Gives New Insight Into Protein Function, Network Topology and Evolution
– 3-D structural point of view
– Network properties revisited
• Genomic analysis of the hierarchical structure of regulatory networks
– Construction
– Characteristics
• TopNet & tYNA
9
PROTEIN INTERACTION NETWORKS IN YEAST
Source: Gavin et al. Nature (2002), Uetz et al. Nature (2000), Cytoscape and DIP
Description and methodologies
• Determined by:
– Large-scale yeast two-hybrid
– TAP-tagging
– Literature curation
• Currently over 20,000 unique interactions available in yeast
• Spawned a field of computational “graph theory” analyses that view proteins as “nodes” and interactions as “edges”
A snapshot of the current interactome
[Figure, illustrative: DIP (Database of Interacting Proteins)]
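The "proteins as nodes, interactions as edges" view can be made concrete with a small sketch. The function names and the toy edge list below are hypothetical and are not taken from DIP or from the talk; the sketch only shows how an interaction list becomes a graph and how the degree (number of interaction partners) and its distribution are read off.

```python
from collections import Counter, defaultdict

def build_graph(edges):
    """Return an undirected adjacency map: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_distribution(graph):
    """Return a Counter mapping degree k -> number of proteins with k partners."""
    return Counter(len(partners) for partners in graph.values())

# Hypothetical toy interactions (the repeated A-B1 pair is counted once).
toy_edges = [("A", "B1"), ("A", "B2"), ("A", "B3"), ("B1", "B2"), ("A", "B1")]
toy_graph = build_graph(toy_edges)
toy_degrees = {protein: len(partners) for protein, partners in toy_graph.items()}
toy_distribution = degree_distribution(toy_graph)
```

Storing partners in sets means that an interaction reported by several experiments still contributes only one edge, which matches the "unique interactions" count used on the slide.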
11
INTERESTING PROPERTIES OF INTERACTION NETWORKS
Source: Various, see following slides
Network topology
Network Evolution
Relationship of topology and genomic features
Examples of studies
• What distribution does the degree (number of interaction partners) follow?
• What is the relationship between the degree and a protein's essentiality?
• Is there a relationship between a protein's connectivity and expression profile?
• What is the relationship between a protein's evolutionary rate and its degree?
• How did the observed network topology evolve?
OVERVIEW
13
HUBS TEND TO BE IMPORTANT PROTEINS: THEY ARE MORE LIKELY TO BE ESSENTIAL AND TEND TO BE MORE CONSERVED
Source: Jeong et al. Nature (2001), Yu et al. TiG (2004) and Fraser et al. Science (2002)
• By now it is well documented that proteins with a large degree tend to be essential proteins in yeast.
(“Hubs are essential”)
• Likewise, it has been found that hubs tend to evolve more slowly than other proteins
(“Hubs are slower evolving”)
Some Debate on this
15
THERE IS A RELATIONSHIP BETWEEN NETWORK TOPOLOGY AND GENE EXPRESSION DYNAMICS
Source: Han et al. Nature (2004) and Yu*, Kim* et al. (Submitted)
[Figure: frequency vs. co-expression correlation]
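One way to quantify the link between topology and expression dynamics is the average co-expression of a hub with its partners. The sketch below assumes that co-expression is measured as the Pearson correlation between expression profiles; the function names and the toy profiles are hypothetical and are not data from the cited studies.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_partner_correlation(hub, partners, expression):
    """Average correlation between a hub's profile and each partner's profile."""
    values = [pearson(expression[hub], expression[p]) for p in partners]
    return sum(values) / len(values)

# Hypothetical expression profiles over four conditions.
toy_expression = {
    "H": [1.0, 2.0, 3.0, 4.0],
    "P1": [2.0, 4.0, 6.0, 8.0],   # rises with the hub
    "P2": [4.0, 3.0, 2.0, 1.0],   # falls as the hub rises
}
toy_mean = mean_partner_correlation("H", ["P1", "P2"], toy_expression)
```

A hub whose partners are all strongly co-expressed with it gets an average near 1; a hub whose partners are expressed under different conditions gets a lower average.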
18
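MOTIVATION
[Figure, illustrative: hub A with partners B1, B2, B3 and B4, shown for a Cdk/cyclin complex and for part of the RNA-pol complex]
Network perspective: Cdk/cyclin complex = part of the RNA-pol complex
Structural biology perspective: Cdk/cyclin complex ≠ part of the RNA-pol complex
There remains a rich source of knowledge unmined by network theorists!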
19
THERE IS A PROBLEM WITH SCALE-FREENESS AND REALLY BIG HUBS IN INTERACTION NETWORKS
Source: DIP, Institut fuer Festkoerperchemie (Univ. Tuebingen)
A really big hub (>200 Interactions)
Gedankenexperiment
What is the maximum number of neighbors a protein can have?
• Clearly, a protein is very unlikely to have >200 simultaneous interactors.
• Some of the >200 are most likely false positives
• Some others are going to be mutually exclusive interactors (i.e. binding to the same interface).
Conclusion
• There appears to be an obvious discrepancy between >200 and 12.
ILLUSTRATIVE
Wouldn’t it be great to be able to see the different binding interfaces?
20
UTILIZING PROTEIN CRYSTAL STRUCTURES, WE CAN DISTINGUISH THE DIFFERENT BINDING INTERFACES
Source: Kim et al. Science (in press)
ILLUSTRATIVE
PDB
Map all interactions to available homologous structures of interfaces
Distinguish overlapping from non-overlapping interfaces
21
SHORT DIGRESSION: THIS ALLOWS US TO DISTINGUISH SYSTEMATICALLY BETWEEN SIMULTANEOUSLY POSSIBLE AND MUTUALLY EXCLUSIVE INTERACTIONS
Simultaneously possible interactions
Mutually exclusive interactions
Source: Kim et al. Science (in press)
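The distinction can be sketched in code under a simplifying assumption: each partner of a protein is annotated with the set of residue positions it contacts on that protein, and two interactions are treated as mutually exclusive when those sets share at least one residue. The function names, the overlap rule and the toy residue numbers are hypothetical; the published analysis may use a different overlap criterion.

```python
from itertools import combinations

def classify_pairs(interfaces):
    """interfaces: partner -> set of residue positions contacted on the shared protein.
    Returns (partner, partner) -> 'mutually exclusive' or 'simultaneously possible'."""
    result = {}
    for p, q in combinations(sorted(interfaces), 2):
        overlap = interfaces[p] & interfaces[q]
        result[(p, q)] = "mutually exclusive" if overlap else "simultaneously possible"
    return result

def count_interfaces(interfaces):
    """Count distinct interfaces: residue sets that overlap, directly or through
    a chain of overlaps, are merged into a single interface."""
    groups = []  # pairwise-disjoint merged residue sets
    for residues in interfaces.values():
        merged = set(residues)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return len(groups)

# Hypothetical residue annotations for the partners of one protein A.
toy_interfaces = {
    "B1": {10, 11, 12},
    "B2": {12, 13},
    "B3": {40, 41},
    "B4": {41, 42, 43},
}
toy_pairs = classify_pairs(toy_interfaces)
toy_interface_count = count_interfaces(toy_interfaces)
```

In this toy case A has four partners but only two interfaces, so at most two of the four interactions can take place at the same time.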
23
THIS IS WHAT THE RESULTING NETWORK LOOKS LIKE
Source: PDB, Pfam, iPfam and Kim et al. Science (in press)
• Represents a “very high confidence” network
• Total of 873 nodes and 1269 interactions, each of which is structurally characterized
• 438 interactions are classified as mutually exclusive and 831 as simultaneously possible
• While much smaller than DIP, it is similar in size to other high-confidence datasets
The Structural Interaction Network (SIN)
Properties
25
REMEMBER THE NETWORK PROPERTIES AS WE DESCRIBED BEFORE?
Source: Various, see following slides
Network topology
Network Evolution
Relationship of topology and genomic features
Examples of studies
• What distribution does the degree (number of interaction partners) follow?
• Does the network easily separate into more than one component?
• What is the relationship between the degree and a protein's essentiality?
• Is there a relationship between a protein's connectivity and expression profile?
• What is the relationship between a protein's evolutionary rate and its degree?
• How did the observed network topology evolve?
OVERVIEW
26
THERE DO NOT APPEAR TO BE THE KINDS OF REALLY BIG HUBS AS SEEN BEFORE – IS THE TOPOLOGY STILL SCALE-FREE?
Source: Kim et al. Science (in press)
• With the maximum number of interactions at 13, there are no “really big hubs” in this network
• Note that in other high-confidence datasets (of similar size), there are still proteins with a much higher degree
• The degree distribution appears to top out much earlier and to be less scale-free than that of other networks
Degree distribution
Properties
27
IT’S REALLY ONLY THE MULTI-INTERFACE HUBS THAT ARE SIGNIFICANTLY MORE LIKELY TO BE ESSENTIAL
Source: Kim et al. Science (in press)
[Bar chart: percentage of essential proteins. Categories: entire genome; all proteins in our dataset; single-interface hubs only; multi-interface hubs only. Values, in extraction order: 64.9%, 31.8%, 32.3%, 15.1%]
28
DATE-HUBS AND PARTY-HUBS ARE REALLY SINGLE-INTERFACE AND MULTI-INTERFACE HUBS
Source: Han et al. Nature (2004) and Kim et al. Science (in press)
[Bar chart: expression correlation. Categories: all proteins in our dataset; single-interface hubs only; multi-interface hubs only. Values, in extraction order: 0.2, 0.17, 0.25]
[Histogram: frequency vs. expression correlation]
29
AND ONLY MULTI-INTERFACE PROTEINS ARE EVOLVING MORE SLOWLY; SINGLE-INTERFACE HUBS ARE NOT
Source: Kim et al. Science (in press)
[Bar chart: evolutionary rate (dN/dS). Categories: entire genome; all proteins in our dataset; single-interface hubs only; multi-interface hubs only. Values, in extraction order: 0.029, 0.077, 0.047, 0.051]
31
IN FACT, EVOLUTIONARY RATE CORRELATES BEST WITH THE FRACTION OF INTERFACE AVAILABLE SURFACE AREA
Source: Kim et al. Science (in press)
DATA IN BINS
Small portion of surface area involved in interfaces – fast evolving
Large portion of surface area involved in interfaces – slow evolving
32
IS THERE A DIFFERENCE BETWEEN SINGLE-INTERFACE HUBS AND MULTI-INTERFACE HUBS WITH RESPECT TO NETWORK EVOLUTION?
Source: Kim et al. Science (in press)
The Duplication Mutation Model
In the structural viewpoint
If these models were correct, there would be an enrichment of paralogs among B
33
MULTI-INTERFACE HUBS DO NOT APPEAR TO EVOLVE BY GENE DUPLICATION; THE DUPLICATION MUTATION MODEL CAN ONLY EXPLAIN THE EXISTENCE OF SINGLE-INTERFACE HUBS
Source: Kim et al. Science (in press)
[Bar chart: fraction of paralogs between pairs of proteins. Categories: random pair; same partner; same partner, different interface; same partner, same interface. Values, in extraction order: 0.00%, 0.15%, 0.07%, 0.003%]
But that also means that the duplication-mutation model cannot explain the full current interaction network!
35
Yeast Regulatory Network: a platform for integration
[Yu et al (2003), TIG]
• 142 transcription factors
• 3,420 target genes
• 7,074 regulatory interactions
• Integrates data from Snyder, Young, Kepes, and TRANSFAC
[Figure: transcription factors and their target genes]
36
Determination of "Level" in Regulatory Network Hierarchy with Breadth-first Search
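A minimal sketch of how a breadth-first search could assign levels is given here. It assumes that the bottom level (level 1) consists of transcription factors that regulate no other transcription factor, and that every other factor sits one level above the lowest factor it can reach in the fewest regulatory steps. These conventions, the function name and the toy network are assumptions for illustration, not details stated on the slide.

```python
from collections import deque

def hierarchy_levels(regulates):
    """regulates: TF -> set of TFs it regulates (TF-to-TF edges only).
    Returns TF -> level, where level 1 is the bottom of the hierarchy."""
    tfs = set(regulates)
    for targets in regulates.values():
        tfs |= set(targets)
    # Reverse the edges so the search can walk upward from the bottom level.
    regulators = {tf: set() for tf in tfs}
    for tf, targets in regulates.items():
        for target in targets:
            if target != tf:  # auto-regulation does not count as regulating another TF
                regulators[target].add(tf)
    bottom = [tf for tf in tfs if not (set(regulates.get(tf, ())) - {tf})]
    level = {tf: 1 for tf in bottom}
    queue = deque(sorted(bottom))
    while queue:
        current = queue.popleft()
        for parent in sorted(regulators[current]):
            if parent not in level:  # first visit gives the shortest distance
                level[parent] = level[current] + 1
                queue.append(parent)
    return level

# Hypothetical toy network: T1 regulates M1 and M2, which regulate B1 and B2.
toy_regulation = {
    "T1": {"M1", "M2"},
    "M1": {"B1"},
    "M2": {"B1", "B2"},
    "B1": set(),
    "B2": {"B2"},  # auto-regulation only, so B2 stays at the bottom
}
toy_levels = hierarchy_levels(toy_regulation)
```

Factors that only take part in cycles and never reach a bottom-level factor receive no level in this sketch; a real analysis would need an explicit rule for them.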
37
Yeast Regulatory Hierarchy: the Middle-managers Rule
38
Example of Path Through Regulatory Network
39
Yeast Network Similar in Structure to Government Hierarchy
with Respect to Middle-managers
40
Yeast and E. coli Networks similar
in Structure
42
Characteristics of Regulatory Hierarchy: Middle Managers are Information Flow
Bottlenecks
43
Characteristics of Regulatory Hierarchy: The Paradox of Influence and Essentiality
44
Characteristics of Regulatory Hierarchy: Topmost proteins sit at center of protein interaction network
[Figure: avg. closeness vs. level]
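The closeness comparison can be sketched as follows, assuming the common definition of closeness as the number of reachable proteins divided by the sum of shortest-path distances to them in the interaction network. The function names, the toy network and the toy level assignment are hypothetical, and the talk may use a different normalization.

```python
from collections import deque

def closeness(graph, source):
    """Reachable proteins divided by the sum of shortest-path distances to them.
    Returns 0.0 for a protein with no partners."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def average_closeness_by_level(graph, level):
    """level: protein -> hierarchy level. Returns level -> mean closeness."""
    sums, counts = {}, {}
    for protein, lvl in level.items():
        sums[lvl] = sums.get(lvl, 0.0) + closeness(graph, protein)
        counts[lvl] = counts.get(lvl, 0) + 1
    return {lvl: sums[lvl] / counts[lvl] for lvl in sums}

# Hypothetical star-shaped interaction network centered on T.
toy_network = {"T": {"M", "B", "X"}, "M": {"T"}, "B": {"T"}, "X": {"T"}}
toy_level = {"T": 3, "M": 2, "B": 1}
toy_average = average_closeness_by_level(toy_network, toy_level)
```

In the toy network the top-level protein T reaches every other protein in one step and therefore has the highest closeness.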
46
TopNet – an automated web tool
[Yu et al. (2004); Yip et al. (2005); similar tools include Cytoscape.org, Ideker, Sander et al.]
(vers. 2: tYNA, the "TopNet-like Yale Network Analyzer")
Normal website + downloaded code (Java) + web service (SOAP) with Cytoscape plugin
47
SVGA visualization, Network Mgt. (Multiple Network Support, tagging with DB)
49
Conclusions
• 3D Analysis of Interaction Network
– The topology of a direct physical interaction network is much less dominated by hubs than previously thought
– Several genomic features that were previously thought to be correlated with the degree are in fact related to the number of interfaces and not the degree
– Specifically, a protein's evolutionary rate appears to depend on the fraction of surface area involved in interactions rather than on the degree
– The current network growth model can explain only part of the currently known networks
• Regulatory Network Hierarchies
– Middle managers dominate, sitting at info. bottlenecks
– Paradox of influence and essentiality
– Topmost proteins sit at center of interaction network
50
Acknowledgements
TopNet.GersteinLab.org
MS
MG
H Yu
P Kim
K Yip
Y Xia
A Paccanaro
J Lu
S Douglas
NIH, NSF, Keck