1
The Relative Vertex-to-Vertex Clustering Value
A New Criterion for the Fast Detection of Functional Modules in Protein Interaction Networks
Zina Mohamed Ibrahim
(King’s College, London, UK)
Alioune Ngom
(University of Windsor, Windsor, Canada)
2
Protein Complexes and Functional Modules
Protein complex: Proteins interacting with each other at the same time and place [Spirin et al. 2004]
Functional module: Set of proteins involved in a common elementary biological function
Proteins bind each other at different times and places
Multiple protein complexes [Chen et al. 2005]
3
Identification of Functional Modules
Protein Interaction Networks (PINs)
Functional modules correspond to highly connected subgraphs in a PIN
Many graph clustering approaches
• Clique-based methods: strict and not scalable to large PINs
• Density-based methods: issues with low-degree nodes and low topological connectivity
• Hierarchical methods
Hierarchical organization of the modules within PINs
• Global metric: not scalable to large PINs
• Local metric: common misclassification of low-degree nodes
• Poor performance on noisy PINs, i.e., false-positive interactions
4
Graph Clustering
Find non-overlapping communities in PINs
5
Hierarchical Methods -- Related Works
Divisive Approaches: iteratively remove an edge with the
• Highest Edge Betweenness Score: CNM method [Clauset et al. 2004], O(m h log n)
• Lowest Edge Clustering Coefficient: Radicchi method [Radicchi et al. 2004], O(m²)
These are global measures
$$C^{(3)}_{u,v} = \frac{Z^{(3)}_{u,v}}{\min[(k_u - 1),\,(k_v - 1)]}$$
6
Hierarchical Methods -- Related Works
Agglomerative Approaches:
Iteratively merge two clusters Cu and Cv
Edge Clustering Value:
Local similarity metric between nodes
HC-PIN Algorithm [Wang et al 2011]
$$ECV(u, v) = \frac{|N_u \cap N_v|^2}{|N_u| \cdot |N_v|}$$
7
Our New Criterion – UnWeighted PINs
Relative Vertex-to-Vertex Clustering Value
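0 ≤ R(u → v) ≤ 100
Likelihood of u to be in v’s cluster
Not how likely that both u and v lie in the same cluster
Local similarity pre-metric
Principle of preferential attachment in scale-free networks
$$R(u \to v) = 100 \cdot \frac{|N_u \cap N_v|}{|N_u|}$$
with each neighborhood taken to include the node itself: $N_a \leftarrow N_a \cup \{a\}$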
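The un-weighted criterion, R(u → v) = 100 · |N_u ∩ N_v| / |N_u| with each neighborhood including the node itself, can be sketched in a few lines of Python. The adjacency-dictionary representation, the function names, and the toy graph are assumptions made for illustration; they are not taken from the slides.

```python
def closed_neighborhood(adj, a):
    """Neighbors of a plus a itself (N_a with a added)."""
    return set(adj[a]) | {a}


def relative_clustering_value(adj, u, v):
    """R(u -> v): percentage of u's closed neighborhood shared with v's."""
    n_u = closed_neighborhood(adj, u)
    n_v = closed_neighborhood(adj, v)
    return 100.0 * len(n_u & n_v) / len(n_u)


# Hypothetical toy network: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

r_dc = relative_clustering_value(adj, "d", "c")  # N_d = {c, d} lies inside N_c
r_cd = relative_clustering_value(adj, "c", "d")  # only 2 of c's 4 members shared
```

The measure is asymmetric: the low-degree node d is fully covered by c's neighborhood, while c shares only half of its neighborhood with d.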
8
Our New Criterion – Weighted PINs
$$R_w(u \to v) = 100 \cdot \frac{\sum_{a \in I_{u,v};\ (u,a) \in E} w(u, a)}{\sum_{b \in N_u;\ (u,b) \in E} w(u, b)}$$
Where,
w(x, y) = weight on interaction edge (x, y)
$I_{u,v} = N_u \cap N_v$
weighted degree of x: $\sum_{y \in V;\ (x,y) \in E} w(x, y)$
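A minimal sketch of the weighted criterion follows, assuming the reading R_w(u → v) = 100 · (sum of w(u, a) over a in N_u ∩ N_v) / (weighted degree of u). The nested-dictionary weight map, the function name, and the example weights are hypothetical.

```python
def weighted_relative_clustering_value(w, u, v):
    """R_w(u -> v) for a symmetric weight map w[x][y] = weight of edge (x, y)."""
    n_u = set(w[u]) | {u}          # closed neighborhood of u
    n_v = set(w[v]) | {v}          # closed neighborhood of v
    shared = n_u & n_v             # I_{u,v}
    # Only real edges (u, a) contribute, so u itself adds nothing.
    numerator = sum(weight for a, weight in w[u].items() if a in shared)
    denominator = sum(w[u].values())   # weighted degree of u
    return 100.0 * numerator / denominator


# Hypothetical weighted network: triangle a-b-c plus pendant node d on c.
w = {
    "a": {"b": 0.5, "c": 1.0},
    "b": {"a": 0.5, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 2.0},
    "d": {"c": 2.0},
}

rw_cd = weighted_relative_clustering_value(w, "c", "d")
rw_dc = weighted_relative_clustering_value(w, "d", "c")
```

The sketch assumes u has at least one interaction; an isolated node would make the denominator zero and needs separate handling.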
9
FAC-PIN Algorithm – Test for Inclusion
Insert u into Cv whenever
1. R(u → v) = 100
2. R(u → v) > R(v → u)
3. R(u → v) = R(v → u) and
   1. R(u → v) = R(v → u) = 100 or
   2. R(u → v) > 50
That is, whenever: R(u → v) > 50μ and R(u → v) ≥ R(v → u)
Algorithm: for each v, iteratively insert its neighbors u into Cv whenever the test is true for u.
Here R(u → v) > 50 is equivalent to $|N_u \cap N_v| > \frac{|N_u|}{2}$
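The combined inclusion test, R(u → v) > 50μ and R(u → v) ≥ R(v → u), can be written as a small predicate. This is only a sketch: the function name, the default μ = 1, and the range check on μ (taken from the stated bound 0 ≤ μ < 2) are illustrative assumptions.

```python
def should_insert(r_uv, r_vu, mu=1.0):
    """Inclusion test for putting u into C(v), given R(u -> v) and R(v -> u)."""
    if not 0.0 <= mu < 2.0:
        raise ValueError("merging parameter mu must satisfy 0 <= mu < 2")
    return r_uv > 50.0 * mu and r_uv >= r_vu


full_overlap = should_insert(100.0, 50.0)          # u fully covered by v
weaker_side = should_insert(50.0, 100.0)           # u is the less-covered side
tie_loose = should_insert(60.0, 60.0, mu=1.0)      # tie above the 50 threshold
tie_strict = should_insert(60.0, 60.0, mu=1.5)     # same tie, threshold now 75
```

Raising μ raises the threshold 50μ, so the same pair of values can pass at μ = 1 and fail at μ = 1.5.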
10
FAC-PIN Algorithm - Clustering
Initialization Phase: form singleton cluster C(v) for each v
Community Detection Phase: for each v, include each neighbor u into C(v) whenever
[ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ] is true
with merging parameter: 0 ≤ μ < 2
Partition Computation Phase: obtain the induced subgraph of G for each C(v) as sub-network cluster
Evaluation Phase
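The initialization and community detection phases can be sketched as below. This is a hedged illustration rather than the authors' implementation: it uses the un-weighted R in place of Rw, the names and the toy graph are hypothetical, and the partition computation and evaluation phases are not shown because the slides do not spell them out.

```python
def r_value(adj, u, v):
    """Un-weighted R(u -> v) using closed neighborhoods."""
    n_u = set(adj[u]) | {u}
    n_v = set(adj[v]) | {v}
    return 100.0 * len(n_u & n_v) / len(n_u)


def detect_communities(adj, mu=1.0):
    """Return C(v) for every node v after the community detection phase."""
    clusters = {v: {v} for v in adj}        # initialization: singleton clusters
    for v in adj:                           # community detection phase
        for u in adj[v]:
            r_uv = r_value(adj, u, v)
            r_vu = r_value(adj, v, u)
            if r_uv > 50.0 * mu and r_uv >= r_vu:
                clusters[v].add(u)
    return clusters


# Hypothetical toy network: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
clusters = detect_communities(adj, mu=1.0)
```

In this toy network the hub c absorbs all of its neighbors, while c itself is not inserted into any neighbor's cluster because its own R values toward them are lower.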
11
FAC-PIN Algorithm - Clustering
12
Computational Complexities
Given n nodes and m edges
CNM Algorithm: O(m h log n), h = height
Radicchi Algorithm: O(m²)
HC-PIN Algorithm: O(m δ²)
FAC-PIN Algorithm: O(n δ²) << O(n D²)
δ = average degree and D = maximum degree
13
Computational Experiments
For any given PIN:
1. Apply FAC-PIN with merging parameters μ
2. Evaluate modularity of resulting partitions Pk,μ
Three modularity functions
3. Pk = best Pk,μ
4. Execution time to obtain Pk,μ
5. Functional enrichment validations with SGD GO
   P-value cutoff = 0.05
   Retain significant clusters and number of significant clusters
14
Data Sets
8 un-weighted PIN data sets from the REACTOME database
Including the PIN data of S. cerevisiae (yeast SC-1): 5697 proteins, 50675 interactions
1 un-weighted PIN and corresponding weighted PIN data of S. cerevisiae (yeast SC-2) from the DIP database: 4726 proteins, 15166 interactions
Protein complexes from MIPS database
15
Results – Effect of Merging Parameter μ (SC-2; 4726 proteins and 15166 interactions)
• Recall: merging test = [ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ]
• Fewer neighbors are merged with v as μ increases, hence the number of clusters k increases with μ
16
Results – Execution Times in Seconds (PINs from Reactome database; μ = 0.5)
17
Results – Modularity Functions
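Function Q: $Q_w(P_k) = \sum_{i=1}^{k} \left( e_{ii} - a_i^2 \right)$
Function Ω: $\Omega_w(P_k) = \sum_{i=1}^{k} e_{ii} \log a_i$
Function D: $D_w(P_k) = \sum_{i=1}^{k} \frac{L(C_i, C_i) - L(C_i, \bar{C}_i)}{|C_i|}$
where
$L(S_1, S_2) = \sum_{u \in S_1,\, v \in S_2} w(u, v)$
$e_{ii} = \frac{L(C_i, C_i)}{L(V, V)}$
$a_i = \frac{L(C_i, V)}{L(V, V)}$
w(u, v) = 0 or 1 for un-weighted PINs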
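As a rough illustration of how these scores are computed, the sketch below evaluates Q and D from L(S1, S2), taking Q as the sum of e_ii − a_i² and D as the sum of (L(C_i, C_i) − L(C_i, V \ C_i)) / |C_i|. Ω is left out, and the weight-map layout, function names, and example graph are assumptions.

```python
def link_weight(w, s1, s2):
    """L(S1, S2): sum of w(u, v) over all u in S1 and v in S2."""
    return sum(w.get(u, {}).get(v, 0.0) for u in s1 for v in s2)


def modularity_q(w, partition):
    """Q: sum over clusters of e_ii - a_i^2."""
    nodes = set(w)
    total = link_weight(w, nodes, nodes)            # L(V, V)
    q = 0.0
    for c in partition:
        e_ii = link_weight(w, c, c) / total
        a_i = link_weight(w, c, nodes) / total
        q += e_ii - a_i ** 2
    return q


def modularity_density(w, partition):
    """D: sum over clusters of (internal - outgoing link weight) / cluster size."""
    nodes = set(w)
    return sum(
        (link_weight(w, c, c) - link_weight(w, c, nodes - c)) / len(c)
        for c in partition
    )


# Hypothetical un-weighted network: two triangles joined by the edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
w = {}
for u, v in edges:
    w.setdefault(u, {})[v] = 1.0
    w.setdefault(v, {})[u] = 1.0

partition = [{1, 2, 3}, {4, 5, 6}]
q = modularity_q(w, partition)
d = modularity_density(w, partition)
```

Because w is symmetric, every edge inside a cluster is counted twice in L(C_i, C_i), and the same double counting appears in L(V, V), so the ratios e_ii and a_i are unaffected.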
18
Results – Modularity of FAC-PIN Partitions (PINs from Reactome database; μ = 0.5)
Qw
Ωw
Dw
19
Functional Module Prediction
Recall indicates how effectively proteins with the same functional category in the network are extracted
Precision indicates how consistently proteins in the same module are annotated
f-measure is used to evaluate the overall performance
Average f-measure as the accuracy of the algorithms
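For a predicted module C and a functional category F_i:
$$RE = \frac{|C \cap F_i|}{|F_i|}$$
$$PR = \frac{|C \cap F_i|}{|C|}$$
$$FM = \frac{2 \cdot PR \cdot RE}{PR + RE}$$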
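The three scores, RE = |C ∩ F_i| / |F_i|, PR = |C ∩ F_i| / |C|, and FM = 2 · PR · RE / (PR + RE), translate directly into set operations. The sketch below uses hypothetical protein identifiers, and returning an f-measure of 0 when the overlap is empty is an added convention, not something stated on the slides.

```python
def module_scores(module, category):
    """Return (recall, precision, f_measure) for a module C and category F_i."""
    overlap = len(module & category)
    recall = overlap / len(category)
    precision = overlap / len(module)
    if overlap == 0:
        # Assumed convention: avoid 0/0 when nothing is shared.
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Hypothetical example: a 4-protein module against a 6-protein category.
module = {"p1", "p2", "p3", "p4"}
category = {"p3", "p4", "p5", "p6", "p7", "p8"}
recall, precision, f_measure = module_scores(module, category)
```

Here two proteins are shared, so recall is 2/6, precision is 2/4, and the f-measure is their harmonic mean.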
20
Functional Enrichment of FAC-PIN Modules
Hypergeometric distribution… …
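The slide's own formula is elided, so the following is only a sketch of the standard hypergeometric enrichment test, which may differ in detail from the one the authors used. It computes the probability of seeing at least `hits` annotated proteins in a module by chance; all parameter names and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population:  number of proteins in the network
    annotated:   proteins in the network carrying the annotation
    module_size: proteins in the module
    hits:        proteins in the module carrying the annotation
    """
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, module_size)


# Hypothetical numbers: 10 proteins, 4 annotated, module of 3 with 2 annotated.
p = enrichment_p_value(population=10, annotated=4, module_size=3, hits=2)
is_significant = p < 0.05   # cutoff quoted in the experiments
```

A module would be retained as significant only when this probability falls below the 0.05 cutoff; the small example above does not.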
21
Results – Functional Enrichment Validations (Un-weighted SC-1; 5697 proteins and 50675 interactions; μ = 0.5)