network biology, 2016, vol. 6, iss. 1
Network Biology
Vol. 6, No. 1, 1 March 2016
International Academy of Ecology and Environmental Sciences
Network Biology
ISSN 2220-8879 ∣ CODEN NBEICS
Volume 6, Number 1, 1 March 2016
Editor-in-Chief
WenJun Zhang
Sun Yat-sen University, China
International Academy of Ecology and Environmental Sciences, Hong Kong
E-mail: [email protected], [email protected]
Editorial Board
Ronaldo Angelini (The Federal University of Rio Grande do Norte, Brazil)
Sudin Bhattacharya (The Hamner Institutes for Health Sciences, USA)
Andre Bianconi (Sao Paulo State University (Unesp), Brazil)
Danail Bonchev (Virginia Commonwealth University, USA)
Graeme Boswell (University of Glamorgan, UK)
Jake Chen (Indiana University-Purdue University Indianapolis, USA)
Ming Chen (Zhejiang University, China)
Daniela Cianelli (University of Naples Parthenope, Italy)
Kurt Fellenberg (Technische Universitaet Muenchen, Germany)
Alessandro Ferrarini (University of Parma, Italy)
Vadim Fraifeld (Ben-Gurion University of the Negev, Israel)
Alberto de la Fuente (CRS4, Italy)
Mohamed Ragab Abdel Gawad (International University of Sarajevo, Bosnia and Herzegovina)
Pietro Hiram Guzzi (University Magna Graecia of Catanzaro, Italy)
Yongqun He (University of Michigan, USA)
Shruti Jain (Jaypee University of Information Technology, India)
Sarath Chandra Janga (University of Illinois at Urbana-Champaign, USA)
Istvan Karsai (East Tennessee State University, USA)
Caner Kazanci (University of Georgia, USA)
Vladimir Krivtsov (Heriot-Watt University, UK)
Miguel Ángel Medina (Universidad de Málaga, Spain)
Lev V. Nedorezov (Russian Academy of Sciences, Russia)
Alexandre Ferreira Ramos (University of Sao Paulo, Brazil)
Santanu Ray (Visva Bharati University, India)
Dimitrios Roukos (Ioannina University School of Medicine, Greece)
Ronald Taylor (Pacific Northwest National Laboratory, U.S. Dept of Energy, USA)
Ezio Venturino (Universita’ di Torino, Italy)
Jason Jianhua Xuan (Virginia Polytechnic Institute and State University, USA)
Ming Zhan (National Institute on Aging, NIH, USA)
TianShou Zhou (Sun Yat-Sen University, China)
Editorial Office: [email protected]
Publisher: International Academy of Ecology and Environmental Sciences
Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong
Tel: 00852-2138 6086; Fax: 00852-3069 1955 Website: http://www.iaees.org/ E-mail: [email protected]
Network Biology, 2016, 6(1): 1-11
IAEES www.iaees.org
Article
A node degree dependent random perturbation method for prediction
of missing links in the network
WenJun Zhang
School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China; International Academy of Ecology and Environmental Sciences, Hong Kong E-mail: [email protected], [email protected]
Received 25 September 2015; Accepted 27 October 2015; Published online 1 March 2016
Abstract
In the present study, I propose a node degree dependent random perturbation algorithm for predicting missing
links in a network. The algorithm assumes that a node with more existing links harbors more missing
links. There are two rules. Under Rule 1, a randomly chosen node tends to connect to a node with greater
degree. Under Rule 2, a link tends to be created between two nodes with greater degrees. Missing links of
some tumor related networks (pathways) are predicted. The results show that the prediction efficiency, and the
percentage of correctly predicted links against predicted missing links, increase with network complexity.
The number of predicted links that must be checked to find the true missing links decreases as network
complexity increases. Prediction efficiency is complexity-dependent only. Matlab
codes of the algorithm are also given. Finally, prospects for the prediction of missing links are briefly reviewed. So
far, all prediction methods based only on static topological structure (represented by the adjacency matrix) seem
to have low efficiency. Network evolution based, node similarity based, and sampling based (correlation based)
methods are expected to be the most promising in the future.
Keywords missing links; network; rules; node degree; random perturbation; prediction; likelihood.
1 Introduction
Many biological networks (food webs, protein–protein interaction networks, metabolic networks, etc.) are
incomplete due to missing links. For example, 80% of the molecular interactions in yeast cells
(Yu et al., 2008) and 99.7% of the interactions in humans (Amaral, 2008) are unknown. A network is incomplete
either because our knowledge of it is limited, or because it is still evolving, so that more links or even
nodes are expected with time. Link (connection) prediction estimates the likelihood that a
link exists between two nodes based on observed links and (or) the attributes of nodes (Zhang, 2015d; Zhou, 2015).
Link prediction can largely reduce the experimental costs of link finding. Link prediction algorithms can also be
used to predict the links that may appear in the future in evolving networks (Lü and Zhou, 2011; Lü et al.,
2012; Zhou, 2015). So far, numerous studies on link prediction have been conducted (Clauset et al., 2008;
Guimera and Sales-Pardo, 2009; Barzel and Barabási, 2013; Bastiaens et al., 2015; Lü et al., 2015; Zhang,
2015b, 2015c, 2015d, 2016b; Zhang and Li, 2015; Zhao et al., 2015; Zhou, 2015). In the present study, I
propose an algorithm for predicting missing links in a network, in which the likelihood of missing links of
a node depends on the node degree.
Network Biology ISSN 2220-8879 URL: http://www.iaees.org/publications/journals/nb/onlineversion.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml Email: [email protected] Editor-in-Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences
2 Methods
2.1 Algorithm
Link prediction is closely related to network evolution. Following the principle of network evolution in
Zhang’s model (Zhang, 2016a), the present algorithm assumes that a node with more existing links harbors
more missing links. This is a reasonable and practical assumption because new nodes tend to connect to the nodes
with more links (Barabasi and Albert, 1999; Zhang, 2012a; Zhang, 2016a).
Assume there are v nodes in total in the network being predicted, and the adjacency matrix of the network is
d=(dij), i, j=1,2,…,v, where dij=dji, dii=0, and dij=dji=1 if there is a link (connection) between nodes i and
j. The adjacency matrix of the network for missing links only is D=(Dij), i, j=1,2,…,v. The procedure is as
follows
(1) Calculate the expected number of missing links to be predicted, m=m’×per (rounded to the nearest integer),
where m’ is the total number of links in the network and per is the perturbation rate (e.g., per=0.2, 0.3),
which represents the percentage increment of links in the network perturbation.
(2) Calculate the degree of each node, ai(t), i=1,2,…,v. The cumulative attraction strength of nodes 1 to i is
$$p_i(t) = \frac{\sum_{j=1}^{i} a_j(t)^{\lambda}}{\sum_{j=1}^{v} a_j(t)^{\lambda}}$$
where λ is the attraction factor, λ>0. For example, λ=1.2, 1.5, etc.
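Step (2) can be illustrated with a minimal Python sketch (Python is used here only for illustration; the paper's own listing is in Matlab). The function name `cumulative_attraction` and the four-node star network are hypothetical, chosen only to show how the attraction factor shifts probability mass toward high-degree nodes.

```python
def cumulative_attraction(degrees, lam):
    """Return [p_1, ..., p_v]: normalised cumulative sums of degree**lam."""
    if lam <= 0:
        raise ValueError("attraction factor must be positive")
    weights = [d ** lam for d in degrees]
    total = sum(weights)
    if total == 0:
        raise ValueError("the network has no links")
    p, running = [], 0.0
    for w in weights:
        running += w
        p.append(running / total)
    return p


# Hypothetical toy network: a star whose hub has degree 3.
degrees = [3, 1, 1, 1]
p1 = cumulative_attraction(degrees, lam=1.0)  # hub owns the interval [0, 0.5)
p2 = cumulative_attraction(degrees, lam=2.0)  # hub owns the interval [0, 0.75)
print(p1)
print(p2)
```

With a larger attraction factor the hub's interval widens, so a uniform random value is more likely to fall in it.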
(3) Generate missing links. Let p0=0, and generate two random values w and u. For p0, p1, p2,…, pv, one of
the following two rules is used
Rule 1: if (j-1)/v≤w<j/v, pk-1≤u<pk, k≠j, and dkj=djk=0, let Dkj=1 and Djk=1, i.e., there is a missing link
between nodes k and j.
Rule 2: if pj-1(t)≤w<pj(t), pk-1(t)≤u<pk(t), k≠j, and dkj=djk=0, let Dkj=1 and Djk=1, i.e., there is a missing
link between nodes k and j.
Rule 1 means that a randomly chosen node tends to connect to a node with greater degree. Rule 2
means that a link tends to be created between two nodes with greater degrees. In either case, a new link is
found. Repeat the procedure until m (missing) links are produced. This generates an adjacency matrix of the
network for missing links only, D=(Dij), i, j=1,2,…,v.
(4) Return to (3) to perform the next prediction, until the desired number of simulations is reached.
(5) Calculate the mean number (likelihood) of each predicted missing link over the simulations, and rank the
likelihoods from greatest to smallest. The first m links are the predicted missing links with maximal likelihood.
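Steps (1) to (5) can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not a port of the Matlab listing: the names `draw_node` and `predict_missing_links`, the 0-based node labels, the `max_draws` safeguard, and the toy network are all hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter


def draw_node(cum, r):
    """Return the index k with cum[k-1] <= r < cum[k], taking cum[-1] as 0 for k = 0."""
    return min(bisect_right(cum, r), len(cum) - 1)


def predict_missing_links(adj, per=0.25, lam=1.5, rule=2, simulations=100,
                          seed=0, max_draws=100_000):
    rng = random.Random(seed)
    v = len(adj)
    degrees = [sum(row) for row in adj]
    # Step (1): m = m' * per, rounded half up.
    m = int(sum(degrees) / 2 * per + 0.5)
    # Step (2): cumulative attraction strengths p_1..p_v.
    weights = [d ** lam for d in degrees]
    total = sum(weights)
    cum_degree, running = [], 0.0
    for w in weights:
        running += w
        cum_degree.append(running / total)
    cum_uniform = [(i + 1) / v for i in range(v)]
    counts = Counter()
    for _ in range(simulations):
        found, draws = set(), 0
        while len(found) < m:
            draws += 1
            if draws > max_draws:  # assumption: guard against an endless search
                raise RuntimeError("could not place m links; lower per")
            # Step (3): Rule 1 draws node j uniformly, Rule 2 draws it by degree.
            j = draw_node(cum_uniform if rule == 1 else cum_degree, rng.random())
            k = draw_node(cum_degree, rng.random())
            if k == j or adj[k][j]:
                continue  # need two distinct nodes that are not yet linked
            found.add((min(j, k), max(j, k)))
        counts.update(found)  # Step (4): accumulate over simulations
    # Step (5): likelihood = share of simulations that proposed the pair.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(pair, n / simulations) for pair, n in ranked], m


# Hypothetical toy network: 6 nodes, 8 links.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

ranked, m = predict_missing_links(adj, per=0.25, lam=1.5, rule=2)
for (a, b), likelihood in ranked[:m]:
    print(a, b, likelihood)
```

Because each simulation proposes exactly m distinct links, the likelihoods over all candidate pairs sum to m; the first m entries of the ranked list are the predicted missing links.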
The following are Matlab codes of the algorithm (linksPrediction.m)
%Reference: Zhang WJ. 2016. A node degree dependent random perturbation method for prediction of missing links in the
%network. Network Biology, 6(1): 1-11
clear
choice=input('Input the type (1 or 2) of data file of the network from which missing links are ready to be predicted (1: adjacency matrix; 2: two array): ');
disp('Adjacency matrix: d=(dij)m*m, where m is the number of nodes in the network. dij=1, if vi and vj are adjacent, and dij=0, if vi and vj are not adjacent; i, j=1,2,…, m');
disp('Two array: there are two columns, A1 and A2, in the data file; an element of A1 stores a node of a link and the corresponding element of A2 stores another node of the link. ');
if (choice==1)
adjstr=input('Input the file name of adjacency matrix from which missing links are ready to be predicted (e.g., raw.txt, raw.xls, etc. Adjacency matrix is d=(dij)m*m, where m is the number of nodes in the network. dij=1, if vi and vj are adjacent, and dij=0, if vi and vj are not adjacent; i, j=1,2,…, m: ','s');
end
if (choice==2)
adjstr=input('Input the file name of two array of the network from which missing links are ready to be predicted (e.g., raw.txt, raw.xls, etc. There are two columns, A1 and A2, in the data file; an element of A1 stores a node of a link and the corresponding element of A2 stores another node of the link: ','s');
end
rule=input('Input the rule type (1 or 2) used in the algorithm: ');
pro=input('Input perturbation rate to increase missing links of the network (e.g, 0.2, 0.3, etc.): ');
lamda=input('Attraction factor of nodes (lamda>0; e.g., 1.3, 1.5, etc.)= ');
simu=input('Input the simulation times (e.g, 100, 200, etc.): ');
if (choice==1) adjmat=load(adjstr); v=size(adjmat,2); end
if (choice==2)
twoarray=load(adjstr);
nn=size(twoarray,1);
v=max(max(twoarray));
for i=1:nn
adjmat(twoarray(i,1),twoarray(i,2))=1;
adjmat(twoarray(i,2),twoarray(i,1))=1;
end; end
degr=sum(adjmat);
m=round(sum(degr)/2*pro);
fprintf('\nAdjacency matrix of the original network\n')
disp([adjmat])
fprintf('\nNode degrees of adjacency matrix of the original network\n')
disp([degr])
fprintf(['\nMean of node degrees of the original network: ' num2str(mean(degr)) '\n\n'])
cnow=(sum(degr)/2)/((v^2-v)/2);
fprintf(['\nConnectance=' num2str(cnow) '\n'])
summ=sum(degr);
summa=sum(degr.*(degr-1));
h=v*summa/(summ*(summ-1));
fprintf(['\nAggregation index (AI) of node degrees=' num2str(h) '\n'])
cv=(std(degr))^2/mean(degr);
fprintf(['\nCoefficient of variation (CV) of node degrees=' num2str(cv) '\n'])
summ=v*(v-1)/2;
su=zeros(summ,2*simu);
prop=zeros(1,v);
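proptot=zeros(v);
% Cumulative attraction strength of nodes 1..i: prop(i)=sum(degr(1:i).^lamda)/sum(degr.^lamda)
degrr=degr.^lamda;
prop(1)=degrr(1)/sum(degrr);
for i=2:v;
    prop(i)=prop(i-1)+degrr(i)/sum(degrr);
end
for siml=1:simu
    adj=zeros(v);        % adjacency matrix of the links predicted in this simulation
    temp=zeros(m,2);     % node pairs proposed so far in this simulation
    mm=1;                % index of the next link to be found
    while (v>0)          % always true; left by break once m links are found
        rep=0;           % number of nodes drawn for the current candidate link
        while (v>0)      % always true; left by break once a new candidate pair is drawn
            propp=prop;
            % Rule 1: the first node is drawn uniformly; all other draws are degree dependent
            if ((rep==0) & (rule==1))
                for i=1:v;
                    propp(i)=i/v;
                end
            end
            ran=rand();
            % Select the node j whose interval [propp(j-1), propp(j)) contains ran
            for j=1:v
                if (j==1) st=0; end
                if (j>=2) st=propp(j-1); end
                if ((ran>=st) & (ran<propp(j))) rep=rep+1; id(rep)=j; break; end
            end
            % Two different nodes drawn: skip the pair if it was already proposed
            if ((rep>=2) & (id(rep)~=id(1)))
                tab=0;
                for i=1:mm
                    if (((id(1)==temp(i,1)) & (id(rep)==temp(i,2))) | ((id(rep)==temp(i,1)) & (id(1)==temp(i,2)))) tab=1; break; end
                end
                if (tab==1) continue; end;
                temp(mm,1)=id(1); temp(mm,2)=id(rep);
                break;
            end
        end
        % Keep the pair only if it is not a link of the original network
        if (adjmat(id(1),id(rep))==0) adj(id(1),id(rep))=1; adj(id(rep),id(1))=1; mm=mm+1; end;
        if (mm==m+1) break; end;
    end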
fprintf(['Simulation ' num2str(siml)])
fprintf('\n\nAdjacency matrix for predicted links only\n')
disp([adj])
[pairx,pairy]=find(adj);
temp1=pairx; temp2=pairy;
pairxs=pairx(temp1<temp2);
pairys=pairy(temp1<temp2);
ConnectionPairs=[pairxs pairys];
dm=size(ConnectionPairs,1);
su(:,siml*2-1)=[pairxs;zeros(summ-dm,1)]; su(:,siml*2)=[pairys;zeros(summ-dm,1)];
disp('Predicted links')
disp([ConnectionPairs])
end
disp('--------------------------------Summary---------------------------------')
disp(['There are totally ' num2str(sum(degr)/2) ' links in the original network'])
disp(['You wish to predict ' num2str(m) ' missing links in the original network'])
fprintf('\n');
proptot=zeros(v);
for i=1:v-1
for j=i+1:v
for k=1:simu
for l=1:v*(v-1)/2
if ((su(l,k*2-1)==i) & (su(l,k*2)==j)) proptot(i,j)=proptot(i,j)+1; proptot(j,i)=proptot(i,j); break; end
end; end; end; end
disp('Likelihood (mean number) of predicted links: ')
disp(' Node Node Likelihood')
s=0;
for j=1:v
for i=1:v
if (proptot(i,j)~=0) s=s+1;pairvalue(s)=proptot(i,j)/simu; end;
end; end
[pairx,pairy]=find(proptot);
result=[pairx pairy pairvalue'];
results(1,1)=result(1,1); results(1,2)=result(1,2); results(1,3)=result(1,3);
su=1;
for i=2:s
lab=0;
for j=1:i-1
if ((result(j,2)==result(i,1)) & (result(j,1)==result(i,2))) lab=1; break; end;
end
if (lab==0) su=su+1;results(su,1)=result(i,1); results(su,2)=result(i,2); results(su,3)=result(i,3); end
end
ires=sortrows(results,-3);
disp([ires])
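The descriptive statistics printed at the start of the listing (connectance, aggregation index, and the quantity labelled coefficient of variation) can be checked by hand on a small example. The sketch below restates them in Python for illustration; the function name `degree_statistics` and the toy degree sequence are hypothetical. Note that, as computed in the listing, the quantity labelled coefficient of variation is the sample variance of node degrees divided by their mean.

```python
from statistics import mean, variance


def degree_statistics(degrees):
    """Connectance, aggregation index, and variance-to-mean ratio of node degrees."""
    v = len(degrees)
    s = sum(degrees)                       # twice the number of links
    connectance = (s / 2) / ((v * v - v) / 2)
    aggregation = v * sum(d * (d - 1) for d in degrees) / (s * (s - 1))
    var_to_mean = variance(degrees) / mean(degrees)  # sample variance (n-1), as Matlab's std
    return connectance, aggregation, var_to_mean


# Hypothetical star network with four nodes and three links.
connectance, aggregation, var_to_mean = degree_statistics([3, 1, 1, 1])
print(connectance, aggregation, var_to_mean)
```

For this toy network both the aggregation index (0.8) and the variance-to-mean ratio (about 0.67) are below 1.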
2.2 Validation
In the present study, I used the data of tumor related networks (pathways) (ABCAM, 2012; Huang and Zhang,
2012; Li and Zhang, 2013; Pathway Central, 2012; see supplementary material for adjacency matrices). These
networks are complete. For each network, some links are removed following the reverse process of the algorithm
above and then predicted. The number of simulations is set to 100. The perturbation rate is per=0.25 and the
attraction factor is λ=1.5.
3 Results
3.1 Rule 1
Some of the summarized results for link prediction of tumor related networks (the pathways Ras, p53, Akt,
HGF, JNK, PPAR, TGF-β, and TNF) are listed in Tables 1 and 2, together with the percentages of correctly predicted
links obtained with the randomization method. Here, the percentage of correctly predicted links against
number of missing links (%) = correctly predicted links / number of missing links × 100, the percentage
of correctly predicted links against predicted missing links (%) = correctly predicted links / total of predicted
missing links × 100, and connectance = number of observed links / possible maximum number of links.
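As a worked check of these definitions, the sketch below recomputes the PPAR column of Table 2 (Rule 1) in Python. The function name `prediction_metrics` is hypothetical, and the count of 5 correctly predicted links is an inference from the reported 83.3% of 6 missing links, not a figure stated in the paper. The prediction efficiency is taken as x/y, as reported in Table 2.

```python
def prediction_metrics(correct, missing, predicted_total, averaged_rank):
    """Return (x, q, z): % of missing links found, % of predictions correct, efficiency x/y."""
    x = 100 * correct / missing
    q = 100 * correct / predicted_total
    z = x / averaged_rank
    return x, q, z


# PPAR, Rule 1 (Table 2): 6 missing links, 257 predicted links, averaged rank 115.
# Assumption: 5 correct links, inferred from 83.3% of 6.
x, q, z = prediction_metrics(correct=5, missing=6, predicted_total=257, averaged_rank=115)
print(round(x, 1), round(q, 1), round(z, 2))
```

The rounded values agree with the 83.3%, 1.9% and roughly 0.72 listed for PPAR; the small difference in the last digits of the efficiency comes from the table dividing the already rounded percentage by the rank.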
Table 1 Link prediction of Ras, p53, and Akt networks with Rule 1 (per=0.25, λ=1.5, 100 simulations). The listed links are true links missed in the data used for predicting.
Ras                              p53                              Akt
Rank  Node  Node  Likelihood     Rank  Node  Node  Likelihood     Rank  Node  Node  Likelihood
28    9     5     0.04           82    47    32    0.04           465   35    31    0.01
34    28    5     0.04           138   47    33    0.03           151   50    12    0.02
58    22    5     0.03           140   47    36    0.03           2     51    15    0.18
137   10    5     0.02           88    48    47    0.04           1     51    16    0.2
140   25    5     0.02           11    52    4     0.07           26    51    24    0.1
230   31    28    0.02           61    52    9     0.04           17    51    28    0.12
392   35    34    0.01           4     52    10    0.09           28    51    31    0.1
                                 269   52    30    0.02           36    51    38    0.07
                                 18    52    48    0.07           10    51    39    0.14
                                 19    52    51    0.07           31    51    41    0.09
                                                                  7     51    42    0.15
                                                                  20    52    51    0.12
According to Tables 1 and 2, the regression relationships of prediction efficiency (z=x/y, where x is the
percentage of correctly predicted links and y is the averaged rank before which all missing links fall in the
list of predicted links), the percentage (%) of correctly predicted links against predicted missing links (q), and
the ratio of the averaged rank before which all missing links fall in the list of predicted links to the total number
of predicted missing links (f), against the aggregation index (u) and the coefficient of
variation (w) (Zhang and Zhan, 2011; Zhang, 2012a), are as follows
Algorithm prediction:
z=0.320+0.344u r²=0.318, p=0.019<0.05, n=17
z=0.465+0.192w r²=0.323, p=0.017<0.05, n=17
q=1.349+0.243u r²=0.106, p=0.203, n=17
q=1.427+0.154w r²=0.139, p=0.141, n=17
f=0.438-0.125u r²=0.306, p=0.021<0.05, n=17
f=0.389-0.073w r²=0.341, p=0.014<0.05, n=17
Randomization prediction:
z=0.485-0.106u r²=0.149, p=0.125, n=17
z=0.445-0.063w r²=0.171, p=0.099<0.1, n=17
q=1.615-0.349u r²=0.259, p=0.038<0.05, n=17
q=1.451-0.182w r²=0.229, p=0.051<0.1, n=17
f=0.476-0.088u r²=0.156, p=0.117, n=17
f=0.436-0.046w r²=0.142, p=0.136, n=17
Thus the prediction efficiency and the percentage of correctly predicted links against predicted missing links
obtained with the algorithm increase with network complexity. Generally, the ratio of the averaged rank of
true missing links to the length of the list of predicted missing links declines as network complexity increases,
which means that fewer predicted links need to be checked to find the true missing links as network
complexity increases.
Compared to the prediction of the randomization method, the results of the algorithm are in general effective,
i.e., the present algorithm is effective in predicting missing links of biological networks (Tables 1 and 2).
Neither the mean of node degrees nor connectance has a significant relationship with prediction efficiency.
Thus prediction efficiency is complexity-dependent only.
Table 2 Link prediction of some tumor related networks of missing links with Rule 1 (per=0.25, λ=1.5).
    PPAR      TGF-β     TNF       STAT3     mTOR      Ras       EGF       PTEN      JAK-STAT
Mean of node degrees
    1.85      1.79      2.06      1.75      1.83      1.71      1.96      2.06      2.09
Connectance
    0.07      0.05      0.07      0.08      0.04      0.05      0.04      0.06      0.05
Possible maximum number of candidate links
    326       669       433       255       993       565       1431      494       858
Aggregation Index (Zhang and Zhan, 2011; Zhang, 2012a)
    0.68      0.78      0.85      0.72      0.75      0.75      0.73      0.91      0.91
Coefficient of variation (Zhang and Zhan, 2011; Zhang, 2012a)
    0.40      0.61      0.68      0.51      0.54      0.57      0.47      0.82      0.81
Percentage (%) of correctly predicted links against true missing links with the algorithm (x)
    83.3      75.0      87.5      100       80.0      87.5      84.6      75.0      45.5
Percentage (%) of correctly predicted links against predicted missing links with the algorithm
    1.9       1.3       2.0       2.6       1.4       1.8       1.3       1.7       0.9
Number of missing links
    6         8         8         5         10        8         13        8         12
Total number of predicted links with 100 simulations
    257       448       346       195       575       392       823       348       545
The averaged rank before which all missing links fall in the list of predicted links (y)
    115       190       179       114       202       127       432       47        99
Prediction efficiency (x/y)
    0.7243    0.3947    0.4888    0.8772    0.396     0.689     0.1958    1.5957    0.4596
Percentage (%) of correctly predicted links against true missing links with randomization method (x)
    100       75        87.5      60.0      70.0      37.5      61.5      100       45.5
Percentage (%) of correctly predicted links against predicted missing links with randomization method
    2.2       1.3       1.9       1.4       1.1       0.7       0.9       2.0       0.8
Total number of predicted links with 100 simulations
    270       466       375       217       651       424       853       398       617
The averaged rank before which all missing links fall in the list of predicted links (y)
    120       148       175       84        338       103       239       302       102
Prediction efficiency (x/y)
    0.8333    0.5068    0.5       0.7143    0.2071    0.3641    0.2573    0.3311    0.4461
3.2 Rule 2
In step (3) of the algorithm, Rule 2 is used for prediction. The results for some pathways are listed in
Table 3. Compared to Rule 1, the percentages (%) of correctly predicted links obtained with Rule 2 are
smaller overall. However, the prediction efficiency of Rule 2 is generally higher. The
major regression relationships and conclusions are similar to those of Rule 1. Moreover, the prediction efficiency
of the algorithm increases dramatically with network complexity.
Table 2 (continued) Link prediction of some tumor related networks of missing links with Rule 1 (per=0.25, λ=1.5).
    p53       Akt       HGF       JNK       PI3K      MARK      FAS       ERK
Mean of node degrees
    1.96      1.69      1.67      2.67      2.25      2.14      1.88      2.27
Connectance
    0.04      0.03      0.05      0.06      0.04      0.04      0.04      0.04
Possible maximum number of candidate links
    1275      1604      600       1064      1532      1591      1277      1702
Aggregation Index (Zhang and Zhan, 2011; Zhang, 2012a)
    1.50      3.59      0.96      1.72      0.97      1.22      1.17      1.41
Coefficient of variation (Zhang and Zhan, 2011; Zhang, 2012a)
    1.99      5.42      0.93      2.96      0.93      1.46      1.32      1.93
Percentage (%) of correctly predicted links against true missing links with the algorithm (x)
    76.9      100       57.1      100       68.8      53.3      75.0      94.1
Percentage (%) of correctly predicted links against predicted missing links with the algorithm
    1.6       2.2       1.1       2.6       1.2       0.9       1.4       1.8
Number of missing links
    13        12        7         16        16        14        12        17
Total number of predicted links with 100 simulations
    642       542       354       612       899       819       640       883
The averaged rank before which all missing links fall in the list of predicted links (y)
    64        66        102       93        240       202       80        179
Prediction efficiency (x/y)
    1.2016    1.5152    0.5598    1.0753    0.2867    0.2639    0.9375    0.5257
Percentage (%) of correctly predicted links against true missing links with randomization method (x)
    61.5      25.0      71.4      75.0      68.8      66.7      75.0      76.5
Percentage (%) of correctly predicted links against predicted missing links with randomization method
    0.9       0.3       1.2       1.4       1.1       1.0       1.2       1.2
Total number of predicted links with 100 simulations
    823       862       423       839       990       974       770       1073
The averaged rank before which all missing links fall in the list of predicted links (y)
    343       106       219       296       409       262       177       505
Prediction efficiency (x/y)
    0.1793    0.2358    0.326     0.2534    0.1682    0.2546    0.4237    0.1515
Table 3 Link prediction of some tumor related networks (pathways) of missing links with Rule 2 (per=0.25, λ=1.5).
    Ras       p53       Akt       HGF       JNK       PPAR      TGF-β     TNF
Percentage (%) of correctly predicted links with the algorithm (x)
    62.5      92.3      50.0      57.1      75.0      33.3      37.5      62.5
Percentage (%) of correctly predicted links against predicted missing links with the algorithm
    0.7       0.9       0.2       1.5       1.2       1.8       1.5       1.7
Total number of predicted links with 100 simulations
    314       388       301       300       404       221       304       291
The averaged rank before which all missing links fall in the list of predicted links (y)
    74        92        6         78        81        41        63        95
Prediction efficiency (x/y)
    0.8446    1.0033    8.3333    0.7321    0.9259    0.8122    0.5952    0.6579
Percentage (%) of correctly predicted links against number of missing links with random network (x)
    37.5      53.9      16.7      85.7      62.5      83.3      87.5      75.0
Percentage (%) of correctly predicted links against predicted missing links with random network
    1.6       3.1       2.0       1.3       2.9       0.9       0.9       1.7
Total number of predicted links with 100 simulations
    411       823       851       412       838       277       478       359
The averaged rank before which all missing links fall in the list of predicted links (y)
    77        325       23        213       246       55        184       111
Prediction efficiency (x/y)
    0.487     0.1658    0.7261    0.4023    0.2541    1.5145    0.4755    0.6757
4 Discussion
As stated above, random prediction is overall effective for random networks only. However, in practical
applications, most networks are complex networks. Thus the algorithm is effective in predicting missing links
in most cases. The prediction efficiency of the algorithm increases with network complexity.
Therefore, the algorithm is more efficient for networks of higher complexity.
Changes of λ reflect various effects of node degree on the connection mechanism. A larger λ
leads to finding more missing links of the nodes with greater node degree. λ→0 means a trend toward random
prediction. A suitable value of λ is specific to the practical problem.
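The remark about small values of the attraction factor can be made concrete with a short limit, written here under the assumption that every node has at least one link (a_j(t) ≥ 1), and using p_i(t) for the cumulative attraction strength of nodes 1 to i as defined in step (2):

```latex
\lim_{\lambda \to 0^{+}} p_i(t)
  = \lim_{\lambda \to 0^{+}}
    \frac{\sum_{j=1}^{i} a_j(t)^{\lambda}}{\sum_{j=1}^{v} a_j(t)^{\lambda}}
  = \frac{\sum_{j=1}^{i} 1}{\sum_{j=1}^{v} 1}
  = \frac{i}{v}
```

In this limit every node owns an interval of length 1/v, which is the uniform choice used for the first node in Rule 1, so both end nodes of a candidate link are drawn at random regardless of degree.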
Lü et al. (2015) proposed the structural perturbation method (SPM) to predict missing links and argued that
its prediction ability was stronger than that of previous methods. However, I argue that their method does not
hold, for the following reasons: (1) Mechanistically, the structural perturbation method can only be used to analyze
the structural stability of dynamic systems. The static structure of a network, expressed by an adjacency matrix, is
the topological structure, which cannot represent the dynamic characteristics of network evolution.
Prediction of missing links should be conducted on the basis of the mechanism of network evolution (dynamics).
Without loss of generality, network evolution may be approximated with a group of linear differential
equations (Zhang, 2015a), and the structural stability of the network is determined by the eigenvalues, not
the eigenvectors, of the system matrix. Even so, the structural perturbation method for determining the variables
with least impact on structural stability should only be used around the equilibrium states of the system rather
than at states far from equilibrium. (2) During the evolution of a network, the links most likely to be
generated are not necessarily those that minimally perturb the topological structure of the network. Provided
that the structural stability of the system is not destroyed and there are no other limitations, any link may
be created. The most common case is that the two most similar nodes connect to each other first. (3)
Using missing links in the prediction model to predict missing links, as done by Lü et al. (2015), is
closer to model fitting than to prediction. In this case, a stronger “prediction” ability (more precisely,
fitting ability) is naturally expected.
So far, all prediction methods based only on static topological structure (represented by the adjacency matrix)
seem to have low efficiency. Network evolution based (Zhang, 2012a, 2012c, 2015a, 2016a, 2016b), node
similarity based (Zhang, 2015d), and sampling based (correlation based; Zhang, 2007, 2011, 2012b, 2013,
2015b; Zhang and Li, 2015) methods are expected to be the most promising in the future.
Acknowledgment
We are thankful to the support of Discovery and Crucial Node Analysis of Important Biological and Social
Networks (2015.6-2020.6), from Yangling Institute of Modern Agricultural Standardization, High-Quality
Textbook Network Biology Project for Engineering of Teaching Quality and Teaching Reform of
Undergraduate Universities of Guangdong Province (2015.6-2018.6), from Department of Education of
Guangdong Province, and Project on Undergraduate Teaching Reform (2015.7-2017.7), from Sun Yat-sen
University, China.
References
ABCAM. 2012. http://www.abcam.com/index.html?pageconfig=productmap&cl=2282
Amaral LAN. 2008. A truer measure of our ignorance. Proceedings of the National Academy of Sciences of
USA, 105: 6795-6796
Barabasi AL, Albert R. 1999. Emergence of scaling in random networks. Science, 286(5439): 509
Barzel B, Barabási AL. 2013. Network link prediction by global silencing of indirect correlations. Nature
Biotechnology, 31: 720-725
Bastiaens P, Birtwistle MR, Blüthgen N, et al. 2015. Silence on the relevant literature and errors in
implementation. Nature Biotechnology, 33: 336-339
Clauset A, Moore C, Newman MEJ. 2008. Hierarchical structure and the prediction of missing links in
networks. Nature, 453: 98-101
Guimera R, Sales-Pardo M. 2009. Missing and spurious interactions and the reconstruction of complex
networks. Proceedings of the National Academy of Sciences of USA, 106: 22073-22078
Huang JQ, Zhang WJ. 2012. Analysis on degree distribution of tumor signaling networks. Network Biology,
2(3): 95-109
Li JR, Zhang WJ. 2013. Identification of crucial metabolites/reactions in tumor signaling networks. Network
Biology, 3(4): 121-132
Lü LY, Medo M, Yeung CH, et al. 2012. Recommender systems. Physics Reports, 519: 1-49
Lü LY, Pan LM, Zhou T, et al. 2015. Toward link predictability of complex networks. Proceedings of the
National Academy of Sciences of USA, 112: 2325-2330
Lü LY, Zhou T. 2011. Link prediction in complex networks: A survey. Physica A, 390: 1150-1170
Pathway Central. 2012. SABiosciences. http://www.sabiosciences.com/pathwaycentral.php
Yu HY, Braun P, Yildirim MA, et al. 2008. High-quality binary protein interaction map of the yeast
interactome network. Science, 322: 104-110
Zhang WJ. 2007. Computer inference of network of ecological interactions from sampling data.
Environmental Monitoring and Assessment, 124: 253-261
Zhang WJ. 2011. Constructing ecological interaction networks by correlation analysis: hints from community
sampling. Network Biology, 1(2): 81-98
Zhang WJ. 2012a. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,
Singapore
Zhang WJ. 2012b. How to construct the statistic network? An association network of herbaceous plants
constructed from field sampling. Network Biology, 2(2): 57-68
Zhang WJ. 2012c. Modeling community succession and assembly: A novel method for network evolution.
Network Biology, 2(2): 69-78
Zhang WJ. 2013. Construction of Statistic Network from Field Sampling. In: Network Biology: Theories,
Methods and Applications (WenJun Zhang, ed). 69-80, Nova Science Publishers, New York, USA
Zhang WJ. 2015a. A generalized network evolution model and self-organization theory on community
assembly. Selforganizology, 2(3): 55-64
Zhang WJ. 2015b. A hierarchical method for finding interactions: Jointly using linear correlation and rank
correlation analysis. Network Biology, 5(4): 137-145
Zhang WJ. 2015c. Calculation and statistic test of partial correlation of general correlation measures.
Selforganizology, 2(4): 65-77
Zhang WJ. 2015d. Prediction of missing connections in the network: A node-similarity based algorithm.
Selforganizology, 2(4): 91-101
Zhang WJ. 2016a. A random network based, node attraction facilitated network evolution method.
Selforganizology, 3(1): 1-9
Zhang WJ. 2016b. Selforganizology: The Science of Self-Organization. World Scientific, Singapore
Zhang WJ, Li X. 2015. Linear correlation analysis in finding interactions: Half of predicted interactions are
undeterministic and one-third of candidate direct interactions are missed. Selforganizology, 2(3): 39-45
Zhang WJ, Liu GH. 2012. Creating real network with expected degree distribution: A statistical simulation.
Network Biology, 2(3): 110-117
Zhang WJ, Zhan CY. 2011. An algorithm for calculation of degree distribution and detection of network type:
with application in food webs. Network Biology, 1(3-4): 159-170
Zhao J, Miao LL, Yang Y, et al. 2015. Prediction of links and weights in networks by reliable routes. Scientific
Reports, 5: 12261
Zhou T. 2015. Why link prediction? http://blog.sciencenet.cn/blog-3075-912975.html. Accessed on Aug 14,
2015
Network Biology, 2016, 6(1): 12-27
IAEES www.iaees.org
Article
Centrality measures for immunization of weighted networks
Mohammad Khansari, Amin Kaveh, Zainabolhoda Heshmati, Maryam Ashkpoor Motlaq
University of Tehran, Faculty of New Sciences and Technologies, Amir Abad, North Kargar Street 14395-1374, Tehran, Iran
E-mail: [email protected], [email protected],[email protected], [email protected]
Received 25 September 2015; Accepted 27 October 2015; Published online 1 March 2016
Abstract
Effective immunization of individual communities with minimal vaccination cost has generated much
discussion in the field of complex networks. Meanwhile, properly capturing the relationships among
people in society and applying them to social networks brings about substantial improvements in immunization.
Accordingly, a weighted graph in which link weights represent the intensity and intimacy of relationships is an
acceptable approach. In this work we employ weighted graphs and a wide variety of weighted centrality
measures to distinguish individuals who are important in the contagion of diseases. Furthermore, we propose new
centrality measures for weighted networks. Our experimental results show that Radiality-Degree centrality is
satisfying for weighted BA networks. Additionally, PageRank-Degree and Radiality-Degree centralities
show more acceptable performance in targeted immunization of weighted networks.
Keywords centrality measure; epidemic threshold; largest connected components; targeted immunization;
weighted graphs.
1 Introduction
An epidemic is 'the occurrence of more cases of disease than expected in a given area or among a specific group
of people over a particular period of time' (Parthasarathy, 2013). Nowadays, nearly 13 million people die
annually from infectious diseases, which imposes high costs on society. Between 2009 and 2010, 14,000 people
lost their lives as a result of an influenza epidemic (Jain et al., 2009). Since 1981, AIDS has been known as an
epidemic disease, and according to estimates by WHO and UNAIDS, in 2013 alone it caused the death of
1.5 million people worldwide. As a result, the prediction and control of epidemics among populations has
attracted many researchers.
Immunization prevents diseases from spreading and saves many people from suffering and death
(Parthasarathy, 2013). Vaccination is one of the immunization techniques that protects the vaccinated person
as well as prevents the spread of disease, simultaneously (Cornforth et al., 2011). In traditional methods (mass
vaccination) the entire population will be vaccinated, which is not affordable due to high costs (Gallos et al.,
Network Biology ISSN 2220-8879 URL: http://www.iaees.org/publications/journals/nb/onlineversion.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml Email: [email protected] Editor-in-Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences
2007; Vidondo et al., 2012). A more effective method is targeted vaccination, in which people are clustered by
some known criteria and the most influential people in each cluster are identified and vaccinated (Chen et al.,
2008; Cornforth et al., 2011; Eames et al., 2009; Hartvigsen et al., 2007; Miller and Hyman, 2007; Peng et al.,
2010; Schneider et al., 2012; Shams and Khansari, 2014; Vidondo et al., 2012). The purpose of targeted
vaccination is to select people in such a way that the unsafe clusters are as small as possible (Schneider et
al., 2012). In other words, immunization is directed at the individuals who connect different communities.
Complex networks have become a suitable tool for modeling, representing and describing the characteristics
of many natural phenomena (Zhang and Zhan, 2011; Zhang, 2012). Among these networks are social networks,
in which each node represents an individual and communication between a pair of
individuals is represented as a link. As a result, many studies have been carried out on disease spread and
immunization within these networks (Cornforth et al., 2011; Hartvigsen et al., 2007; Shams and Khansari,
2014). In these studies different centralities have been defined, nodes with high centrality have been
vaccinated, and the result of the vaccination has been analyzed.
Although representation of social networks with simple graphs leads to suitable models, adopting
weighted networks can be more appropriate to include detailed information. In weighted social networks, the
quality of communication or relationship between a pair of nodes is considered as the weight of the edge
between them. Some works which have studied centrality measures in weighted networks are (Abbasi and
Hossain, 2013; Abbasi et al., 2011; Wang et al., 2008; Zhai et al., 2013).
Shams and Khansari have proposed an evaluation framework for targeted immunization algorithms in
which some centrality measures are utilized (Shams and Khansari, 2014). These centrality measures have been
used to distinguish effective nodes, though other meaningful and powerful ones are not considered.
To our knowledge, a complete study of the capability of various centrality measures to detect effective
nodes in weighted networks has not been done. Thus, we conduct a comprehensive study to determine the
effectiveness of different centrality measures in detecting important nodes within weighted networks. In this
study we propose new centrality measures and compare them with the existing measures. All the
evaluations are done on real and artificial networks.
The rest of this paper is organized as follows: Section 2 presents existing centrality measures for
weighted networks, followed by four new centrality measures which are introduced in Section 3. Model
evaluation methods and analysis are introduced in Section 4, while experimental results are presented in
Section 5. Finally, Section 6 concludes this work.
2 Centrality Measures for Weighted Networks
Node centrality is one of the most analyzed concepts and determines a node’s effect on network flows.
The centrality of a node represents its prominence according to the chosen definition of centrality. In a social
network, node centrality reflects an individual’s communication pattern and his/her position in the network.
Leavitt defined this concept for the first time in 1951 (Leavitt, 1951), and Freeman defined three important
and widely applicable centralities, degree, closeness and betweenness, in 1978 (Freeman, 1978). In the following
subsections existing centrality measures are reviewed with emphasis on weighted networks.
2.1 Degree centrality
Degree centrality is a simple local centrality which is defined based on the concept of neighborhood. In a social
network it reflects a node’s popularity, and in our experiments it shows the node’s influence on the spreading and
suppression of diseases. In weighted networks, a node’s degree centrality is the sum of the weights of the edges
attached to that node, which is also known as the node strength. This measure represents the total
involvement of a node in the network (Opsahl et al., 2010):
$C_D(i) = \sum_{j} w_{i,j}$ (1)
where $W$ is the weighted adjacency matrix and $w_{i,j}$ is the weight of the edge which ties node $i$ to node $j$.
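As a rough illustration of Eq. (1), node strength can be computed directly from a weighted adjacency structure. The sketch below assumes an undirected graph stored as a dict of dicts; the toy graph and the function name `strength` are hypothetical and not taken from the paper.

```python
# Hypothetical toy graph: graph[i][j] = w_ij, stored symmetrically (undirected).
graph = {
    "a": {"b": 4.0, "c": 1.0},
    "b": {"a": 4.0, "c": 2.0},
    "c": {"a": 1.0, "b": 2.0, "d": 5.0},
    "d": {"c": 5.0},
}

def strength(graph, node):
    """Weighted degree (node strength): sum of the weights of edges attached to node."""
    return sum(graph[node].values())

strengths = {node: strength(graph, node) for node in graph}
```

Here node `c` has the largest strength because it carries the heavy edge to `d` in addition to two lighter ties.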
2.2 Distance-based Centralities
Other centralities are distance-based measures and are closely related to the concept of “path” in
weighted networks. In contrast to binary networks, different values can be assigned to different links in
weighted networks. Edge weight can represent cost (Dijkstra, 1959) as well as strength of connection
(Newman, 2001). In social networks edge weight represents the strength of physical connection or the sense of
intimacy between individuals. Hence, the distance between two adjacent nodes is the inverse of the weight of their
corresponding link, and we calculate geodesic paths using the method proposed in (Newman, 2001).
Distance-based centralities are as follows:
2.2.1 Closeness centrality
Closeness centrality is a global centrality which represents the independence of a node in the network
(Freeman, 1978). The most central node has the minimum sum of the ‘geodesic’ distances to all other nodes in
the network:
$C_C(i) = \left[\sum_{j} d_{i,j}\right]^{-1}$ (2)
where $d_{i,j}$ is the length of the weighted geodesic path between nodes $i$ and $j$.
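The distance convention of Section 2.2 (edge length equal to the inverse of the edge weight) and Eq. (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the graph is an undirected dict of dicts, all weights are positive, and the names `weighted_distances` and `closeness` are hypothetical.

```python
import heapq

def weighted_distances(graph, source):
    """Dijkstra over edge lengths 1/w, so stronger ties mean shorter distances."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def closeness(graph, node):
    """Inverse of the summed weighted geodesic distances to all reachable nodes."""
    dist = weighted_distances(graph, node)
    total = sum(d for target, d in dist.items() if target != node)
    return 1.0 / total if total > 0 else 0.0

# Hypothetical toy path a - b - c with weights 2 and 4.
graph = {"a": {"b": 2.0}, "b": {"a": 2.0, "c": 4.0}, "c": {"b": 4.0}}
scores = {node: closeness(graph, node) for node in graph}
```

In this toy path the middle node `b` has the smallest distance sum and therefore the highest closeness.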
2.2.2 Betweenness centrality
Betweenness centrality represents the node’s ability to control the data flow in the network (Freeman, 1978).
This measure is the ratio of the number of geodesic paths that pass through the given node to the number of
geodesic paths between any pair of nodes in the network:
$C_B(i) = \sum_{j \neq i \neq k} \frac{g_{j,k}(i)}{g_{j,k}}$ (3)
where $g_{j,k}(i)$ is the number of weighted geodesic paths between nodes $j$ and $k$ which pass through node $i$
and $g_{j,k}$ is the number of weighted geodesic paths between the two nodes.
2.2.3 Radiality centrality
Radiality represents the extent of access into the network provided by the node's neighbors (Valente and
Foreman, 1998). High radiality means that it takes less time for the infectious node to reach others in the
network:
$C_R(i) = \frac{\sum_{j \in V} (\Delta_G + 1 - d_{i,j})}{n - 1}$ (4)
where $\Delta_G$ is the diameter of graph $G$, $V$ is the set of nodes in graph $G$, $d_{i,j}$ is the length of the
weighted geodesic path between nodes $i$ and $j$, and $n = |V|$.
2.2.4 Katz centrality
Katz centrality is a generalization of degree centrality. In other words, a node’s degree centrality is the
number of its neighbors, while Katz centrality counts all nodes that can be reached through a path.
However, the contribution of distant nodes is reduced:
$C_K(i) = \alpha \sum_{j} a_{i,j} \cdot C_K(j) + \beta$ (5)
where $a_{i,j}$ is the element in row $i$ and column $j$ of the weighted adjacency matrix. $\alpha$ and $\beta$ are
positive constants, where $\alpha$ is an attenuation factor, $0 < \alpha < 1$ (Katz, 1953). With $\beta > 0$, nodes
with zero in-degree (in directed networks) get positive centrality.
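One simple way to evaluate Eq. (5) numerically is fixed-point iteration. The sketch below is an illustration only: the function name `katz` and the toy graph are hypothetical, and it assumes that $\alpha$ is smaller than the reciprocal of the largest eigenvalue of the weighted adjacency matrix, since otherwise the iteration does not converge.

```python
def katz(graph, alpha, beta=1.0, iterations=200):
    """Fixed-point iteration of x_i = alpha * sum_j a_ij * x_j + beta.

    Assumption: alpha < 1 / (largest eigenvalue of the adjacency matrix).
    """
    x = {node: beta for node in graph}
    for _ in range(iterations):
        x = {
            i: alpha * sum(w * x[j] for j, w in graph[i].items()) + beta
            for i in graph
        }
    return x

# Hypothetical toy graph: a single edge of weight 1 (largest eigenvalue 1).
graph = {"a": {"b": 1.0}, "b": {"a": 1.0}}
scores = katz(graph, alpha=0.5, beta=1.0)
```

For this two-node graph the fixed point satisfies x = 0.5 x + 1, so both scores approach 2. With edge weights as large as 20, a much smaller attenuation factor would be needed for convergence.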
2.2.5 Subgraph centrality
This measure is based on spectral properties and characterizes the contribution of a node to the different possible
subgraphs. The subgraph centrality of a node is the weighted sum of closed walks of different lengths which start
and end at that node. Shorter walks have more influence in the calculation (Estrada and
Rodriguez-Velazquez, 2005):
$C_S(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!}$ (6)
where $\mu_k(i)$ is the number of closed walks of length $k$ starting and ending at node $i$.
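Eq. (6) can be approximated by truncating the series, using the fact that the number of closed walks of length $k$ at node $i$ is the $i$-th diagonal entry of the $k$-th power of the adjacency matrix. The sketch below is a plain-Python illustration; the function name, the truncation length `terms` and the toy matrix are assumptions. For a weighted matrix, each walk contributes the product of its edge weights rather than 1.

```python
import math

def subgraph_centrality(adj, terms=30):
    """Truncated series sum_k (A^k)_ii / k! for a square adjacency matrix adj."""
    n = len(adj)
    # power holds A^k; start with A^0 = identity.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scores = [0.0] * n
    factorial = 1.0
    for k in range(terms):
        if k > 0:
            power = [
                [sum(power[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)
            ]
            factorial *= k
        for i in range(n):
            scores[i] += power[i][i] / factorial
    return scores

# Hypothetical toy graph: a single edge of weight 1.
adj = [[0.0, 1.0], [1.0, 0.0]]
scores = subgraph_centrality(adj)
```

For this single-edge graph only even powers contribute, so each score converges to cosh(1), which gives a convenient sanity check. Larger weights make the series converge more slowly, so `terms` would have to grow accordingly.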
2.2.6 Eccentricity centrality
A node’s eccentricity centrality is the inverse of the longest geodesic path from that node to the other nodes.
Hence, the node whose longest geodesic path is the shortest is the most important node in the network. In social
networks this centrality shows how distant, at most, each person is from the others.
$C_E(i) = \frac{1}{\max_{j} d_{i,j}}$ (7)
where $d_{i,j}$ is the length of the weighted shortest path between nodes $i$ and $j$.
2.2.7 Communication centrality
Communication centrality represents the capability of a node to communicate with other nodes. This measure
depends on the node’s degree, the communication ability of its neighbors and the weights of the node’s edges.
The communication centrality of a node is defined by multiplying the h-degree of each neighbor by the weight of
the edge connecting the node to that neighbor.
Zhai et al. defined the communication centrality of node $i$ as “the largest integer $k$ such that the node
has at least $k$ neighbor nodes satisfying the product of each node’s h-degree and the weight of the edge linked
with node $i$ is no fewer than $k$” (Zhai et al., 2013). If the h-degrees of the node’s neighbors are denoted
$h_1, h_2, \ldots, h_m$ and the node’s edge weights are $w_1, w_2, \ldots, w_m$, then their product sequence is:
$h_1 w_1, h_2 w_2, \ldots, h_m w_m$
If we suppose:
$h_1 w_1 \geq h_2 w_2 \geq \ldots \geq h_m w_m$
Then,
$C_{Comm}(i) = \max\{k : h_k w_k \geq k\}$ (8)
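The definition in Eq. (8) can be sketched in a few lines. The text above does not define the h-degree, so the code assumes the usual definition from the h-degree literature (the largest $d$ such that the node has at least $d$ links, each of weight at least $d$); this assumption, the helper names and the toy graph are not taken from the paper.

```python
def h_index(values):
    """Largest k such that at least k of the values are >= k."""
    k = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            k = rank
        else:
            break
    return k

def h_degree(graph, node):
    """Assumed definition: h-index of the weights of the node's edges."""
    return h_index(graph[node].values())

def communication_centrality(graph, node):
    """Eq. (8): h-index of the products (neighbor h-degree) * (edge weight)."""
    products = [h_degree(graph, nb) * w for nb, w in graph[node].items()]
    return h_index(products)

# Hypothetical toy graph, stored symmetrically.
graph = {
    "a": {"b": 3, "c": 2, "d": 1},
    "b": {"a": 3, "c": 2},
    "c": {"a": 2, "b": 2},
    "d": {"a": 1},
}
scores = {node: communication_centrality(graph, node) for node in graph}
```

For node `a` the product sequence is 6, 4, 1, so two neighbors have a product of at least 2 and the centrality is 2; the pendant node `d` scores 1.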
2.3 Spectral centralities
Spectral centrality measures are those whose definition and calculation rely on the eigenvalues and
eigenvectors of the adjacency and Laplacian matrices of the graph. The two most well-known are:
2.3.1 Laplacian centrality
Laplacian centrality is based on the Laplacian matrix. This matrix contains practical information about the
dynamics and geometry of the network (Pauls and Remondini, 2012). The Laplacian matrix is defined as:
$L = D - W$
where $D$ is the graph’s degree matrix and $W$ is the graph’s weighted adjacency matrix. If we denote the
eigenvalues of $L$ as $\lambda_1, \lambda_2, \ldots, \lambda_n$, the Laplacian energy of graph $G$ is:
$E_L(G) = \sum_{i=1}^{n} \lambda_i^2$
A node’s Laplacian centrality is the difference between the Laplacian energy of the network with and without that
node (Qi et al., 2012):
$C_L(i) = E_L(G) - E_L(G_i)$ (9)
where $G_i$ is the graph obtained by removing node $i$ from $G$.
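Laplacian centrality can be illustrated without an eigenvalue solver, because the sum of squared eigenvalues of a symmetric matrix equals the sum of its squared entries. The sketch below relies on that identity; the dict-of-dicts representation, the function names and the toy graph are assumptions made for illustration.

```python
def laplacian_energy(graph):
    """Sum of squared Laplacian eigenvalues, computed as trace(L^2).

    For symmetric L = D - W this equals the sum of squared entries of L:
    squared strengths on the diagonal plus squared weights off the diagonal.
    """
    diagonal = sum(sum(nbrs.values()) ** 2 for nbrs in graph.values())
    # Each undirected edge is stored twice, matching the two symmetric entries of L.
    off_diagonal = sum(w ** 2 for nbrs in graph.values() for w in nbrs.values())
    return diagonal + off_diagonal

def remove_node(graph, node):
    """Copy of the graph without the given node and its edges."""
    return {
        u: {v: w for v, w in nbrs.items() if v != node}
        for u, nbrs in graph.items()
        if u != node
    }

def laplacian_centrality(graph, node):
    """Drop in Laplacian energy caused by deleting the node."""
    return laplacian_energy(graph) - laplacian_energy(remove_node(graph, node))

# Hypothetical toy path a - b - c with weights 1 and 2.
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 2.0}, "c": {"b": 2.0}}
scores = {node: laplacian_centrality(graph, node) for node in graph}
```

Removing the middle node `b` leaves two isolated nodes with zero energy, so `b` receives the full energy of the graph as its score.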
2.3.2 PageRank centrality
This centrality measure represents a node’s relative importance in the network (Sarma et al., 2011). The node’s
importance depends on the importance of its neighbors. PageRank centrality is based on link analysis:
$C_{PR} = d \cdot \hat{W} \cdot C_{PR} + (1 - d) \cdot p$ (10)
where $d$ is known as the damping factor and is usually set to 0.85, $\hat{W}$ is the normalized weighted adjacency
matrix and $p$ is the preference vector.
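A weighted PageRank of this form is commonly computed by power iteration. The sketch below is one possible reading of Eq. (10), with assumptions stated plainly: the preference vector is uniform, each node distributes its score to neighbors in proportion to edge weight, and no node is isolated. The function name and toy graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration for weighted PageRank with a uniform preference vector.

    Assumption: every node has at least one edge; an isolated node would
    leak probability mass in this simplified version.
    """
    n = len(graph)
    strength = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    rank = {u: 1.0 / n for u in graph}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in graph}
        for u, nbrs in graph.items():
            for v, w in nbrs.items():
                # Node u passes on its score in proportion to edge weight.
                new_rank[v] += damping * rank[u] * w / strength[u]
        rank = new_rank
    return rank

# Hypothetical toy path a - b - c with unit weights.
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = pagerank(graph)
```

The scores sum to one, and the middle node, which receives score from both ends, ranks highest.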
It is worth noting that Subgraph Centrality can be identified as spectral centrality.
2.4 Hybrid centralities
One of the major problems with pure centralities such as degree, betweenness and closeness is that
they are not a convenient measure for estimating node availability. For example, the degree centrality of a
node with $n$ neighbors, each with an edge weight of 1, is equal to the degree centrality of a node with just one
neighbor with an edge weight of $n$. However, after removing one edge from each node the former is still available
and the latter is not. Hence, hybrid centrality measures, which are applicable to both weighted and binary
networks, have been proposed for a better understanding of the importance and influence of nodes.
2.4.1 Degree-Degree centrality
This measure highlights the nodes which have more connections to more important nodes. In other words, the
node’s degree as well as the degrees of the node’s neighbors are considered in the calculation of this centrality measure.
$C_{DD}(i) = \sum_{j} w_{i,j} \cdot C_D(j)$ (11)
where $C_D(j)$ is the degree centrality of node $j$ and $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$.
2.4.2 Closeness-Degree centrality
This measure not only represents the node’s influence in the control of data flow, but also represents the node’s
performance in communication with other nodes in the network.
$C_{CD}(i) = \sum_{j} w_{i,j} \cdot C_C(j)$ (12)
where $C_C(j)$ is the closeness centrality of node $j$ and $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$.
2.4.3 Betweenness-Degree centrality
This centrality represents the popularity of a node as well as its influence in the control of data flows in the
network.
$C_{BD}(i) = \sum_{j} w_{i,j} \cdot C_B(j)$ (13)
where $C_B(j)$ is the betweenness centrality of node $j$ and $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$.
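The three hybrid measures share one pattern: a weighted sum of a base centrality over a node's neighbors. A minimal sketch of that pattern, shown here for Degree-Degree centrality, is given below; the function name `hybrid_centrality` and the toy graph are hypothetical.

```python
def hybrid_centrality(graph, base_scores):
    """Weighted sum of a base centrality over each node's neighbors."""
    return {
        i: sum(w * base_scores[j] for j, w in nbrs.items())
        for i, nbrs in graph.items()
    }

# Hypothetical toy graph, stored symmetrically.
graph = {
    "a": {"b": 4.0, "c": 1.0},
    "b": {"a": 4.0, "c": 2.0},
    "c": {"a": 1.0, "b": 2.0, "d": 5.0},
    "d": {"c": 5.0},
}

# Base centrality: node strength (weighted degree).
strengths = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
degree_degree = hybrid_centrality(graph, strengths)
```

Passing closeness or betweenness scores as `base_scores` would give the Closeness-Degree and Betweenness-Degree variants in the same way. Note that node `d`, with a single strong tie to the high-strength node `c`, scores higher than `a` and `b`.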
3 The Proposed Centrality Measures
To detect the most effective nodes in the network and study the vaccination of these nodes in network
immunization, we propose four new hybrid centrality measures following the pattern proposed in (Abbasi et al.,
2011):
3.1 PageRank-Degree centrality
As mentioned above, the PageRank centrality measure represents the importance of nodes based on their adjacent
nodes. We introduce PageRank-Degree centrality as a combination of the PageRank measure with degree
centrality. The following formula represents this new centrality measure:
$C_{PRD}(i) = \sum_{j} w_{i,j} \cdot \left[(1 - d) + d \sum_{k \in M(j)} \frac{C_{PR}(k)}{L(k)}\right]$ (14)
where $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$, $C_{PR}(k)$ is the PageRank centrality of node $k$,
$d$ is the damping factor with a value of 0.85, $M(j)$ is the set of nodes which have an edge with node $j$ and $L(k)$
is the number of outlinks of node $k$.
3.2 Radiality-Degree centrality
Radiality centrality shows the degree of a node’s access to other nodes in the network through its neighbors.
Thus, if a node has more neighbors with high Radiality centrality, its Radiality-Degree centrality is high as well. We
define the Radiality-Degree centrality measure as:
$C_{RD}(i) = \sum_{j} w_{i,j} \cdot \frac{\sum_{k \in V} (\Delta_G + 1 - d_{j,k})}{n - 1}$ (15)
where $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$, $\Delta_G$ is the diameter of graph $G$, $V$ is the
set of nodes of graph $G$, $n$ is the number of nodes in the graph and $d_{j,k}$ is the length of the geodesic path
between nodes $j$ and $k$.
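Radiality-Degree centrality can be sketched by combining weighted shortest paths with the neighbor-weighted sum. The code below is an illustration under stated assumptions, not the authors' implementation: edge length is the inverse of edge weight, the graph is connected, the diameter is the largest weighted geodesic distance, and the sum in the radiality term skips the node itself. All function names and the toy graph are hypothetical.

```python
import heapq

def weighted_distances(graph, source):
    """Dijkstra over edge lengths 1/w (stronger ties are shorter)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def radiality(graph):
    """Radiality of every node; assumes a connected graph."""
    nodes = list(graph)
    dist = {u: weighted_distances(graph, u) for u in nodes}
    diameter = max(d for u in nodes for d in dist[u].values())
    n = len(nodes)
    return {
        u: sum(diameter + 1.0 - dist[u][v] for v in nodes if v != u) / (n - 1)
        for u in nodes
    }

def radiality_degree(graph):
    """Sum over neighbors of edge weight times the neighbor's radiality."""
    rad = radiality(graph)
    return {i: sum(w * rad[j] for j, w in nbrs.items()) for i, nbrs in graph.items()}

# Hypothetical toy path a - b - c with weights 2 and 4.
graph = {"a": {"b": 2.0}, "b": {"a": 2.0, "c": 4.0}, "c": {"b": 4.0}}
scores = radiality_degree(graph)
```

In this toy path the middle node `b` scores highest because it is tied to both other nodes, and `c` outranks `a` because its single tie is stronger.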
3.3 Subgraph-Degree centrality
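We define Subgraph-Degree centrality as:
$C_{SD}(i) = \sum_{j} w_{i,j} \cdot \sum_{k=0}^{\infty} \frac{\mu_k(j)}{k!}$ (16)
where $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$ and $\mu_k(j)$ is the number of closed walks of
length $k$ starting and ending at node $j$.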
3.4 Katz-Degree centrality
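Hybrid Katz-Degree centrality is defined as follows:
$C_{KD}(i) = \sum_{j} w_{i,j} \cdot \left(\alpha \sum_{k} a_{j,k} \cdot C_K(k) + \beta\right)$ (17)
where $w_{i,j}$ is the weight of the edge between nodes $i$ and $j$. $\alpha$ and $\beta$ are positive constants, where
$\alpha$ is the attenuation factor and $0 < \alpha < 1$.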
4 Evaluation Method
The purpose of this study is to vaccinate the nodes with the highest effect on disease spread and remove them
from the network. Effective nodes are selected by the centrality measures discussed in the
previous sections. Vaccinating these nodes removes links from the network, so the network
breaks into smaller components. The framework of this model is the same as in (Shams and Khansari, 2014).
We use two criteria to evaluate immunization algorithms: largest connected component (LCC) and epidemic
threshold.
4.1 Increment of epidemic threshold of network
The epidemic threshold of a network is the minimum number of people who have to be infected to reach an
epidemic level. A higher epidemic threshold indicates a lower probability of reaching an epidemic (Chakrabarti et al.,
2008; Kitchovitch and Lio, 2011; Masuda, 2009; Peng et al., 2010; Shams and Khansari, 2014). The epidemic
threshold of a network is the inverse of the largest eigenvalue of the network adjacency matrix (Chakrabarti et al.,
2008; Kitchovitch and Lio, 2011; Shams and Khansari, 2014). We study how the removal of
specified nodes reduces the largest eigenvalue of the network adjacency matrix, in order
to assess the capability of the considered centrality measures to detect and immunize the most effective nodes. As
described in (Shams and Khansari, 2014), we calculate $\lambda / \lambda_0$, where $\lambda_0$ is the largest eigenvalue of the
original network and $\lambda$ is the largest eigenvalue of the vaccinated network.
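The eigenvalue ratio described above can be estimated with power iteration. The sketch below is illustrative only: it iterates on the adjacency matrix plus the identity (a shift chosen here to avoid oscillation on bipartite graphs), uses a fixed iteration count rather than a convergence test, and relies on hypothetical function names and a toy triangle graph.

```python
import math

def largest_eigenvalue(graph, iterations=500):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I."""
    if not graph:
        return 0.0
    x = {u: 1.0 for u in graph}
    for _ in range(iterations):
        # y = (A + I) x
        y = {u: x[u] + sum(w * x[v] for v, w in graph[u].items()) for u in graph}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: val / norm for u, val in y.items()}
    # Rayleigh quotient x^T A x for the unit vector x.
    return sum(x[u] * w * x[v] for u in graph for v, w in graph[u].items())

def remove_nodes(graph, nodes):
    """Copy of the graph without the vaccinated nodes and their edges."""
    nodes = set(nodes)
    return {
        u: {v: w for v, w in nbrs.items() if v not in nodes}
        for u, nbrs in graph.items()
        if u not in nodes
    }

# Hypothetical toy graph: a triangle with unit weights.
triangle = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
}
lam0 = largest_eigenvalue(triangle)
lam = largest_eigenvalue(remove_nodes(triangle, ["a"]))
ratio = lam / lam0
```

Vaccinating one node of the triangle halves the largest eigenvalue, which corresponds to doubling the epidemic threshold 1/λ.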
4.2 Decrement of Largest-Connected-Component size
A connected component is a subgraph in which there is at least one path between every two nodes. Thus, the
largest epidemic size of a network is the size of its largest connected component (LCC) (Chen et al., 2008; Gallos et
al., 2007; Schneider et al., 2012; Shams and Khansari, 2014). Hence, we calculate $LCC / LCC_0$, where $LCC_0$ is the
size of the largest connected component of the original network and $LCC$ is the size of the largest connected
component of the immunized network.
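The LCC criterion can be computed with a breadth-first search that skips vaccinated nodes. The following sketch assumes an undirected graph stored as a dict of dicts; the function name and the toy path graph are hypothetical.

```python
from collections import deque

def largest_component_size(graph, removed=()):
    """Size of the largest connected component after deleting the removed nodes."""
    seen = set(removed)  # vaccinated nodes are never visited
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

# Hypothetical toy path a - b - c - d - e with unit weights.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
lcc0 = largest_component_size(graph)
lcc = largest_component_size(graph, removed=["c"])
ratio = lcc / lcc0
```

Vaccinating the middle node splits the path into two components of size 2, so the ratio drops from 1 to 0.4.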
5 Experimental Results
In this section we investigate the performance of the predefined and proposed centrality measures in detecting
and immunizing influential nodes in the network. We calculate the centrality measures in three artificial
networks and in one real network and investigate their performance. To generate the artificial networks and to
calculate the different centrality measures we use R 3.1.1.
5.1 Datasets
To generate the artificial networks we use the three most famous models: Scale-free, Erdős-Rényi (ER) and
Small-World. The scale-free network is generated based on (Albert and Barabási, 2002) with 500 nodes and
parameter $m = 3$, which leads to 1500 edges. The ER network is generated with 500 nodes and 1500 edges, which
is greater than the threshold of connectedness of a random graph (Erdős and Rényi, 1961). The Small-World
network is produced with 500 nodes and 3 initial neighbors on each side, which results in 1500 edges, and a
rewiring probability of 0.1 (Watts and Strogatz, 1998). Each edge weight is a random number between 1 and 20.
For each network and for each centrality measure we generate five datasets and calculate the average of the
desired parameters. Then we use them in the evaluation of immunization performance.
The real network is the Facebook-like (FBL) network (Opsahl and Panzarasa, 2009), which is frequently
used in the immunization literature and consists of 1899 nodes and 13838 edges. The weights of the edges are
uniformly distributed between 1 and 20.
5.2 Diagnosis and vaccination
Two methods have been proposed to diagnose and vaccinate individuals based on their centrality in networks. The
first is the initial method, in which the centrality measure is calculated on the initial network and the most central
nodes are immunized according to the number of available vaccination resources. In the second method, the adaptive
method, the most central node is immunized in each iteration and the centrality measure is then recalculated
(Schneider et al., 2012; Shams and Khansari, 2014). We determine and vaccinate individuals based on the initial method.
Our artificial networks consist of 500 nodes and in each iteration we immunize (remove) 10 nodes. The
real network consists of 1899 nodes and in each iteration 40 individuals are vaccinated.
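The difference between the initial and adaptive methods can be sketched as two small selection routines. This is an illustration under stated assumptions: node strength stands in for an arbitrary centrality measure, and the function names, the `budget` parameter and the toy graph are hypothetical.

```python
def remove_nodes(graph, nodes):
    """Copy of the graph without the given nodes and their edges."""
    nodes = set(nodes)
    return {
        u: {v: w for v, w in nbrs.items() if v not in nodes}
        for u, nbrs in graph.items()
        if u not in nodes
    }

def strength(graph):
    """Stand-in centrality: weighted degree of every node."""
    return {u: sum(nbrs.values()) for u, nbrs in graph.items()}

def initial_targets(graph, centrality, budget):
    """Initial method: rank once on the original network, take the top nodes."""
    scores = centrality(graph)
    return sorted(scores, key=scores.get, reverse=True)[:budget]

def adaptive_targets(graph, centrality, budget):
    """Adaptive method: recompute centrality after every single removal."""
    chosen = []
    current = graph
    for _ in range(budget):
        scores = centrality(current)
        if not scores:
            break
        best = max(scores, key=scores.get)
        chosen.append(best)
        current = remove_nodes(current, [best])
    return chosen

# Hypothetical toy graph with two separate groups.
graph = {
    "h": {"x": 10.0},
    "x": {"h": 10.0, "y": 1.0},
    "y": {"x": 1.0},
    "p": {"q": 6.0, "r": 3.0},
    "q": {"p": 6.0},
    "r": {"p": 3.0},
}
initial = initial_targets(graph, strength, budget=2)
adaptive = adaptive_targets(graph, strength, budget=2)
```

In this toy graph the initial method spends its second vaccine on `h`, which is already isolated once `x` is removed, while the adaptive method moves on to `p`. The initial method used in the paper trades this precision for a single centrality computation.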
5.3 LCC in network models
Fig. 1 illustrates the efficiency of targeted immunization in terms of LCC size for scale-free (BA), Small-
world (SW) and Random (ER) networks under different centrality measures. We plot the proportion of
LCC size to network size versus the number of vaccinated (removed) nodes. For the BA model, which is
shown by the ● symbol, Communication centrality outperforms the others when 50 as well as 100 individuals
are vaccinated. Radiality-Degree and PageRank centralities show the next best performance, at almost the same level.
On the other hand, Betweenness-Degree centrality is the worst strategy for immunizing the weighted BA network.
Subgraph-Degree and Eccentricity are the next worst metrics. Our experiments show that for all centrality
measures except Betweenness-Degree, the worst expected epidemic size (LCC) decreases significantly in the
first 20 iterations (immunization of 40% of society). With Communication centrality, the proportion of
worst expected epidemic size (LCC) to network size is 0.04 when 20% of the individuals in the network are
immunized, which is the best result.
The use of different centrality measures to determine and immunize the most effective nodes in weighted ER
networks is illustrated by the ■ symbol. The worst expected epidemic size is reduced considerably by using
Subgraph centrality. Communication and Radiality-Degree centralities have the next best performance. On the
contrary, Eccentricity and Degree-Degree centralities give the worst results.
The proportion of LCC size to network size versus vaccinated nodes under different centrality measures for
Watts-Strogatz networks is represented by the ▲ symbol. Betweenness centrality outperforms the others in
reducing LCC size. When Betweenness centrality is used to detect and immunize the most important nodes, the
worst expected epidemic size is reduced to one fifth of the network size by vaccinating 40% of society.
Closeness-Degree and Radiality-Degree centralities are the
next best measures for immunizing small-world networks. For example, in order to reduce the size of the
LCC to one fifth of the network, we have to immunize 40%, 44% and 48% of the society by using Betweenness,
Closeness-Degree and Radiality-Degree, respectively. Katz-Degree, Katz and Eccentricity centralities show the
worst results. Table 1 summarizes our experiments in network models.
5.4 Epidemic threshold in network models
The effectiveness of different centrality measures in reducing the largest eigenvalue of the network adjacency matrix
versus the number of vaccinated individuals is illustrated in Fig. 2 for scale-free (BA), Small-world
(SW) and Random (ER) networks.
Fig. 1 Proportion of LCC size to initial network size versus number of vaccinated nodes in BA, ER and WS networks.
Table 1 Centrality measures which have the best and worst performance in decreasing LCC size in network models.
Network Model | Outperform | Underperform
Barabási-Albert | Communication; Radiality-Degree; PageRank | Betweenness-Degree; Subgraph-Degree; Eccentricity
Erdős-Rényi | Subgraph; Communication; Radiality-Degree | Eccentricity; Degree-Degree
Watts-Strogatz | Betweenness; Closeness-Degree; Radiality-Degree | Katz-Degree; Katz; Eccentricity
The proportion of the largest eigenvalue of the adjacency matrix of the immunized network to the largest eigenvalue
of the original weighted BA network ($\lambda / \lambda_0$) is illustrated by the ● symbol. Radiality-Degree,
PageRank-Degree and Strength centralities show the best performance, in that order. On the other hand,
Eccentricity, Subgraph-Degree and Betweenness-Degree give the worst results.
The reduction of the largest eigenvalue of the immunized ER network versus immunized nodes is depicted by the ■ symbol.
Closeness-Degree, Degree-Degree and PageRank-Degree centralities perform best,
while Eccentricity and Subgraph centralities underperform the others.
Table 2 Centrality measures which have the best as well as the worst performance in increasing epidemic threshold in network models.
Network Model | Outperform | Underperform
Barabási-Albert | Radiality-Degree; PageRank-Degree; Strength | Eccentricity; Subgraph-Degree; Betweenness-Degree
Erdős-Rényi | Closeness-Degree; Degree-Degree; PageRank-Degree | Eccentricity; Subgraph
Watts-Strogatz | Strength; Degree-Degree; PageRank-Degree | Betweenness; Radiality; Eccentricity
The same process has been carried out for Small-World (Watts-Strogatz) networks, which are illustrated by the ▲
symbol. Our experiments show that by vaccinating 47% of society, the proportion of the largest eigenvalue of the
immunized network to the largest eigenvalue of the initial network is 0.3 when Strength centrality is used, which is
the best result. Degree-Degree and PageRank-Degree centralities show the next best performance. Betweenness,
Radiality and Eccentricity show the worst results, in that order. Table 2 summarizes our experiments in
increasing epidemic threshold in different network models.
Fig. 2 Proportion of largest eigenvalue of immunized network to largest eigenvalue of initial network versus number of vaccinated nodes in BA, ER and WS networks.
Fig. 3 Proportion of LCC size to initial network size versus number of vaccinated nodes in real network (FBL).
Fig. 4 Proportion of largest eigenvalue of immunized network to largest eigenvalue of initial network versus number of
vaccinated nodes in real network (FBL).
5.5 Real network
As mentioned above, we use the Facebook-like (FBL) network (Opsahl and Panzarasa, 2009) as the real network. The
major drawback of this dataset is that it describes online interaction and does not reflect individuals’ physical
contacts. However, due to the lack of data on contact networks we used this dataset. Our experiments show
that Radiality-Degree centrality is the best measure for detecting the most influential nodes. Communication and
PageRank centralities have the next best performance, as illustrated in Fig. 3.
To reduce the worst epidemic size to one fifth of the network size, we have to vaccinate 23%, 23.7% and
24.3% of the most important nodes by using Radiality-Degree, Communication and PageRank centralities,
respectively.
Fig. 4 illustrates the impact of targeted vaccination on reducing the largest eigenvalue of the adjacency matrix
of the real network under different centrality measures. Our experiments show that
Closeness-Degree centrality has the best performance in increasing the network epidemic threshold. Degree-
Degree and Katz-Degree centralities show the next best results. By contrast, Eccentricity and Betweenness-
Degree centralities give the worst results, in that order.
6 Conclusions
Because weighted networks represent physical contacts in society more accurately, we employed them in this
work. We took advantage of 13 predefined centrality measures for weighted networks to identify
prominent (central) nodes. Furthermore, we proposed 4 new hybrid centrality measures whose computation takes
link weights into account. These 17 centrality measures were evaluated for targeted
immunization. We compared these centrality measures on three well-known network models,
BA, ER and WS, as well as on a real network (FBL), based on two metrics: epidemic threshold and largest
connected component of the network. For decreasing the size of the largest connected component, Radiality-Degree
centrality gave satisfactory results, and PageRank-Degree was useful for increasing the
epidemic threshold in all three models: BA, ER and WS. On the other hand, Subgraph-Degree centrality
performed poorly among the 4 proposed centrality measures. Moreover, Katz-Degree gave
unacceptable outcomes in almost all networks, although it gave valuable results in increasing the epidemic
threshold in the real network (FBL).
In short, among pure centralities, Communication and Strength were useful for decreasing LCC size and
increasing epidemic threshold, respectively. Moreover, Eccentricity gave entirely unacceptable results.
Among the predefined hybrid centrality measures (Degree-Degree, Betweenness-Degree and Closeness-Degree),
we concluded that Betweenness-Degree centrality is impractical for decreasing LCC size. Results for the other two
measures were not so clear. Finally, for the weighted BA network, Radiality-Degree and Betweenness-
Degree centralities gave the best and worst performance, respectively. No such unique conclusion could be
drawn for the other two network models, i.e. Erdős-Rényi and Watts-Strogatz.
Acknowledgment
We are thankful for partial support from the Iran Telecommunication Research Center (ITRC).
References
Abbasi A, Hossain L. 2013. Hybrid centrality measures for binary and weighted networks. In Complex
networks. 1-7, Springer Berlin Heidelberg, Germany
Abbasi A, Altmann J, Hossain L. 2011. Identifying the effects of co-authorship networks on the performance
of scholars: A correlation and regression analysis of performance measures and social network analysis
measures. Journal of Informetrics, 5(4): 594-607
Albert R, Barabási AL. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):
47
Chakrabarti D, Wang Y, Wang C, Leskovec J, Faloutsos C. 2008. Epidemic thresholds in real networks. ACM
Transactions on Information and System Security, 10: 1-26
Chen Y, Paul G, Havlin S, et al. 2008. Finding a Better Immunization Strategy. Physical Review Letters, 101:
2-5
Cornforth DM, Reluga TC, Shim E, et al. 2011. Erratic flu vaccination emerges from short-sighted behavior in
contact networks. PLoS Computational Biology, 7: e1001062
Dijkstra EW. 1959. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1): 269-271
Eames KTD, Read JM, Edmunds WJ. 2009. Epidemic prediction and control in weighted networks. Epidemics,
1: 70-76
Erdős P, Rényi A. 1961. On the strength of connectedness of a random graph. Acta Mathematica Hungarica,
12(1-2): 261-267
Estrada E, Rodriguez-Velazquez JA. 2005. Subgraph centrality in complex networks. Physical Review E,
71(5): 056103
Freeman LC. 1978. Centrality in social networks conceptual clarification. Social Networks, 1: 215-239
Gallos L, Liljeros F, Argyrakis P, et al. 2007. Improving immunization strategies. Physical Review E, 75: 1-4
Hartvigsen G, Dresch JM, Zielinski AL, Macula AJ, et al. 2007. Network structure, and vaccination strategy
and effort interact to affect the dynamics of influenza epidemics. Journal of Theoretical Biology, 246: 205-
213
Jain S, Kamimoto L, Bramley AM, Schmitz AM, Benoit SR, Louie J, et al. 2009. Hospitalized patients with
2009 H1N1 influenza in the United States, April–June 2009. New England Journal of Medicine, 361(20):
1935-1944
Katz L. 1953. A new status index derived from sociometric analysis. Psychometrika, 18(1): 39-43
Kitchovitch S, Lio P. 2011. Community structure in social networks: applications for epidemiological
modelling. PLoS ONE, 6: e22220
Leavitt HJ. 1951. Some effects of certain communication patterns on group performance. The Journal of
Abnormal and Social Psychology, 46(1): 127-134
Masuda N. 2009. Immunization of networks with community structure. New Journal of Physics, 11: 123018.
Miller JC, Hyman JM. 2007. Effective vaccination strategies for realistic social networks. Physica A:
Statistical Mechanics and Its Applications, 386: 780-785
Newman ME. 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality.
Physical Review E, 64(1): 016132
Opsahl T, Panzarasa P. 2009. Clustering in weighted networks. Social Networks, 31: 155-163
Opsahl T, Agneessens F, Skvoretz J. 2010. Node centrality in weighted networks: Generalizing degree and
shortest paths. Social Networks, 32(3): 245-251
Parthasarathy A. 2013. Textbook of Pediatric Infectious Diseases. JP Medical Ltd, London, UK
Pauls SD, Remondini D. 2012. Measures of centrality based on the spectrum of the Laplacian. Physical
Review E, 85(6): 066127
Peng C, Jin X, Shi M. 2010. Epidemic threshold and immunization on generalized networks. Physica A:
Statistical Mechanics and Its Applications, 389: 549-560
Qi X, Fuller E, Wu Q, Wu Y, Zhang CQ. 2012. Laplacian centrality: A new centrality measure for weighted
networks. Information Sciences, 194: 240-253
Sarma AD, Gollapudi S, Panigrahy R. 2011. Estimating pagerank on graph streams. Journal of the ACM
(JACM), 58(3): 13
Schneider CM, Mihaljev T, Herrmann HJ. 2012. Inverse targeting —An effective immunization strategy.
Europhysics Letters, 98: 46002
Shams B, Khansari M. 2014. Using network properties to evaluate targeted immunization algorithms. Network
Biology, 74: 21
Valente TW, Foreman RK. 1998. Integration and radiality: measuring the extent of an individual's
connectedness and reachability in a network. Social Networks, 20(1): 89-105
Vidondo B, Schwehm M, Bühlmann A, et al. 2012. Finding and removing highly connected individuals using
suboptimal vaccines. BMC Infectious Diseases, 12: 51
Wang H, Martin Hernandez J, Van Mieghem P. 2008. Betweenness centrality in a weighted network. Physical
Review E, 77: 046105
Watts DJ, Strogatz SH. 1998. Collective dynamics of ‘small-world’ networks. Nature, 393(6684): 440-442
Zhai L, Yan X, Zhang G. 2013. A centrality measure for communication ability in weighted network. Physica
A: Statistical Mechanics and its Applications, 392(23): 6107-6117
Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,
Singapore
Zhang WJ, Zhan CY. 2011. An algorithm for calculation of degree distribution and detection of network type:
with application in food webs. Network Biology, 1(3-4): 159-170
Network Biology, 2016, 6(1): 28-36
IAEES www.iaees.org
Article
Investigation of common disease regulatory network for metabolic disorders: A bioinformatics approach
Tasnuba Jesmin1, Sajjad Waheed1, Abdullah-Al-Emran2
1Department of Information & Communication Technology, Mawlana Bhashani Science and Technology University, Santosh,
Tangail-1902, Bangladesh
2Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh,
Tangail-1902, Bangladesh
Email: [email protected]
Received 27 June 2015; Accepted 5 August 2015; Published online 1 March 2016
Abstract
Metabolic disorders, which cause the failure of metabolic processes, are a growing concern worldwide. This
research predicts a common metabolic pathway shared by obesity, type 2 diabetes, hypertension and
cardiovascular disease, all of which arise from metabolic disorder. A protein-protein interaction (PPI) network
is created to show protein co-expression, co-regulation and interactions among genes and diseases. Genes
associated with metabolic diseases were collected from several gene databases, verified, and ‘mined’ to
establish gene interaction network models expressing the molecular linkages among genes and diseases that
affect disease progression. The number of associated genes identified is 250 for Type 2 Diabetes (T2D), 156 for
Hypertension (HT), 185 for Obesity (OB) and 178 for cardiovascular disease (CVD). Among the sorted candidate
genes, linkage filtering identified 10 common genes that are directly or indirectly associated with all four
diseases. By analysing the gene network model and the PPI network, a common metabolic pathway among the
metabolic diseases has been investigated.
Key words data mining; metabolic disorders; metabolic diseases; PPI network; gene regulatory network.
1 Introduction
According to the WHO, overweight and obesity are among the five leading causes of death. Today 65% of the
world's population live in countries where overweight and obesity are responsible for more deaths than
underweight (World Health Organization, 2008). Overweight leads to obesity, which in turn leads to death from
type 2 diabetes, hypertension, cardiovascular diseases and other metabolic disorders. Barness et al. (2007)
reported that obesity is a preventable cause of death worldwide, that adults and children are increasingly
affected, and that few public health problems of the 21st century are viewed as being as serious as obesity.
On average, obesity shortens life expectancy by six to seven years (Haslam and James, 2005; Peeters et al.,
2003). According to another study (Whitlock et al., 2009), a BMI of 30-35 kg/m2 reduces life expectancy by two
to four years, while severe obesity (BMI > 40 kg/m2) reduces it by ten years. There is also a strong correlation
between obesity and type 2 diabetes (Ahmed et al., 2012).
Type 2 diabetes is increasingly prevalent and may remain undetected for many years. In 2014, 9% of adults
aged 18 years and older had diabetes. In 2012, diabetes was the direct cause of 1.5 million deaths. More than
80% of diabetes deaths occur in low- and middle-income countries such as Bangladesh. Both lifestyle and genetic
factors play a role in initiating type 2 diabetes (Ripsin et al., 2009; Risérus et al., 2009), so type 2
diabetes and metabolic disorder are correlated. Chobanian et al. (2003) described hypertension as a chronic
metabolic disorder. Kearney et al. (2005) reported that hypertension is common in both developed (333 million)
and developing (639 million) countries. A prospective cohort study (Gress et al., 2000) found that type 2
diabetes mellitus was almost 2.5 times as likely to develop in subjects with hypertension as in subjects with
normal blood pressure. Cheung (2010) showed a strong correlation between obesity and hypertension: the two
increase or decrease together.
In a multinational study, 50% of people with diabetes died of cardiovascular disease (primarily heart
disease and stroke) (Morrish et al., 2001). Kvan et al. (2007) and Norhammar et al. (2004) also reported that
metabolic syndrome is a major risk factor for cardiovascular disease. Obesity and type 2 diabetes contribute to
the onset of metabolic syndrome (Isomaa et al., 2001), and cardiovascular diseases are commonly associated with
obesity and diabetes mellitus (Highlander and Shaw, 2010). Obesity, type 2 diabetes and CVD are therefore
correlated. Disease genes cause or contribute genetically to the development of most complex diseases.
Alterations in genes or in their mechanisms of action are the main source of disease, but for a particular
disease a single gene or a set of genes plays the major role; these are called disease genes. Proteins are
responsible for maintaining all cellular functions, and their production is governed by the genetic code. A
disease may result from a gene abnormality that changes protein function. Therefore, to establish a network
among genes, it is very important to analyze the characteristics of proteins and to understand their function.
Klingstrom and Plewczynski (2011) described the different types of bioinformatics tools that help to show
interactions, predict pathways and represent predicted interactions among diseases, genes and proteins.
Kanehisa et al. (2008) provided detailed information about the use and analysis of the KEGG database.
Carretero and Oparil (2000) and Aneja et al. (2004) proposed that hypertension and obesity are strongly
interconnected, and even that obesity is the main cause of hypertension; this work also gives an extensive
description of hypertension and its stages. Type 2 diabetes and hypertension share a common metabolic pathway
at the genetic level (Bernard et al., 2012). Microvascular and macrovascular complications of diabetes are
major causes of coronary heart disease (CHD). The UniHI tool, which is used here to predict the PPI network and
the common metabolic pathway, is described by Kalathur et al. (2014). A research paper published in Public
Health in July 2014 reported that adult obesity causes type 2 diabetes. Dusonchet et al. (2014) emphasized that
type 2 diabetes is a leading cause of cardiovascular disease.
Over the last few years, there has been growing interest in the study of biological interaction networks
at the genetic level (Zeitoun et al., 2012; Rahman et al., 2013; Zhang, 2012; Zhang and Li, 2015). Identifying
the basic structural relationships among diseases, genes and proteins is the main goal in the field of gene
regulatory interaction networks. Meyer et al. (2014) used a community-based approach to establish gene
regulatory networks for complete or incomplete topologies. Wang et al. (2014) proposed that gene regulatory
networks (GRNs) regulate critical events during the development of cells. Simões-Costa et al. (2014) identified
new links in the gene regulatory network responsible for the development of the cranial neural crest (CNC), a
critical cell population, and provided an unprecedented characterization of the migratory CNC transcriptome.
Hayes and Dinkova-Kostova (2014) described an Nrf2 regulatory network that provides an interface between
intermediary metabolism and redox. Dusonchet et al. (2014) designed a Parkinson’s disease gene regulatory
network that is capable of identifying an LRRK2 gene regulatory network model.
Mäkinen et al. (2014) used integrative genomics to propose gene networks and molecular pathways for
coronary artery disease. Teichmann and Babu (2014) investigated the role of gene duplication in gene network
evolution. Zaravinos et al. (2014) showed that there is an associated network among the deregulated genes in a
cohort of ccRCC tissues, and suggested that these genes are candidate predictive markers of the disease. Meda
et al. (2014) reported genetic associations of the resting default mode network in psychotic bipolar disorder
and schizophrenia. A miRNA-TF-gene regulatory pathway in obesity was designed by Zhang et al. (2015).
Complex interactions among the cell’s numerous constituents, such as proteins, DNA, RNA and other small
molecules, give rise to biological functions. It is therefore important to assess interactions at the
gene-gene, gene-protein and metabolic levels. The present work applies a systems bioinformatics approach to
develop a gene interaction network model from high-throughput genomic and PPI data for these diseases.
From the above discussion of obesity, type 2 diabetes, hypertension and cardiovascular disease, it appears
that they may be directly or indirectly interconnected through metabolic disorder. But is there a common
pathway shared by obesity, type 2 diabetes, hypertension and cardiovascular disease? The investigation
procedure and the results are discussed in Sections 2 and 3, respectively.
2 Materials and Methods
Several steps are needed to build a gene network topology that helps to establish a common pathway. The steps
are described in detail in subsections 2.1 to 2.2.7.
2.1 Data source
For this research, genes associated with type 2 diabetes, hypertension, obesity and cardiovascular disease
were collected from PubMed, a reliable and authoritative repository for many kinds of genetic data. PubMed is
maintained by the NCBI (National Center for Biotechnology Information), whose gene databases are freely
accessible and downloadable. The GenBank data warehouse and the OMIM database were also used to collect the
gene lists for T2D, CVD, OB and HT.
2.2 Methods
2.2.1 Gene integration and processing
In this study, the disease genes are defined as the reported genes provided by the NCBI, which offers a
quality-controlled, literature-derived collection. The candidate genes were listed by disease, and the genes
collected for each disease were merged.
The collected genes were also verified using the KEGG database. The KEGG resource consists of sixteen main
databases, broadly categorized into systems information, genomic information and chemical information, and
further subcategorized by the color coding of its web pages. KEGG PATHWAY contains around 300,000 (three lakh)
entries for pathway maps built from around five hundred manually drawn diagrams. After collection and
integration, the merged gene list for each disease was processed individually to remove duplicates and
unnecessary genomic data.
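The merge-and-deduplicate step can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `merge_gene_lists`, the toy input lists and the choice to normalise symbols to upper case are all assumptions made for illustration.

```python
def merge_gene_lists(*sources):
    """Merge gene-symbol lists from several sources, dropping duplicates.

    Symbols are stripped and upper-cased before comparison (an assumption;
    the paper does not state how symbols were normalised).
    """
    seen = set()
    merged = []
    for source in sources:
        for symbol in source:
            key = symbol.strip().upper()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical toy exports standing in for per-database gene lists.
pubmed_hits = ["TNF", "il6", "LPL "]
omim_hits = ["IL6", "APOB", "TNF"]
genbank_hits = ["apob", "STAT3"]

t2d_genes = merge_gene_lists(pubmed_hits, omim_hits, genbank_hits)
print(t2d_genes)  # ['TNF', 'IL6', 'LPL', 'APOB', 'STAT3']
```

Keeping the first-seen order makes it easy to trace each retained symbol back to the source that contributed it.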
2.2.2 Gene mining
Data mining is mainly used to make data suitable for analysis and application. The listed candidate genes
related to type 2 diabetes, obesity, hypertension and cardiovascular disease were mined with respect to
metabolic disorder using a data mining technique and stored in the UniGene data warehouse. UniGene is
primarily an NCBI database, but the term also refers to a cluster of genes that perform a particular function.
Because a mining technique is applied to genes, this step is named Gene Mining.
2.2.3 Gene identification according to disease
To identify the genes interrelated among the metabolic diseases, the ExPASy database was used. It helps to find
genes that are not only related to the diseases but also contribute to the breakdown of the metabolic processes
in specific diseases.
2.2.4 Gene sorting
Sorting is the most crucial part of this research, because even a tiny mistake can remove an important gene
and lead to a wrong result. The sorting algorithm of the Taxonomy database is used here to sort the identified
genes that are internally correlated among obesity, T2D, hypertension and cardiovascular disease. The genes
common to these four diseases, which directly or indirectly affect each disease, have been determined.
2.2.5 Gene filtering
This is the critical step of this research, and it uses the UniHI tool. UniHI (Unified Human Interactome) is an
omics tool; its linkage network filtering technique is used to identify the common genes within the target
diseases T2D, OB, HT and CVD. UniHI is applied to find those genes that have at least minimal binary
interactions, as well as complex interactions, among themselves. The common genes identified are used to
establish a PPI network.
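The idea of finding genes shared between diseases can be sketched as plain set intersections. This is only an illustration under stated assumptions: it does not reproduce UniHI's linkage filtering, the function name `cross_linkage` is hypothetical, and the four toy gene sets are invented for the example rather than taken from the study.

```python
from itertools import combinations


def cross_linkage(gene_sets):
    """Return the shared genes for every combination of two or more diseases."""
    shared = {}
    names = sorted(gene_sets)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            shared[combo] = set.intersection(*(gene_sets[n] for n in combo))
    return shared


# Hypothetical toy gene sets (not the study's data).
gene_sets = {
    "T2D": {"TNF", "IL6", "LPL", "INS"},
    "OB": {"TNF", "IL6", "LPL", "LEP"},
    "HT": {"TNF", "IL6", "AGT"},
    "CVD": {"TNF", "IL6", "LPL", "APOB"},
}

shared = cross_linkage(gene_sets)
common_to_all = shared[("CVD", "HT", "OB", "T2D")]
print(len(shared), sorted(common_to_all))  # 11 ['IL6', 'TNF']
```

For four diseases this yields six pairwise, four three-way and one four-way comparison, the same eleven combinations that a cross-linkage chart would enumerate.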
2.2.6 PPI network creation
Protein-protein interaction (PPI) networks play an important role in bioinformatics research. A PPI network
also helps to understand the molecular mechanisms of human diseases and signaling pathways and to identify new
modules of disease processes. UniHI is now a popular and reliable bioinformatics tool for representing PPI maps
among genes. UniHI 7 currently includes almost 350,000 molecular interactions between genes, proteins and
drugs, as well as numerous other types of data such as gene expression and functional annotation. Finally, a
PPI network is created for the common genes using UniHI.
2.2.7 Common regulatory pathway
The eighth and final step is the construction of the common gene regulatory pathway. The common genes among
these diseases are verified against the KEGG database in different biological pathways. The selected genes are
cross-validated and clustered using the KEGG mining tools (Kanehisa et al., 2008). From the pathway
information on the selected genes, a common regulatory pathway is predicted using UniHI.
3 Results
A disease is rarely the consequence of an abnormality in a single gene; more often it reflects the effects of
more than one gene, acting directly, indirectly or at the proteomic level. Bioinformatics tools provide a new
way to analyze the disease process using gene and protein structures and the interactions among them. Trusted
gene databases permit free access to genetic data related to specific diseases, and the tools advance older
approaches to treatment by revealing genetic abnormalities at a deeper level.
3.1 Gene collection, integration, mining and sorting
The responsible genes collected for the target diseases (T2D, HT, OB and CVD) from the PubMed, OMIM and GenBank
databases were merged and processed. This yielded 2794 responsible genes for T2D, 1520 for HT, 2531 for OB and
4713 for CVD. The UniGene database was then used to mine the processed lists of responsible genes. After
mining, the number of responsible genes for each disease was reduced to 250 for T2D, 156 for HT, 185 for OB and
178 for CVD. Table 1 gives the gene counts at each stage. From these, candidate genes were selected that are
connected with any one of the above-mentioned diseases, and the candidate genes were checked against KEGG
pathways. After analyzing the results, 125 genes were picked out for T2D, 121 for HT, 108 for OB and 110 for
CVD. After the sorting stage, 62 genes were identified for type 2 diabetes, hypertension, obesity and
cardiovascular disease.
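To put the reduction in perspective, the share of collected genes that survives mining can be computed directly from the counts reported in this subsection. The short Python sketch below is only a worked calculation; the variable names are illustrative.

```python
# Counts as reported in Section 3.1: genes collected vs. genes left after mining.
collected = {"T2D": 2794, "HT": 1520, "OB": 2531, "CVD": 4713}
after_mining = {"T2D": 250, "HT": 156, "OB": 185, "CVD": 178}

# Percentage of the collected genes retained for each disease.
retained_pct = {
    disease: round(100 * after_mining[disease] / collected[disease], 1)
    for disease in collected
}
print(retained_pct)  # {'T2D': 8.9, 'HT': 10.3, 'OB': 7.3, 'CVD': 3.8}
```

Roughly 4% to 10% of the initially collected genes remain after mining, with CVD showing the strongest reduction.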
Table 1 Gene collection chart for metabolic disorder and target disease according to Homo sapiens.
Name of Disease          Primary Number of    Gene No. for          Gene No. for human and
                         Gene Collection      Metabolic Disorder    Metabolic Disorder
Cardiovascular Disease   4713                 208                   178
Obesity                  2531                 222                   185
Type 2 Diabetes          2794                 298                   250
Hypertension             1520                 191                   156
3.2 Gene linkage filtering among T2D, OB, HT and CVD
Cross linkage is used here to investigate the molecular cross-talk among the mechanisms of the four
interrelated diseases (T2D, HT, OB and CVD). The results of every cross linkage are shown in Table 2. The
individual disease networks were examined thoroughly at their ‘hubs’. The gene patterns and their relatedness
across the different diseases were analyzed by collecting the genes of the four diseases. Through investigation
of the connections and cross-talk, a gene list was generated that contains all types of connections among the
four diseases. During this process, 10 genes common to all four diseases were found: NR3C1, APOA1, APOB, CCL2,
IL6, STAT3, NFKB1, LPL, PPARGC1A and TNF.
Table 2 Cross linkage gene chart for metabolic disorder and target disease according to Homo sapiens.
Cross Linkage Between   No. of Genes   Cross Linkage Among       No. of Genes   Common Gene No.
CVD and OBS             110            CVD, OBS and HT           108            62
CVD and T2D             125            CVD, OBS and T2D          92
CVD and HT              108            OBS, T2D and HT           88
OBS and T2D             136            CVD, T2D and HT           90
OBS and HT              98             CVD, OBS, T2D and HT      70
T2D and HT              121
3.3 PPI network among common genes
After analyzing the background studies, one omics bioinformatics tool, called UniHI, was selected to carry
this research forward. UniHI can represent the protein-protein interaction network pattern among the candidate
genes. The PPI network around the common genes is given in Fig. 1. The PPI network represents the relationship
among genes and hub proteins. Some genes are connected with each other directly and some are connected via
another gene or hub protein. The gene regulatory model of each common gene is also given in Fig. 1.
3.4 Common regulatory pathway
To identify the common regulatory pathway among the common candidate genes, three steps are used for
confirmation. The common regulatory pathway for each hub gene is determined, together with any pathway that is
common to at least two hub genes.
Fig. 1 PPI network among common genes represents the common regulatory pathway.
3.4.1 Step-1: relationship among common genes
The common regulatory pathway around the common genes is outlined by UniHI in Fig. 1. RELA connects the
candidate genes STAT3, NFKB1, CCL2, NR3C1 and IL6. NFKB1 is also directly connected with SP1 and CEPBP, and
with IL6, CCL2 and NR3C1. NFKB1 and TNF are connected indirectly through TAB2, IKBKB and IKBKG. PPARGC1A is
connected with APOB through HNF4A. APOB in turn has indirect connections with both APOA1 and LPL. The LCP1 gene
communicates indirectly, via UBC, with NR3C1, NFKB1, STAT3, APOB and PPARGC1A.
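The notion of a hub can be made concrete by counting direct neighbours. The Python sketch below encodes only the direct links named in Step 1 (the RELA and NFKB1 connections); the indirect links are left out because the text does not say exactly how the intermediates are wired. The helper name `build_adjacency` is hypothetical, and this is a simple degree count, not the analysis UniHI performs.

```python
from collections import defaultdict

# Direct links as named in Step 1; gene symbols are spelled as in the text.
direct_links = [
    ("RELA", "STAT3"), ("RELA", "NFKB1"), ("RELA", "CCL2"),
    ("RELA", "NR3C1"), ("RELA", "IL6"),
    ("NFKB1", "SP1"), ("NFKB1", "CEPBP"),
    ("NFKB1", "IL6"), ("NFKB1", "CCL2"), ("NFKB1", "NR3C1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (gene, gene) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(direct_links)
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}

# Rank by degree (highest first), breaking ties alphabetically.
ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
print(ranked[:2])  # [('NFKB1', 6), ('RELA', 5)]
```

On these links alone, NFKB1 and RELA have the most direct neighbours, which is consistent with their description as connecting nodes in Step 1.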
3.4.2 Step-2: result validation of step-1 using GeneMANIA
Using the GeneMANIA tool, the pathway among the candidate genes was identified, giving almost the same result
as Step 1. A network with the common metabolic pathway was also generated. NFKB1 connects five genes: CCL2,
NR3C1, TNF, STAT3 and IL6. STAT3 is connected with LPL, LPL is connected with APOB, and APOB and APOA1 are
interrelated. So the common pathway among the NR3C1, APOA1, APOB, CCL2, IL6, STAT3, NFKB1, LPL, PPARGC1A and
TNF genes is maintained in a cycle:
APOA1-APOB-LPL-PPARGC1A-STAT3-TNF-NFKB1-CCL2-NR3C1-IL6.
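A quick consistency check on the reported cycle can be sketched in Python. It confirms that the ordering visits each of the ten common genes exactly once, and lists the consecutive pairs that the Step 2 prose does not name explicitly. Those pairs may still be present in the GeneMANIA network itself; the variable names are illustrative.

```python
common_genes = {
    "NR3C1", "APOA1", "APOB", "CCL2", "IL6",
    "STAT3", "NFKB1", "LPL", "PPARGC1A", "TNF",
}
reported_cycle = "APOA1-APOB-LPL-PPARGC1A-STAT3-TNF-NFKB1-CCL2-NR3C1-IL6".split("-")

# Every common gene should appear exactly once in the reported ordering.
covers_all = (
    set(reported_cycle) == common_genes
    and len(reported_cycle) == len(common_genes)
)

# Links stated in the Step 2 text, stored as unordered pairs.
stated_links = {
    frozenset(pair) for pair in [
        ("NFKB1", "CCL2"), ("NFKB1", "NR3C1"), ("NFKB1", "TNF"),
        ("NFKB1", "STAT3"), ("NFKB1", "IL6"),
        ("STAT3", "LPL"), ("LPL", "APOB"), ("APOB", "APOA1"),
    ]
}
consecutive = list(zip(reported_cycle, reported_cycle[1:]))
not_named = [pair for pair in consecutive if frozenset(pair) not in stated_links]

print(covers_all)   # True
print(not_named)
```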
3.4.3 Step-3: result validation of step-1 using UniHI
The result of the UniHI tool is more specific. Among the NR3C1, APOA1, APOB, CCL2, IL6, STAT3, NFKB1, LPL,
PPARGC1A and TNF genes, the common metabolic pathway cycle covers the NFKB1-PPARGC1A genes. Each of these hub
genes is connected to the other candidate genes and affects the expression of the responsible genes. Every step
in subsection 3.4 yields the same metabolic pathway, which validates the proposed common metabolic pathway
among T2D, OB, HT and CVD.
4 Discussion
Studies of the functional cross-links between disease-associated genes and specific diseases are still in their
early stages and are not yet well understood. To understand the genetic mechanisms of diseases, it is important
to know and analyze the connections between genes and diseases. The candidate genes associated with type 2
diabetes, hypertension, cardiovascular disease and obesity, and the metabolic diseases themselves, are
topologically important for constructing a metabolic disease network. By mapping the shared genes onto the PPI
network to show the association among the selected diseases at the genetic level, a cross-talking sub-pathway
was constructed. Cross-talking sub-pathway network analysis performs well at capturing higher-level
relationships between genes and diseases. The network-based analysis provides a rather promising insight into a
common metabolic path between gene and disease.
Type 2 diabetes, obesity, hypertension and cardiovascular diseases are caused by abnormalities of
metabolism. To identify the interrelationships among these metabolic diseases, genes were selected that have a
well-established biological relation to the specific disease. Cross linkage among the metabolic diseases shows
the relationships among them at the gene level. Selecting a good set of genes makes it possible to represent an
accurate protein-protein interaction network among diseases and disease genes. By mapping and analyzing the PPI
network, a common metabolic path has been investigated. From this common metabolic pathway, a metabolic disease
network is established which can regulate gene expression. This research is mainly helpful for understanding
the metabolic network among genes and for targeted drug design.
Abbreviations
T2D=Type-2 Diabetes; OB=Obesity; HT= Hypertension; CVD=Cardiovascular Disease
Acknowledgment
Financial support was provided by the Ministry of Information and Communication Technology, Bangladesh. I,
Tasnuba Jesmin, thank all ICT ministry personnel and staff for their support, and give special thanks to my
supervisors for their valuable guidance, insight and encouragement.
References
Ahmed K, Jesmin T, Fatima U, Moniruzzaman M, Emran AA, Rahman MZ. 2012. Intelligent and effective
diabetes risk prediction system using data mining. Oriental Journal of Computer Science and Technology,
5 (2): 215-221
Apostolos Zaravinos, et al. 2014. Altered metabolic pathways in clear cell renal cell carcinoma: A
meta-analysis and validation study focused on the deregulated genes and their associated networks.
Oncoscience, 1(2): 117-131
Ashish Aneja, et al. 2004. Hypertension and obesity. The Endocrine Society, 169-205
Barness LA, Opitz JM, Gilbert-Barness E. 2007. Obesity: genetic, molecular, and environmental aspects.
American Journal of Medical Genetics, 143: 3016-3034
Bernard MY, Cheung, Li C. 2012. Diabetes and hypertension: is there a common metabolic pathway? Current
Atherosclerosis Reports, 14: 160-166
Carretero OA, Oparil S. 2000. Essential hypertension: Part I: Definition and etiology. Circulation, 101:
329-335
Cheung BM. 2010. The hypertension-diabetes continuum. Journal of Cardiovascular Pharmacology,
55: 333-339
Chobanian AV, Bakris GL, Black HR, et al. 2003. Seventh report of the joint national committee on prevention,
detection, evaluation, and treatment of high blood pressure. Hypertension, 42: 1206-1252
Gress TW, Nieto FJ, Shahar E, et al. 2000. Hypertension and antihypertensive therapy as risk factors for type 2
diabetes mellitus. Atherosclerosis Risk in Communities Study. The New England Journal of Medicine,
342: 905-912
Haslam DW, James WP. 2005. Obesity. Lancet, 366: 1197-1209
Hayes JD, Dinkova-Kostova AT. 2014. The Nrf2 regulatory network provides an interface between redox and
intermediary metabolism. Trends in Biochemical Sciences, 39(4): 199-218
Highlander P, Shaw GP. 2010. Current pharmacotherapeutic concepts for the treatment of cardiovascular
disease in diabetics. Therapeutic Advances in Cardiovascular Disease, 4: 43-54
Isomaa B, Almgren P, Tuomi T, et al. 2001. Cardiovascular morbidity and mortality associated with the
metabolic syndrome. Diabetes Care, 24: 683-689
Julien Dusonchet, et al. 2014. A Parkinson’s disease gene regulatory network identifies the signaling protein
RGS2 as a modulator of LRRK2 activity and neuronal toxicity. Human Molecular Genetics, 23(18): 4887-4905
Kalathur RK, Pinto JP, Hernández-Prieto MA, et al. 2014. UniHI 7: an enhanced database for retrieval and
interactive analysis of human molecular interaction networks. Nucleic Acids Research, 42: D408-D414
Kanehisa M, Araki M, Goto S, et al. 2008. KEGG for linking genomes to life and the environment. Nucleic
Acids Research, 36: D480-D484
Kearney PM, Whelton M, Reynolds K, et al. 2005. Global burden of hypertension: analysis of worldwide data.
Lancet, 365: 217-223
Klingström TS, Plewczynski D. 2011. Protein-protein interaction and pathway databases, a graphical review.
Briefings in Bioinformatics, 12(6): 702-713
Kvan E, Pettersen KI, Sandvik L, et al. 2007. High mortality in diabetic patient with acute myocardial
infarction: cardiovascular co-morbidities contribute most to the high risk. International Journal of
Cardiology, 121: 184-188
Meda SA, et al. 2014. Multivariate analysis reveals genetic associations of the resting default mode network in
psychotic bipolar disorder and schizophrenia. PNAS, E2066-E2075
Meyer P, et al. 2014. Network topology and parameter estimation: from experimental design methods to gene
regulatory network kinetics using a community based approach. BMC Systems Biology, 8: 1-13
Morrish NJ, Wang SL, Stevens LK, Fuller JH, Keen H. 2001. Mortality and causes of death in the WHO
multinational study of vascular disease in diabetes. Diabetologia, 44(2): S14-S21
Norhammar A, Malmberg K, Diderhol E, et al. 2004. Diabetes mellitus: the major risk factor in unstable
coronary artery disease even after consideration of the extent of coronary artery disease and benefits of
revascularization. Journal of the American College of Cardiology, 43: 585-591
Rahman KMT, Islam MdF, Banik RS, et al. 2013. Changes in protein interaction networks between normal and
cancer conditions: Total chaos or ordered disorder? Network Biology, 3(1): 15-28
Peeters A, Barendregt JJ, Willekens F, et al. 2003. Obesity in adulthood and its consequences for life
expectancy: A life-table analysis. Annals of Internal Medicine, 138: 24-32
Ripsin CM, Kang H, Urban RJ. 2009. Management of blood glucose in type 2 diabetes mellitus. American
Family Physician, 79: 29-36
Risérus U, Willett WC, Hu FB. 2009. Dietary fats and prevention of type 2 diabetes. Progress in Lipid
Research, 48: 44-51
Simões-Costa M, Tan-Cabugao J, Antoshechkin I, et al. 2014. Transcriptome analysis reveals novel players in
the cranial neural crest gene regulatory network. Genome Research, 24: 281-290
Teichmann SA, Babu MM. 2014. Gene regulatory network growth by duplication. Nature Genetics, 36(5):
492-496
Ville-Petteri M, Civelek M, Meng QY, et al. 2014. Integrative Genomics Reveals Novel Molecular Pathways
and Gene Networks for Coronary Artery Disease. PLOS Genetics, 10(7): 1-14
Wang S, Sengel C, Emerson MM, et al. 2014. A gene regulatory network controls the binary fate decision of
rod and bipolar cells in the vertebrate retina. Developmental Cell, 30: 513-527
Whitlock G, Lewington S, Sherliker P, et al. 2009. Body-mass index and cause-specific mortality in 900 000
adults: collaborative analyses of 57 prospective studies. Lancet, 373: 1083-1096
World Health Organization. 2008. World Diabetes. Fact sheet N312. WHO
Zeitoun AH, Ibrahim SS, Bagowski CP. 2012. Identifying the common interaction networks of amoeboid
motility and cancer cell metastasis. Network Biology, 2(2): 45-56
Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,
Singapore
Zhang WJ, Li X. 2015. Linear correlation analysis in finding interactions: Half of predicted interactions are
undeterministic and one-third of candidate direct interactions are missed. Selforganizology, 2(3): 39-45
Zhang XM, Guo L, Chi MH, et al. 2015. Identification of active miRNA and transcription factor regulatory
pathways in human obesity-related inflammation. BMC Bioinformatics, 16(76): 1-7
Network Biology, 2016, 6(1): 37-39
IAEES www.iaees.org
Short Communication
Network chemistry, network toxicology, network informatics, and
network behavioristics: A scientific outline
WenJun Zhang
School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China; International Academy of Ecology and
Environmental Sciences, Hong Kong
E-mail: [email protected], [email protected]
Received 18 July 2015; Accepted 26 August 2015; Published online 1 March 2016
Abstract
In the present study, I propose some new sciences: network chemistry, network toxicology, network informatics,
and network behavioristics. The aims, scope and scientific foundation of these sciences are outlined.
Keywords network chemistry; network toxicology; network informatics; network behavioristics; scientific
foundation; aims and scope; new sciences.
1 Introduction
Network science has been successfully applied in some areas of science, such as network biology (Barabasi and
Oltvai, 2004; Zhang, 2011, 2012; find details on network biology at
http://www.iaees.org/publications/journals/nb/nb.asp) and network pharmacology (Hopkins, 2007, 2008). Using
network theory and methodology to improve traditional sciences has proved to be an effective approach. In the
present study, I propose some new sciences, i.e., network chemistry, network toxicology, network informatics,
and network behavioristics, and outline the aims, scope and scientific foundation of these sciences, in order
to lay the foundation for further studies.
2 Proposed New Sciences
2.1 Network chemistry
The aims of network chemistry are to analyze interactions between chemicals/molecules/compounds in complex
chemical networks, to study the evolution of chemical networks, and to control these chemical networks, etc. At
the level of molecular biology and biochemistry, some of the research scope of network chemistry coincides with
that of network biology (Goemann et al., 2011; Huang and Zhang, 2012; Li and Zhang, 2013; Rahman et al., 2013).
Graph theory, network science, systematics, chemistry, and computational science, etc., are the scientific
foundation of network chemistry.
2.2 Network toxicology
In a sense, network toxicology is a branch of network chemistry. However, network toxicology additionally
considers environmental factors. It aims to study the mechanisms of toxicant flow in networks such as
ecosystems, and to control that flow based on findings about these mechanisms. Graph theory, network
science, systematics, ecology/environmental sciences, health science, and computational science, among
others, form the scientific foundation of network toxicology.
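As a rough illustration of what "toxicant flow in a network" could mean computationally, the following minimal sketch moves a toxicant load step by step along a small directed network. The node names, the transfer fractions, and the mass-conserving update rule are all assumptions made for this example; they are not taken from any published model.

```python
# Hypothetical network: each node maps to the nodes that receive toxicant
# from it, with an assumed fraction of its load transferred per step.
transfer = {
    "water": {"algae": 0.5},
    "algae": {"zooplankton": 0.4},
    "zooplankton": {"fish": 0.3},
    "fish": {},
}


def step(load, transfer):
    """Move toxicant one step along the edges; untransferred mass stays in place."""
    new_load = {node: 0.0 for node in transfer}
    for node, amount in load.items():
        moved = 0.0
        for target, fraction in transfer[node].items():
            new_load[target] += amount * fraction
            moved += amount * fraction
        new_load[node] += amount - moved
    return new_load


def simulate(load, transfer, n_steps):
    """Apply the one-step update n_steps times and return the final loads."""
    for _ in range(n_steps):
        load = step(load, transfer)
    return load


initial = {"water": 100.0, "algae": 0.0, "zooplankton": 0.0, "fish": 0.0}
final = simulate(initial, transfer, 3)
print(final)
```

In this toy setting, "controlling the flow" would correspond to lowering a transfer fraction on a chosen edge and comparing the resulting load at the downstream nodes.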
2.3 Network informatics
Network informatics aims to explore the mechanisms of information dissemination in networks such as social
networks, ecosystems, and the human brain, and to improve the efficacy of information dissemination based
on findings about these mechanisms. Graph theory, network science, systematics, information science, and
computational science, among others, form the scientific foundation of network informatics.
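One simple way to make "information dissemination in a network" concrete is a breadth-first spread, in which every informed node passes the message to all of its neighbours in the next round. The sketch below assumes a small hypothetical undirected network and this deterministic spreading rule; real dissemination processes are usually probabilistic, so it is meant only as an illustration.

```python
from collections import deque

# Hypothetical undirected contact network, stored as adjacency lists.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def arrival_rounds(network, source):
    """Breadth-first spread: return the round in which each node is first informed."""
    rounds = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network[node]:
            if neighbour not in rounds:
                rounds[neighbour] = rounds[node] + 1
                queue.append(neighbour)
    return rounds


rounds_from_a = arrival_rounds(network, "A")
rounds_from_d = arrival_rounds(network, "D")
print(max(rounds_from_a.values()), max(rounds_from_d.values()))
```

Comparing the largest arrival round for different sources gives a crude measure of dissemination efficacy: in this toy network, starting from the better-connected node D reaches every node in fewer rounds than starting from A.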
2.4 Network behavioristics
Network behavioristics is closely related to selforganizology and agent-based modeling (Zhang, 2012, 2013,
2014a, 2014b, 2016; Zhang and Liu, 2015). It aims to explore the co-evolution mechanisms of the behavioral
rules of nodes in networks such as human/animal communities and ecosystems. Graph theory, network science,
systematics, agent-based modeling, selforganizology, and computational science, among others, form the
scientific foundation of network behavioristics.
3 Summary
The present study only proposes the basic concepts and outlines the new sciences. Their aims, scope, and
methodology are expected to be further revised, improved and developed in the future.
Acknowledgment
We are thankful for the support of the High-Quality Textbook Network Biology Project for Engineering of
Teaching Quality and Teaching Reform of Undergraduate Universities of Guangdong Province
(2015.6-2018.6), from the Department of Education of Guangdong Province, and of the project Discovery and
Crucial Node Analysis of Important Biological and Social Networks (2015.6-2020.6), from the Yangling
Institute of Modern Agricultural Standardization.
References
Barabasi AL, Oltvai ZN. 2004. Network biology: understanding the cell's functional organization. Nature
Reviews Genetics, 5: 101-113
Goemann B, Wingender E, Potapov AP. 2011. Topological peculiarities of mammalian networks with
different functionalities: transcription, signal transduction and metabolic networks. Network Biology,
1(3-4): 134-148
Hopkins AL. 2007. Network pharmacology. Nature Biotechnology, 25(10): 1110-1111
Hopkins AL. 2008. Network pharmacology: the next paradigm in drug discovery. Nature Chemical Biology,
4(11): 682-690
Huang JQ, Zhang WJ. 2012. Analysis on degree distribution of tumor signaling networks. Network Biology,
2(3): 95-109
Li JR, Zhang WJ. 2013. Identification of crucial metabolites/reactions in tumor signaling networks. Network
Biology, 3(4): 121-132
Rahman KMT, Md. Islam F, Banik RS, et al. 2013. Changes in protein interaction networks between normal
and cancer conditions: Total chaos or ordered disorder? Network Biology, 3(1): 15-28
Zhang WJ. 2011. Network Biology: an exciting frontier science. Network Biology, 1(1): 79-80
Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,
Singapore
Zhang WJ. 2013. Selforganizology: A science that deals with self-organization. Network Biology, 3(1): 1-14
Zhang WJ. 2014a. A framework for agent-based modeling of community assembly and succession.
Selforganizology, 1(1): 16-22
Zhang WJ. 2014b. Selforganizology: A more detailed description. Selforganizology, 1(1): 31-46
Zhang WJ. 2016. Selforganizology: The Science of Self-Organization. World Scientific, Singapore
Zhang WJ, Liu GH. 2015. Coevolution: A synergy in biology and ecology. Selforganizology, 2(2): 35-38
Network Biology
Network Biology (ISSN 2220-8879; CODEN NBEICS) is an open access (BOAI definition),
peer/open reviewed online journal that considers scientific articles in all areas of network
biology. It is the transactions of the International Society of Network Biology and is dedicated to the
latest advances in network biology. The goal of this journal is to keep a record of state-of-the-art
research and to promote research work in these fast-moving areas. The topics to be covered by
Network Biology include, but are not limited to:
Theories, algorithms and programs of network analysis
Innovations and applications of biological networks
Dynamics, optimization and control of biological networks
Ecological networks, food webs and natural equilibrium
Co-evolution, co-extinction, biodiversity conservation
Metabolic networks, protein-protein interaction networks, biochemical reaction networks, gene networks, transcriptional regulatory networks, cell cycle networks, phylogenetic networks, network motifs
Physiological networks
Network regulation of metabolic processes, human diseases and ecological systems
Social networks, epidemiological networks
System complexity, self-organized systems, emergence of biological systems, agent-based modeling, individual-based modeling, neural network modeling, and other network-based modeling
Big data analytics of biological networks, etc.
We are also interested in short communications that clearly address a specific issue or completely
present a new ecological network, food web, or metabolic or gene network, etc.
Authors can submit their work by email to this journal at [email protected] and/or
[email protected]. All manuscripts submitted to Network Biology must be previously unpublished
and may not be under consideration for publication elsewhere at any time during the review period of
this journal. In addition to free submissions from authors around the world, special issues are also
accepted. The organizer of a special issue can collect submissions on a specific topic (arising from a
research project, a research group, etc.), or submissions from a conference, for publication as a
special issue.
Editorial Office: [email protected]
Publisher: International Academy of Ecology and Environmental Sciences
Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong
Tel: 00852-2138 6086
Fax: 00852-3069 1955
E-mail: [email protected]
Network Biology ISSN 2220-8879 ∣ CODEN NBEICS
Volume 6, Number 1, 1 March 2016
Articles
A node degree dependent random perturbation method for prediction of missing links in the network
WenJun Zhang 1-11
Centrality measures for immunization of weighted networks
Mohammad Khansari, Amin Kaveh, Zainabolhoda Heshmati, et al. 12-27
Investigation of common disease regulatory network for metabolic disorders: A bioinformatics approach
Tasnuba Jesmin, Sajjad Waheed, Abdullah-Al-Emran 28-36
Short Communication
Network chemistry, network toxicology, network informatics, and network behavioristics: A scientific outline
WenJun Zhang 37-39
IAEES http://www.iaees.org/