
Network Biology

Vol. 6, No. 1, 1 March 2016

International Academy of Ecology and Environmental Sciences

Network Biology
ISSN 2220-8879 | CODEN NBEICS
Volume 6, Number 1, 1 March 2016

Editor-in-Chief
WenJun Zhang
Sun Yat-sen University, China
International Academy of Ecology and Environmental Sciences, Hong Kong
E-mail: [email protected], [email protected]

Editorial Board
Ronaldo Angelini (The Federal University of Rio Grande do Norte, Brazil)
Sudin Bhattacharya (The Hamner Institutes for Health Sciences, USA)
Andre Bianconi (Sao Paulo State University (Unesp), Brazil)
Danail Bonchev (Virginia Commonwealth University, USA)
Graeme Boswell (University of Glamorgan, UK)
Jake Chen (Indiana University-Purdue University Indianapolis, USA)
Ming Chen (Zhejiang University, China)
Daniela Cianelli (University of Naples Parthenope, Italy)
Kurt Fellenberg (Technische Universitaet Muenchen, Germany)
Alessandro Ferrarini (University of Parma, Italy)
Vadim Fraifeld (Ben-Gurion University of the Negev, Israel)
Alberto de la Fuente (CRS4, Italy)
Mohamed Ragab Abdel Gawad (International University of Sarajevo, Bosnia and Herzegovina)
Pietro Hiram Guzzi (University Magna Graecia of Catanzaro, Italy)
Yongqun He (University of Michigan, USA)
Shruti Jain (Jaypee University of Information Technology, India)
Sarath Chandra Janga (University of Illinois at Urbana-Champaign, USA)
Istvan Karsai (East Tennessee State University, USA)
Caner Kazanci (University of Georgia, USA)
Vladimir Krivtsov (Heriot-Watt University, UK)
Miguel Ángel Medina (Universidad de Málaga, Spain)
Lev V. Nedorezov (Russian Academy of Sciences, Russia)
Alexandre Ferreira Ramos (University of Sao Paulo, Brazil)
Santanu Ray (Visva Bharati University, India)
Dimitrios Roukos (Ioannina University School of Medicine, Greece)
Ronald Taylor (Pacific Northwest National Laboratory, U.S. Dept of Energy, USA)
Ezio Venturino (Università di Torino, Italy)
Jason Jianhua Xuan (Virginia Polytechnic Institute and State University, USA)
Ming Zhan (National Institute on Aging, NIH, USA)
TianShou Zhou (Sun Yat-Sen University, China)

Editorial Office: [email protected]

Publisher: International Academy of Ecology and Environmental Sciences

Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong

Tel: 00852-2138 6086; Fax: 00852-3069 1955 Website: http://www.iaees.org/ E-mail: [email protected]

Network Biology, 2016, 6(1): 1-11

IAEES www.iaees.org

Article

A node degree dependent random perturbation method for prediction of missing links in the network

WenJun Zhang

School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China; International Academy of Ecology and Environmental Sciences, Hong Kong E-mail: [email protected], [email protected]

Received 25 September 2015; Accepted 27 October 2015; Published online 1 March 2016

Abstract

In the present study, I propose a node degree dependent random perturbation algorithm for predicting missing links in a network. The algorithm assumes that a node with more existing links harbors more missing links. It uses one of two rules. Under Rule 1, a randomly chosen node tends to connect to a node of greater degree. Under Rule 2, a link tends to be created between two nodes that both have greater degrees. Missing links of some tumor-related networks (pathways) are predicted. The results show that the prediction efficiency of the algorithm, and the percentage of correctly predicted links among the predicted missing links, increase with network complexity. The number of predicted links that must be checked to find the true missing links decreases as network complexity increases. Prediction efficiency depends on network complexity only. Matlab code for the algorithm is also provided. Finally, prospects for the prediction of missing links are briefly reviewed. So far, all prediction methods based solely on static topological structure (represented by the adjacency matrix) seem to have low efficiency. Methods based on network evolution, node similarity, and sampling (correlation) are expected to be the most promising in the future.

Keywords missing links; network; rules; node degree; random perturbation; prediction; likelihood.

1 Introduction

Many biological networks (food webs, protein–protein interaction networks, metabolic networks, etc.) are incomplete because of missing links. For example, 80% of the molecular interactions in yeast cells (Yu et al., 2008) and 99.7% of human molecular interactions (Amaral, 2008) are unknown. A network may be incomplete because our knowledge of it is limited, or because it is still evolving, so that more links or even more nodes are expected over time. Link (connection) prediction tries to estimate the likelihood that a link exists between two nodes, based on the observed links and/or the attributes of the nodes (Zhang, 2015d; Zhou, 2015). Link prediction can greatly reduce the experimental cost of finding links. Link prediction algorithms can also be used to predict the links that may appear in the future in evolving networks (Lü and Zhou, 2011; Lü et al., 2012; Zhou, 2015). Numerous studies on link prediction have been conducted (Clauset et al., 2008; Guimera and Sales-Pardo, 2009; Barzel and Barabási, 2013; Bastiaens et al., 2015; Lü et al., 2015; Zhang, 2015b, 2015c, 2015d, 2016b; Zhang and Li, 2015; Zhao et al., 2015; Zhou, 2015). In the present study, I propose an algorithm for predicting missing links in a network, in which the likelihood of missing links at a node depends on the node degree.

Network Biology ISSN 2220-8879 URL: http://www.iaees.org/publications/journals/nb/online-version.asp RSS: http://www.iaees.org/publications/journals/nb/rss.xml E-mail: [email protected] Editor-in-Chief: WenJun Zhang Publisher: International Academy of Ecology and Environmental Sciences

2 Methods

2.1 Algorithm

Link prediction is closely related to network evolution. Following the principle of network evolution in Zhang's model (Zhang, 2016a), the present algorithm assumes that a node with more existing links harbors more missing links. This is a reasonable and practical assumption, because new nodes tend to connect to nodes with more links (Barabasi and Albert, 1999; Zhang, 2012a; Zhang, 2016a).

Assume that there are v nodes in total in the network to be predicted, and that the adjacency matrix of the network is $d=(d_{ij})$, $i, j=1,2,\ldots,v$, where $d_{ij}=d_{ji}$, $d_{ii}=0$, and there is a link (connection) between nodes i and j if $d_{ij}=1$ (equivalently, $d_{ji}=1$). The adjacency matrix of the network for missing links only is $D=(D_{ij})$, $i, j=1,2,\ldots,v$. The procedure is as follows:

(1) Calculate the number of missing links to be predicted, $m=m' \cdot per$ (rounded to an integer), where $m'$ is the total number of links in the network and per is the perturbation rate (e.g., per = 0.2, 0.3), i.e., the percentage increment of links produced by the network perturbation.

(2) Calculate the degree of each node, $a_i(t)$, $i=1,2,\ldots,v$. The cumulative attraction strength of nodes 1 to i is

$$p_i(t)=\frac{\sum_{j=1}^{i} a_j(t)^{\lambda}}{\sum_{j=1}^{v} a_j(t)^{\lambda}}, \qquad i=1,2,\ldots,v,$$

where $\lambda$ is the attraction factor, $\lambda>0$; for example, $\lambda=1.2$, 1.5, etc.

(3) Generate missing links. Let $p_0=0$, and generate two random values w and u, uniformly distributed between 0 and 1. For $p_0, p_1, p_2, \ldots, p_v$, one of the following two rules is used:

Rule 1: if $(j-1)/v \le w < j/v$, $p_{k-1}(t) \le u < p_k(t)$, $k \ne j$, and $d_{kj}=d_{jk}=0$, let $D_{kj}=1$ and $D_{jk}=1$, i.e., there is a missing link between nodes k and j.

Rule 2: if $p_{j-1}(t) \le w < p_j(t)$, $p_{k-1}(t) \le u < p_k(t)$, $k \ne j$, and $d_{kj}=d_{jk}=0$, let $D_{kj}=1$ and $D_{jk}=1$, i.e., there is a missing link between nodes k and j.

Under Rule 1, a uniformly chosen node tends to connect to a node of greater degree. Under Rule 2, a link tends to be created between two nodes that both have greater degrees. Each successful draw yields one new link. The draws are repeated until m (missing) links have been produced, which generates the adjacency matrix of the network for missing links only, $D=(D_{ij})$, $i, j=1,2,\ldots,v$.

(4) Return to (3) to perform the next prediction, until the desired number of simulations is reached.

(5) Calculate the mean number (likelihood) of each predicted missing link over all simulations, and rank the likelihoods from greater to smaller. The first m links are the predicted missing links with maximal likelihood.
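As a reading aid, steps (1) to (5) can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation; the Matlab listing linksPrediction.m remains the reference. In the sketch, nodes are numbered from 0 and the network is a list-of-lists adjacency matrix. The function name predict_missing_links and the max_attempts safeguard are hypothetical. Retries are also simplified: any rejected draw redraws both nodes, whereas the Matlab code keeps the first node and redraws only the second when the pair is repeated or self-paired.

```python
import bisect
import random
from collections import Counter


def predict_missing_links(adj, per=0.25, lam=1.5, rule=1, simulations=100,
                          seed=None, max_attempts=100_000):
    """Degree-dependent random perturbation, steps (1)-(5) of Section 2.1.

    adj: symmetric 0/1 adjacency matrix (list of lists), nodes numbered 0..v-1.
    Returns [(i, j, likelihood), ...] with i < j, sorted by decreasing likelihood.
    """
    rng = random.Random(seed)
    v = len(adj)
    degree = [sum(row) for row in adj]
    # Step (1): m = m' * per, rounded half up like Matlab's round().
    m = int(sum(degree) / 2 * per + 0.5)
    if m == 0:
        return []
    # Step (2): cumulative attraction strengths p_1..p_v built from a_i^lambda.
    weights = [a ** lam for a in degree]
    total = sum(weights)
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc / total)
    cum[-1] = 1.0  # guard against floating-point round-off

    def draw_by_degree():
        # Index k with p_{k-1} <= u < p_k; nodes with zero weight are never drawn.
        return bisect.bisect_right(cum, rng.random())

    counts = Counter()
    for _ in range(simulations):  # Step (4): repeat the perturbation
        found, attempts = set(), 0
        # Step (3): draw node pairs until m new (unobserved) links are found.
        while len(found) < m and attempts < max_attempts:
            attempts += 1
            j = rng.randrange(v) if rule == 1 else draw_by_degree()  # Rule 1: uniform
            k = draw_by_degree()
            pair = (min(j, k), max(j, k))
            if k != j and adj[j][k] == 0 and pair not in found:
                found.add(pair)
        counts.update(found)
    # Step (5): likelihood = mean number of times a link was predicted per simulation.
    ranked = [(i, j, c / simulations) for (i, j), c in counts.items()]
    ranked.sort(key=lambda t: t[2], reverse=True)
    return ranked


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
    adj = [[0] * 6 for _ in range(6)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    for i, j, p in predict_missing_links(adj, per=0.4, rule=2, seed=1)[:3]:
        print(i, j, p)
```

Under Rule 2, a node of degree 0 has zero attraction strength and is never drawn. Under Rule 1, it can still be the uniformly chosen first node, as in the Matlab code.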

The Matlab code of the algorithm (linksPrediction.m) is as follows:

%Reference: Zhang WJ. 2016. A node degree dependent random perturbation method for prediction of missing links in the network. Network Biology, 6(1): 1-11
clear
choice=input('Input the type (1 or 2) of data file of the network from which missing links are ready to be predicted (1: adjacency matrix; 2: two array): ');
disp('Adjacency matrix: d=(dij)m*m, where m is the number of nodes in the network. dij=1 if vi and vj are adjacent, and dij=0 if they are not (i, j = 1 to m).');
disp('Two array: there are two columns, A1 and A2, in the data file; an element of A1 stores a node of a link and the corresponding element of A2 stores the other node of the link.');
%The data file must be plain text (e.g., raw.txt), because load() cannot read .xls files.
if (choice==1)
adjstr=input('Input the file name of the adjacency matrix (e.g., raw.txt): ','s');
end
if (choice==2)
adjstr=input('Input the file name of the two array (e.g., raw.txt): ','s');
end
rule=input('Input the rule type (1 or 2) used in the algorithm: ');
pro=input('Input perturbation rate to increase missing links of the network (e.g., 0.2, 0.3): ');
lamda=input('Attraction factor of nodes (lamda>0; e.g., 1.3, 1.5)= ');
simu=input('Input the simulation times (e.g., 100, 200): ');
if (choice==1) adjmat=load(adjstr); v=size(adjmat,2); end
if (choice==2)
twoarray=load(adjstr);
nn=size(twoarray,1);
v=max(twoarray(:));
adjmat=zeros(v);                     %preallocate the symmetric adjacency matrix
for i=1:nn
adjmat(twoarray(i,1),twoarray(i,2))=1;
adjmat(twoarray(i,2),twoarray(i,1))=1;
end; end
degr=sum(adjmat);                    %node degrees a_i
m=round(sum(degr)/2*pro);            %step (1): m=m'*per missing links to predict
fprintf('\nAdjacency matrix of the original network\n')
disp(adjmat)
fprintf('\nNode degrees of adjacency matrix of the original network\n')
disp(degr)
fprintf(['\nMean of node degrees of the original network: ' num2str(mean(degr)) '\n\n'])
cnow=(sum(degr)/2)/((v^2-v)/2);      %connectance
fprintf(['\nConnectance=' num2str(cnow) '\n'])
sumd=sum(degr);
h=v*sum(degr.*(degr-1))/(sumd*(sumd-1));   %aggregation index of node degrees
fprintf(['\nAggregation index (AI) of node degrees=' num2str(h) '\n'])
cv=(std(degr))^2/mean(degr);         %computed here as variance/mean of node degrees
fprintf(['\nCoefficient of variation (CV) of node degrees=' num2str(cv) '\n'])
npairs=v*(v-1)/2;                    %maximum number of node pairs
su=zeros(npairs,2*simu);             %predicted pairs of every simulation (two columns each)
%Step (2): cumulative attraction strength p_i = sum_{j<=i} a_j^lamda / sum_j a_j^lamda
degrr=degr.^lamda;
prop=cumsum(degrr)/sum(degrr);
prop(v)=1;                           %guard against floating-point round-off
for siml=1:simu                      %step (4): repeat the simulation
adj=zeros(v);                        %matrix D of links predicted in this simulation
temp=zeros(m,2);                     %node pairs already generated in this simulation
mm=1;                                %index of the next link to generate
%Step (3): generate m missing links (loops forever if fewer than m candidate links exist)
while (mm<=m)
rep=0;
while true
propp=prop;
if ((rep==0) && (rule==1)) propp=(1:v)/v; end   %Rule 1: first node chosen uniformly
ran=rand();
for j=1:v
if (j==1) st=0; else st=propp(j-1); end
if ((ran>=st) && (ran<propp(j))) rep=rep+1; id(rep)=j; break; end
end
if ((rep>=2) && (id(rep)~=id(1)))
tab=0;                               %tab=1 if this pair was already generated
for i=1:mm-1
if (((id(1)==temp(i,1)) && (id(rep)==temp(i,2))) || ((id(rep)==temp(i,1)) && (id(1)==temp(i,2)))) tab=1; break; end
end
if (tab==1) continue; end
temp(mm,1)=id(1); temp(mm,2)=id(rep);
break;
end; end
%accept the pair only if it is not an observed link
if (adjmat(id(1),id(rep))==0) adj(id(1),id(rep))=1; adj(id(rep),id(1))=1; mm=mm+1; end
end
fprintf(['Simulation ' num2str(siml)])
fprintf('\n\nAdjacency matrix for predicted links only\n')
disp(adj)
[pairx,pairy]=find(adj);
pairxs=pairx(pairx<pairy);           %keep each unordered pair once
pairys=pairy(pairx<pairy);
ConnectionPairs=[pairxs pairys];
dm=size(ConnectionPairs,1);
su(:,siml*2-1)=[pairxs;zeros(npairs-dm,1)]; su(:,siml*2)=[pairys;zeros(npairs-dm,1)];
disp('Predicted links')
disp(ConnectionPairs)
end
disp('--------------------------------Summary---------------------------------')
disp(['There are totally ' num2str(sum(degr)/2) ' links in the original network'])
disp(['You wish to predict ' num2str(m) ' missing links in the original network'])
fprintf('\n');
%Step (5): proptot(i,j) counts the simulations in which link (i,j) was predicted
proptot=zeros(v);
for k=1:simu
for l=1:npairs
i=su(l,k*2-1); j=su(l,k*2);
if (i==0) break; end                 %the remaining rows are zero padding
proptot(i,j)=proptot(i,j)+1; proptot(j,i)=proptot(i,j);
end; end
disp('Likelihood (mean number) of predicted links: ')
disp(' Node Node Likelihood')
[pairx,pairy]=find(tril(proptot));   %each predicted pair once, larger node index first
results=[pairx pairy proptot(sub2ind([v v],pairx,pairy))/simu];
ires=sortrows(results,-3);           %rank likelihoods from greater to smaller
disp(ires)

2.2 Validation

In the present study, I used data on tumor-related networks (pathways) (ABCAM, 2012; Huang and Zhang, 2012; Li and Zhang, 2013; Pathway Central, 2012; see the supplementary material for the adjacency matrices). These networks are complete. For each network, some links were removed by following the reverse of the process of the algorithm above, and the removed links were then predicted. The number of simulations was set to 100, the perturbation rate to per = 0.25, and the attraction factor to λ = 1.5.


3 Results

3.1 Rule 1

Some of the summarized results of link prediction for tumor-related networks (the Ras, p53, Akt, HGF, JNK, PPAR, TGF-β, and TNF pathways) are listed in Tables 1 and 2, together with the percentages of correctly predicted links obtained with a randomization method. Here, the percentage of correctly predicted links against the number of missing links (%) = correctly predicted links / number of missing links × 100; the percentage of correctly predicted links against predicted missing links (%) = correctly predicted links / total number of predicted missing links × 100; and connectance = number of observed links / possible maximum number of links.
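The definitions above, together with the prediction efficiency z = x/y used below, can be checked against the tables with a few lines of Python. This is a sketch, and the function name prediction_metrics is hypothetical. The example inputs come from the PPAR column of Table 2: 6 missing links, of which 83.3% (5 links) were recovered, 257 predicted links, and y = 115. The tabulated efficiencies match x rounded to one decimal place (83.3/115 = 0.7243), so the sketch rounds x the same way. That rounding step is inferred from the table rather than stated in the text.

```python
def prediction_metrics(correct, n_missing, n_predicted, y):
    """Return (x, q, z) for one network, following the definitions in Section 3.1.

    correct     -- number of true missing links found among the predicted links
    n_missing   -- number of true missing links that were hidden
    n_predicted -- total number of distinct links predicted over all simulations
    y           -- averaged rank before which all missing links fall in the ranked list
    """
    x = correct / n_missing * 100      # % correct against true missing links
    q = correct / n_predicted * 100    # % correct against predicted missing links
    z = round(x, 1) / y                # efficiency, with x rounded as in the tables
    return x, q, z


x, q, z = prediction_metrics(correct=5, n_missing=6, n_predicted=257, y=115)
print(f"x = {x:.1f}%, q = {q:.1f}%, z = {z:.4f}")  # x = 83.3%, q = 1.9%, z = 0.7243
```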

Table 1 Link prediction of Ras, p53, and Akt networks with Rule 1 (per=0.25, λ=1.5, 100 simulations). The listed links are true links missed in the data used for predicting.

Ras
Rank Node Node Likelihood
28 9 5 0.04
34 28 5 0.04
58 22 5 0.03
137 10 5 0.02
140 25 5 0.02
230 31 28 0.02
392 35 34 0.01

p53
Rank Node Node Likelihood
82 47 32 0.04
138 47 33 0.03
140 47 36 0.03
88 48 47 0.04
11 52 4 0.07
61 52 9 0.04
4 52 10 0.09
269 52 30 0.02
18 52 48 0.07
19 52 51 0.07

Akt
Rank Node Node Likelihood
465 35 31 0.01
151 50 12 0.02
2 51 15 0.18
1 51 16 0.2
26 51 24 0.1
17 51 28 0.12
28 51 31 0.1
36 51 38 0.07
10 51 39 0.14
31 51 41 0.09
7 51 42 0.15
20 52 51 0.12

According to Tables 1 and 2, the regression relationships are as follows. The predictors are the aggregation index (u) and the coefficient of variation (w) of node degrees (Zhang and Zhan, 2011; Zhang, 2012a). The responses are the prediction efficiency (z = x/y, where x is the percentage of correctly predicted links and y is the averaged rank before which all missing links fall in the list of predicted links), the percentage (%) of correctly predicted links against predicted missing links (q), and the ratio of that averaged rank to the total number of predicted missing links (f):

Algorithm prediction:

z = 0.320 + 0.344u, r² = 0.318, p = 0.019 < 0.05, n = 17

z = 0.465 + 0.192w, r² = 0.323, p = 0.017 < 0.05, n = 17

q = 1.349 + 0.243u, r² = 0.106, p = 0.203, n = 17

q = 1.427 + 0.154w, r² = 0.139, p = 0.141, n = 17

f = 0.438 - 0.125u, r² = 0.306, p = 0.021 < 0.05, n = 17

f = 0.389 - 0.073w, r² = 0.341, p = 0.014 < 0.05, n = 17

Randomization prediction:

z = 0.485 - 0.106u, r² = 0.149, p = 0.125, n = 17

z = 0.445 - 0.063w, r² = 0.171, p = 0.099 < 0.1, n = 17


q = 1.615 - 0.349u, r² = 0.259, p = 0.038 < 0.05, n = 17

q = 1.451 - 0.182w, r² = 0.229, p = 0.051 < 0.1, n = 17

f = 0.476 - 0.088u, r² = 0.156, p = 0.117, n = 17

f = 0.436 - 0.046w, r² = 0.142, p = 0.136, n = 17
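If the regressions above are ordinary least-squares fits (the text does not say so explicitly), they can be reproduced from Table 2. The sketch below uses a hypothetical function name, linear_fit. It fits z = a + bu to the aggregation index u and the algorithm's prediction efficiency z of the 17 networks. With these values it gives approximately a = 0.320, b = 0.344 and r² = 0.318, consistent with the first line reported above. The p-values would additionally require a t distribution with n - 2 degrees of freedom, which is omitted here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit ys = a + b * xs; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)


# Aggregation index u and Rule 1 prediction efficiency z of the 17 networks in Table 2
# (PPAR, TGF-beta, TNF, STAT3, mTOR, Ras, EGF, PTEN, JAK-STAT,
#  p53, Akt, HGF, JNK, PI3K, MARK, FAS, ERK).
u = [0.68, 0.78, 0.85, 0.72, 0.75, 0.75, 0.73, 0.91, 0.91,
     1.50, 3.59, 0.96, 1.72, 0.97, 1.22, 1.17, 1.41]
z = [0.7243, 0.3947, 0.4888, 0.8772, 0.396, 0.689, 0.1958, 1.5957, 0.4596,
     1.2016, 1.5152, 0.5598, 1.0753, 0.2867, 0.2639, 0.9375, 0.5257]
a, b, r2 = linear_fit(u, z)
print(f"z = {a:.3f} + {b:.3f}u, r2 = {r2:.3f}")
```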

Thus the prediction efficiency of the algorithm and the percentage of correctly predicted links against predicted missing links increase with network complexity. Generally, the ratio of the averaged rank of true missing links in the list of predicted missing links (f) declines as network complexity increases. This means that the number of predicted links that must be checked to find the true missing links decreases as network complexity increases.

Compared with the randomization method, the algorithm is in general effective, i.e., it is effective in predicting missing links of biological networks (Tables 1 and 2).

Neither the mean node degree nor connectance has a significant relationship with prediction efficiency. Thus prediction efficiency depends on network complexity only.
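The complexity measures used above are computed in linksPrediction.m directly from the node degrees. A Python transcription of those formulas is sketched below. It covers connectance, the aggregation index u, and the quantity reported as the coefficient of variation w, which the code computes as variance/mean. The function name degree_statistics is hypothetical. Matlab's std normalizes by N - 1 by default, so statistics.variance (the sample variance) is used to match it.

```python
from statistics import mean, variance


def degree_statistics(degrees):
    """Connectance, aggregation index (AI) and the 'coefficient of variation' (CV)
    of a degree sequence, transcribed from the formulas in linksPrediction.m."""
    v = len(degrees)
    s = sum(degrees)                                    # s = 2 * number of links
    connectance = (s / 2) / (v * (v - 1) / 2)
    ai = v * sum(a * (a - 1) for a in degrees) / (s * (s - 1))
    cv = variance(degrees) / mean(degrees)              # sample variance / mean, as in the code
    return connectance, ai, cv


# Star network: one hub linked to three leaves.
print(degree_statistics([3, 1, 1, 1]))
```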

Table 2 Link prediction of some tumor related networks of missing links with Rule 1 (per=0.25, λ=1.5).

Network: PPAR TGF-β TNF STAT3 mTOR Ras EGF PTEN JAK-STAT
Mean of node degrees: 1.85 1.79 2.06 1.75 1.83 1.71 1.96 2.06 2.09
Connectance: 0.07 0.05 0.07 0.08 0.04 0.05 0.04 0.06 0.05
Possible maximum number of candidate links: 326 669 433 255 993 565 1431 494 858
Aggregation index (Zhang and Zhan, 2011; Zhang, 2012a): 0.68 0.78 0.85 0.72 0.75 0.75 0.73 0.91 0.91
Coefficient of variation (Zhang and Zhan, 2011; Zhang, 2012a): 0.40 0.61 0.68 0.51 0.54 0.57 0.47 0.82 0.81
Percentage (%) of correctly predicted links against true missing links with the algorithm (x): 83.3 75.0 87.5 100 80.0 87.5 84.6 75.0 45.5
Percentage (%) of correctly predicted links against predicted missing links with the algorithm: 1.9 1.3 2.0 2.6 1.4 1.8 1.3 1.7 0.9
Number of missing links: 6 8 8 5 10 8 13 8 12
Total number of predicted links with 100 simulations (algorithm): 257 448 346 195 575 392 823 348 545
The averaged rank before which all missing links fall in the list of predicted links (y) (algorithm): 115 190 179 114 202 127 432 47 99
Prediction efficiency (x/y) (algorithm): 0.7243 0.3947 0.4888 0.8772 0.396 0.689 0.1958 1.5957 0.4596
Percentage (%) of correctly predicted links against true missing links with randomization method (x): 100 75 87.5 60.0 70.0 37.5 61.5 100 45.5
Percentage (%) of correctly predicted links against predicted missing links with randomization method: 2.2 1.3 1.9 1.4 1.1 0.7 0.9 2.0 0.8
Total number of predicted links with 100 simulations (randomization method): 270 466 375 217 651 424 853 398 617
The averaged rank before which all missing links fall in the list of predicted links (y) (randomization method): 120 148 175 84 338 103 239 302 102
Prediction efficiency (x/y) (randomization method): 0.8333 0.5068 0.5 0.7143 0.2071 0.3641 0.2573 0.3311 0.4461

3.2 Rule 2

In step (3) of the algorithm, I used Rule 2 for prediction. The results for some pathways are listed in Table 3. Compared with Rule 1, the percentages (%) of correctly predicted links obtained with Rule 2 are overall smaller. However, the prediction efficiency of Rule 2 is generally higher. The major regression relationships and conclusions are similar to those for Rule 1. Moreover, the prediction efficiency of the algorithm increases dramatically with network complexity.

Table 2 (continued) Link prediction of some tumor related networks of missing links with Rule 1 (per=0.25, λ=1.5).

Network: p53 Akt HGF JNK PI3K MARK FAS ERK
Mean of node degrees: 1.96 1.69 1.67 2.67 2.25 2.14 1.88 2.27
Connectance: 0.04 0.03 0.05 0.06 0.04 0.04 0.04 0.04
Possible maximum number of candidate links: 1275 1604 600 1064 1532 1591 1277 1702
Aggregation index (Zhang and Zhan, 2011; Zhang, 2012a): 1.50 3.59 0.96 1.72 0.97 1.22 1.17 1.41
Coefficient of variation (Zhang and Zhan, 2011; Zhang, 2012a): 1.99 5.42 0.93 2.96 0.93 1.46 1.32 1.93
Percentage (%) of correctly predicted links against true missing links with the algorithm (x): 76.9 100 57.1 100 68.8 53.3 75.0 94.1
Percentage (%) of correctly predicted links against predicted missing links with the algorithm: 1.6 2.2 1.1 2.6 1.2 0.9 1.4 1.8
Number of missing links: 13 12 7 16 16 14 12 17
Total number of predicted links with 100 simulations (algorithm): 642 542 354 612 899 819 640 883
The averaged rank before which all missing links fall in the list of predicted links (y) (algorithm): 64 66 102 93 240 202 80 179
Prediction efficiency (x/y) (algorithm): 1.2016 1.5152 0.5598 1.0753 0.2867 0.2639 0.9375 0.5257
Percentage (%) of correctly predicted links against true missing links with randomization method (x): 61.5 25.0 71.4 75.0 68.8 66.7 75.0 76.5
Percentage (%) of correctly predicted links against predicted missing links with randomization method: 0.9 0.3 1.2 1.4 1.1 1.0 1.2 1.2
Total number of predicted links with 100 simulations (randomization method): 823 862 423 839 990 974 770 1073
The averaged rank before which all missing links fall in the list of predicted links (y) (randomization method): 343 106 219 296 409 262 177 505
Prediction efficiency (x/y) (randomization method): 0.1793 0.2358 0.326 0.2534 0.1682 0.2546 0.4237 0.1515

Table 3 Link prediction of some tumor related networks (pathways) of missing links with Rule 2 (per=0.25, λ=1.5).

Network: Ras p53 Akt HGF JNK PPAR TGF-β TNF
Percentage (%) of correctly predicted links with the algorithm (x): 62.5 92.3 50.0 57.1 75.0 33.3 37.5 62.5
Percentage (%) of correctly predicted links against predicted missing links with the algorithm: 0.7 0.9 0.2 1.5 1.2 1.8 1.5 1.7
Total number of predicted links with 100 simulations (algorithm): 314 388 301 300 404 221 304 291
The averaged rank before which all missing links fall in the list of predicted links (y) (algorithm): 74 92 6 78 81 41 63 95
Prediction efficiency (x/y) (algorithm): 0.8446 1.0033 8.3333 0.7321 0.9259 0.8122 0.5952 0.6579
Percentage (%) of correctly predicted links against number of missing links with random network (x): 37.5 53.9 16.7 85.7 62.5 83.3 87.5 75.0
Percentage (%) of correctly predicted links against predicted missing links with random network: 1.6 3.1 2.0 1.3 2.9 0.9 0.9 1.7
Total number of predicted links with 100 simulations (random network): 411 823 851 412 838 277 478 359
The averaged rank before which all missing links fall in the list of predicted links (y) (random network): 77 325 23 213 246 55 184 111
Prediction efficiency (x/y) (random network): 0.487 0.1658 0.7261 0.4023 0.2541 1.5145 0.4755 0.6757


4 Discussion

Random prediction is, overall, effective only for random networks. However, in practical applications most networks are complex networks, so the algorithm is effective in predicting missing links in most cases. Because the prediction efficiency of the algorithm increases with network complexity, the algorithm is more efficient for networks of higher complexity.

Changes in λ reflect different effects of node degree on the connection mechanism. A larger λ leads to finding more missing links at nodes with greater degree, whereas λ → 0 means a trend toward random prediction. How to choose a suitable value of λ depends on the practical problem.
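The role of λ can be seen from the probability that a degree-weighted draw picks node i, a_i^λ / Σ_j a_j^λ. This probability is the increment p_i(t) - p_{i-1}(t) of the cumulative attraction strength in step (2). The small sketch below uses a hypothetical function name, selection_probabilities, and toy degrees 1, 2 and 4. It shows that λ = 0 gives a uniform draw. The share of the highest-degree node rises from 1/3 at λ = 0 to 4/7 at λ = 1 and to about 0.68 at λ = 1.5.

```python
def selection_probabilities(degrees, lam):
    """Probability a_i**lam / sum_j a_j**lam that a degree-weighted draw picks node i."""
    weights = [a ** lam for a in degrees]
    total = sum(weights)
    return [w / total for w in weights]


degrees = [1, 2, 4]
for lam in (0.0, 1.0, 1.5, 3.0):
    probs = selection_probabilities(degrees, lam)
    print(lam, [round(p, 3) for p in probs])
```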

Lü et al. (2015) proposed the structural perturbation method (SPM) to predict missing links and argued that its prediction ability was stronger than that of previous methods. However, I affirm that their method does not hold, for the following reasons. (1) In terms of mechanism, the structural perturbation method can only be used to analyze the structural stability of dynamic systems. The static structure of a network, expressed by an adjacency matrix, is the topological structure, which cannot represent the dynamic characteristics of network evolution. Prediction of missing links should be based on the mechanism of network evolution (dynamics). Without loss of generality, network evolution may be approximated by a group of linear differential equations (Zhang, 2015a), and the structural stability of the network is determined by the eigenvalues, not the eigenvectors, of the system matrix. Even so, the structural perturbation method for determining the variables with the least impact on structural stability should only be used around the equilibrium states of the system, rather than in states far away from equilibrium. (2) During the evolution of a network, the links generated with the greatest likelihood are not necessarily those that minimally perturb the topological structure of the network. Provided that the structural stability of the system is not destroyed and no other constraints apply, any link may be created. The most common case is that the two most similar nodes connect to each other first. (3) Using the missing links within the prediction model to predict those same missing links, as done by Lü et al. (2015), is closer to model fitting than to prediction. In this case, a stronger "prediction" ability (more precisely, fitting ability) is to be expected.

So far, all prediction methods based solely on static topological structure (represented by the adjacency matrix) seem to have low efficiency. Methods based on network evolution (Zhang, 2012a, 2012c, 2015a, 2016a, 2016b), node similarity (Zhang, 2015d), and sampling (correlation; Zhang, 2007, 2011, 2012b, 2013, 2015b; Zhang and Li, 2015) are expected to be the most promising in the future.

Acknowledgment

We are grateful for the support of the following projects: Discovery and Crucial Node Analysis of Important Biological and Social Networks (2015.6-2020.6), from the Yangling Institute of Modern Agricultural Standardization; High-Quality Textbook Network Biology Project for Engineering of Teaching Quality and Teaching Reform of Undergraduate Universities of Guangdong Province (2015.6-2018.6), from the Department of Education of Guangdong Province; and Project on Undergraduate Teaching Reform (2015.7-2017.7), from Sun Yat-sen University, China.

References

ABCAM. 2012. http://www.abcam.com/index.html?pageconfig=productmap&cl=2282

Amaral LAN. 2008. A truer measure of our ignorance. Proceedings of the National Academy of Sciences of USA, 105: 6795-6796

Barabasi AL, Albert R. 1999. Emergence of scaling in random networks. Science, 286(5439): 509-512

Barzel B, Barabási AL. 2013. Network link prediction by global silencing of indirect correlations. Nature Biotechnology, 31: 720-725

Bastiaens P, Birtwistle MR, Blüthgen N, et al. 2015. Silence on the relevant literature and errors in implementation. Nature Biotechnology, 33: 336-339

Clauset A, Moore C, Newman MEJ. 2008. Hierarchical structure and the prediction of missing links in networks. Nature, 453: 98-101

Guimera R, Sales-Pardo M. 2009. Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences of USA, 106: 22073-22078

Huang JQ, Zhang WJ. 2012. Analysis on degree distribution of tumor signaling networks. Network Biology, 2(3): 95-109

Li JR, Zhang WJ. 2013. Identification of crucial metabolites/reactions in tumor signaling networks. Network Biology, 3(4): 121-132

Lü LY, Medo M, Yeung CH, et al. 2012. Recommender systems. Physics Reports, 519: 1-49

Lü LY, Pan LM, Zhou T, et al. 2015. Toward link predictability of complex networks. Proceedings of the National Academy of Sciences of USA, 112: 2325-2330

Lü LY, Zhou T. 2011. Link prediction in complex networks: A survey. Physica A, 390: 1150-1170

Pathway Central. 2012. SABiosciences. http://www.sabiosciences.com/pathwaycentral.php

Yu HY, Braun P, Yildirim MA, et al. 2008. High-quality binary protein interaction map of the yeast interactome network. Science, 322: 104-110

Zhang WJ. 2007. Computer inference of network of ecological interactions from sampling data. Environmental Monitoring and Assessment, 124: 253-261

Zhang WJ. 2011. Constructing ecological interaction networks by correlation analysis: hints from community sampling. Network Biology, 1(2): 81-98

Zhang WJ. 2012a. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific, Singapore

Zhang WJ. 2012b. How to construct the statistic network? An association network of herbaceous plants constructed from field sampling. Network Biology, 2(2): 57-68

Zhang WJ. 2012c. Modeling community succession and assembly: A novel method for network evolution. Network Biology, 2(2): 69-78

Zhang WJ. 2013. Construction of Statistic Network from Field Sampling. In: Network Biology: Theories, Methods and Applications (WenJun Zhang, ed). 69-80, Nova Science Publishers, New York, USA

Zhang WJ. 2015a. A generalized network evolution model and self-organization theory on community assembly. Selforganizology, 2(3): 55-64

Zhang WJ. 2015b. A hierarchical method for finding interactions: Jointly using linear correlation and rank correlation analysis. Network Biology, 5(4): 137-145

Zhang WJ. 2015c. Calculation and statistic test of partial correlation of general correlation measures. Selforganizology, 2(4): 65-77

Zhang WJ. 2015d. Prediction of missing connections in the network: A node-similarity based algorithm. Selforganizology, 2(4): 91-101

Zhang WJ. 2016a. A random network based, node attraction facilitated network evolution method. Selforganizology, 3(1): 1-9

Zhang WJ. 2016b. Selforganizology: The Science of Self-Organization. World Scientific, Singapore

Zhang WJ, Li X. 2015. Linear correlation analysis in finding interactions: Half of predicted interactions are undeterministic and one-third of candidate direct interactions are missed. Selforganizology, 2(3): 39-45

Zhang WJ, Liu GH. 2012. Creating real network with expected degree distribution: A statistical simulation. Network Biology, 2(3): 110-117

Zhang WJ, Zhan CY. 2011. An algorithm for calculation of degree distribution and detection of network type: with application in food webs. Network Biology, 1(3-4): 159-170

Zhao J, Miao LL, Yang Y, et al. 2015. Prediction of links and weights in networks by reliable routes. Scientific Reports, 5: 12261

Zhou T. 2015. Why link prediction? http://blog.sciencenet.cn/blog-3075-912975.html. Accessed on Aug 14, 2015


Network Biology, 2016, 6(1): 12-27

IAEES www.iaees.org

Article

Centrality measures for immunization of weighted networks

Mohammad Khansari, Amin Kaveh, Zainabolhoda Heshmati, Maryam Ashkpoor Motlaq

University of Tehran, Faculty of New Sciences and Technologies, Amir Abad, North Kargar Street 14395-1374, Tehran, Iran

E-mail: [email protected], [email protected], [email protected], [email protected]

Received 25 September 2015; Accepted 27 October 2015; Published online 1 March 2016

Abstract

Effective immunization of communities at minimal vaccination cost has been widely discussed in the field of complex networks. Meanwhile, a proper understanding of the relationships among people in society, applied to social networks, brings substantial improvements in immunization. Accordingly, a weighted graph, in which link weights represent the intensity and intimacy of relationships, is a suitable approach. In this work we employ weighted graphs and a wide variety of weighted centrality measures to identify the individuals who are important in the contagion of diseases. Furthermore, we propose new centrality measures for weighted networks. Our experimental results show that Radiality-Degree centrality performs satisfactorily for weighted BA networks. Additionally, PageRank-Degree and Radiality-Degree centralities show more acceptable performance in targeted immunization of weighted networks.

Keywords centrality measure; epidemic threshold; largest connected components; targeted immunization;

weighted graphs.


1 Introduction

An epidemic is 'the occurrence of more cases of disease than expected in a given area or among a specific group of people over a particular period of time' (Parthasarathy, 2013). Nearly 13 million people die from infectious diseases every year, which imposes high costs on society. Between 2009 and 2010, 14,000 people lost their lives as a result of an influenza epidemic (Jain et al., 2009). AIDS has been recognized as an epidemic disease since 1981, and according to WHO and UNAIDS estimates it caused the deaths of 1.5 million people worldwide in 2013 alone. As a result, the prediction and control of epidemics in populations have attracted many researchers.

Immunization prevents diseases from spreading and saves many people from suffering and death (Parthasarathy, 2013). Vaccination is an immunization technique that simultaneously protects the vaccinated person and prevents the spread of disease (Cornforth et al., 2011). In traditional methods (mass vaccination), the entire population is vaccinated, which is often unaffordable due to high costs (Gallos et al.,


2007; Vidondo et al., 2012). A more effective method is targeted vaccination, in which people are grouped by known criteria and the most influential people in each group are identified and vaccinated (Chen et al., 2008; Cornforth et al., 2011; Eames et al., 2009; Hartvigsen et al., 2007; Miller and Hyman, 2007; Peng et al., 2010; Schneider et al., 2012; Shams and Khansari, 2014; Vidondo et al., 2012). The purpose of targeted vaccination is to select people so that the unprotected clusters that remain are as small as possible (Schneider et al., 2012). In other words, immunization targets the individuals who connect different communities.

Complex networks have become a suitable tool for modeling, representing and describing the characteristics of many natural phenomena (Zhang and Zhan, 2011; Zhang, 2012). Social networks are one such class: each node represents an individual, and communication between a pair of individuals is represented as a link. As a result, many studies have examined disease spreading and immunization within these networks (Cornforth et al., 2011; Hartvigsen et al., 2007; Shams and Khansari, 2014). In these studies, different centralities have been defined, nodes with high centrality have been vaccinated, and the results of the vaccination have been analyzed.

Although representing social networks with simple graphs leads to suitable models, weighted networks can be more appropriate because they include more detailed information. In weighted social networks, the quality of the communication or relationship between a pair of nodes is taken as the weight of the edge between them. Centrality measures in weighted networks have been studied in several works (Abbasi and Hossain, 2013; Abbasi et al., 2011; Wang et al., 2008; Zhai et al., 2013).

Shams and Khansari proposed an evaluation framework for targeted immunization algorithms that uses several centrality measures to distinguish effective nodes (Shams and Khansari, 2014); however, other meaningful and powerful measures were not considered.

To our knowledge, no complete study of the capability of various centrality measures to detect effective nodes in weighted networks has been carried out. We therefore conduct a comprehensive study to determine how effectively different centrality measures detect important nodes in weighted networks. In this study we propose new centrality measures and compare them with existing ones. All evaluations are performed on real and artificial networks.

The rest of this paper is organized as follows: Section 2 reviews existing centrality measures for weighted networks, and Section 3 introduces four new centrality measures. The evaluation methods are described in Section 4, and the experimental results are presented in Section 5. Finally, Section 6 concludes this work.

2 Centrality Measures for Weighted Networks

Node centrality is one of the most analyzed concepts, and it determines a node's effect on network flows. The centrality of a node represents its prominence according to the chosen definition of centrality. In a social network, node centrality reflects an individual's communication pattern and his or her position in the network. Leavitt first defined this concept in 1951 (Leavitt, 1951), and in 1978 Freeman defined three important and widely applied centralities: degree, closeness and betweenness (Freeman, 1978). The following subsections review existing centrality measures, with emphasis on weighted networks.

2.1 Degree centrality

Degree centrality is a simple local centrality defined on the basis of the neighborhood concept. In a social network it reflects a node's popularity, and in our examinations it shows the node's influence on the spread and suppression of diseases. In weighted networks, the degree centrality of a node is the sum of the weights of the edges


attached to that node, which is also called the node strength. This measure represents the overall involvement of a node in the network (Opsahl et al., 2010):

$$C_D(i) = \sum_{j} w_{ij} \tag{1}$$

where $W$ is the weighted adjacency matrix and $w_{ij}$ is the weight of the edge that ties node $i$ to node $j$.
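As a minimal sketch of Eq. (1), assuming the network is stored as a symmetric weighted adjacency matrix `W` (a Python list of lists in which 0 means no edge), node strength is simply a row sum. The function name `strength` and the 4-node toy graph are illustrative assumptions, not taken from the paper.

```python
def strength(W):
    """Weighted degree (node strength), Eq. (1): the sum of the weights of
    the edges incident to each node. W is a symmetric adjacency matrix."""
    return [sum(row) for row in W]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [
    [0, 2, 1, 0],
    [2, 0, 3, 0],
    [1, 3, 0, 4],
    [0, 0, 4, 0],
]
print(strength(W))  # [3, 5, 8, 4]
```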

2.2 Distance-based Centralities

Other centralities are distance-based measures and are closely related to the concept of a "path" in weighted networks. In contrast to binary networks, different values can be assigned to different links in weighted networks. An edge weight can represent a cost (Dijkstra, 1959) as well as the strength of a connection (Newman, 2001). In social networks, the edge weight represents the strength of the physical connection or the sense of intimacy between individuals. Hence, the distance between two adjacent nodes is the inverse of the weight of their link, and to calculate geodesic paths we use the method proposed in (Newman, 2001).

Distance-based centralities are as follows:

2.2.1 Closeness centrality

Closeness centrality is a global centrality that represents the independence of a node in the network (Freeman, 1978). The most central node has the minimum sum of geodesic distances to all other nodes in the network:

$$C_C(i) = \frac{1}{\sum_{j} d_{ij}} \tag{2}$$

where $d_{ij}$ is the length of the weighted geodesic path between nodes $i$ and $j$.
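The sketch below computes Eq. (2) under the convention stated for distance-based measures, where the length of an edge is the inverse of its weight. It uses Floyd-Warshall for brevity (running Dijkstra from every node gives the same distances and scales better). The helper names and the toy matrix `W` are assumptions for illustration.

```python
def all_pairs_distances(W):
    """Floyd-Warshall with edge length 1 / w_ij (0 in W means no edge)."""
    n = len(W)
    inf = float("inf")
    D = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def closeness(W):
    """Eq. (2): C_C(i) = 1 / sum_j d_ij. A node that cannot reach every other
    node has an infinite distance sum and therefore scores 0.0."""
    return [1.0 / sum(row) for row in all_pairs_distances(W)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
c = closeness(W)
```

In this toy graph the shortest route from node 0 to node 2 is 0-1-2 (length 1/2 + 1/3), not the weak direct tie (length 1), so strong ties act as short cuts; nodes 1 and 2 end up with the same, highest closeness.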

2.2.2 Betweenness centrality

Betweenness centrality represents a node's ability to control the flow of data in the network (Freeman, 1978). For every pair of nodes, this measure takes the proportion of the geodesic paths between them that pass through the given node, and sums these proportions over all pairs:

$$C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}} \tag{3}$$

where $\sigma_{st}(i)$ is the number of weighted geodesic paths between nodes $s$ and $t$ that pass through node $i$, and $\sigma_{st}$ is the number of weighted geodesic paths between nodes $s$ and $t$.
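Eq. (3) is normally evaluated with Brandes' algorithm rather than by enumerating paths. The following is a sketch of a weighted variant written for this illustration (Dijkstra with edge length 1/w_ij). Because the graph is undirected, each pair is visited from both ends, so the scores are halved. Detecting equal-length geodesics with a small tolerance `eps` is an implementation assumption.

```python
import heapq


def betweenness(W, eps=1e-12):
    """Eq. (3) via Brandes' algorithm on an undirected weighted graph,
    with edge length 1 / w_ij (0 in W means no edge)."""
    n = len(W)
    bc = [0.0] * n
    for s in range(n):
        dist = [float("inf")] * n
        sigma = [0] * n                  # number of geodesics from s
        preds = [[] for _ in range(n)]   # predecessors on those geodesics
        dist[s], sigma[s] = 0.0, 1
        order, done = [], [False] * n    # nodes in order of finalisation
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if done[u]:
                continue                 # stale heap entry
            done[u] = True
            order.append(u)
            for v, w in enumerate(W[u]):
                if w <= 0:
                    continue
                nd = d + 1.0 / w
                if nd < dist[v] - eps:             # strictly shorter path
                    dist[v], sigma[v], preds[v] = nd, sigma[u], [u]
                    heapq.heappush(heap, (nd, v))
                elif abs(nd - dist[v]) <= eps:     # equally short path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = [0.0] * n                # dependency of s on each node
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return [b / 2.0 for b in bc]         # each unordered pair counted twice


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(betweenness(W))  # [0.0, 2.0, 2.0, 0.0]
```

Node 1 lies on the geodesics 0-1-2 and 0-1-2-3, because the weak direct tie 0-2 is longer than the detour through node 1.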

2.2.3 Radiality centrality

Radiality represents the extent of access into the network provided by a node's neighbors (Valente and Foreman, 1998). High radiality means that an infectious node needs less time to reach others in the network:

$$C_{Rad}(i) = \frac{\sum_{j \in V,\, j \neq i} \left(\Delta_G + 1 - d_{ij}\right)}{n - 1} \tag{4}$$

where $\Delta_G$ is the diameter of graph $G$, $V$ is the set of nodes in $G$, $d_{ij}$ is the length of the weighted geodesic path between nodes $i$ and $j$, and $n = |V|$.
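A sketch of Eq. (4), assuming a connected graph so that every $d_{ij}$ and the diameter $\Delta_G$ (the longest weighted geodesic) are finite. The distance convention 1/w_ij follows Section 2.2. Function names and the toy matrix are illustrative assumptions.

```python
def all_pairs_distances(W):
    """Floyd-Warshall with edge length 1 / w_ij (0 in W means no edge)."""
    n = len(W)
    inf = float("inf")
    D = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def radiality(W):
    """Eq. (4): C_Rad(i) = sum_{j != i} (diam + 1 - d_ij) / (n - 1).
    Assumes a connected graph with at least two nodes."""
    D = all_pairs_distances(W)
    n = len(W)
    diam = max(max(row) for row in D)    # longest weighted geodesic
    return [sum(diam + 1 - D[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
r = radiality(W)
```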

2.2.4 Katz centrality

Katz centrality is a generalization of degree centrality. Whereas a node's degree centrality counts its neighbors, Katz centrality counts all nodes that can be reached from it through paths of any length, with the contribution of distant nodes attenuated:


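$$C_{Katz}(i) = \alpha \sum_{j} w_{ij}\, C_{Katz}(j) + \beta \tag{5}$$

where $w_{ij}$ is the element in row $i$ and column $j$ of the weighted adjacency matrix, and $\alpha$ and $\beta$ are positive constants: $\alpha$ is an attenuation factor with $0 < \alpha < 1$ (Katz, 1953), and $\beta > 0$ ensures that nodes with zero in-degree (in directed networks) receive a positive centrality.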
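A fixed-point iteration is one simple way to evaluate Eq. (5). The condition $0 < \alpha < 1$ alone is not enough, however: the iteration (like the path series behind Katz centrality) converges only when $\alpha$ is smaller than $1/\lambda_{max}$, the inverse of the largest eigenvalue of $W$. With edge weights between 1 and 20 this bound can be far below 1. Because $\lambda_{max}$ of a non-negative matrix never exceeds its largest row sum (the largest node strength), choosing $\alpha$ below 1 / max strength is a safe, if conservative, rule. The function name and default values are assumptions.

```python
def katz(W, alpha, beta=1.0, tol=1e-10, max_iter=10_000):
    """Eq. (5): iterate x_i <- alpha * sum_j w_ij x_j + beta to a fixed point.
    Converges only if alpha < 1 / lambda_max(W)."""
    n = len(W)
    x = [beta] * n
    for _ in range(max_iter):
        new = [alpha * sum(W[i][j] * x[j] for j in range(n)) + beta
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("Katz iteration did not converge; use a smaller alpha")


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
alpha = 0.9 / max(sum(row) for row in W)   # 0.9 / 8 < 1 / lambda_max
k = katz(W, alpha)
```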

2.2.5 Subgraph centrality

This measure is based on spectral properties and characterizes the participation of a node in all possible subgraphs. The subgraph centrality of a node is the sum of the closed walks of different lengths that start and end at that node, where shorter walks have more influence on the result (Estrada and Rodriguez, 2005):

$$C_{SC}(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!} \tag{6}$$

where $\mu_k(i)$ is the number of closed walks of length $k$ starting and ending at node $i$.
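For an undirected graph, $\sum_k \mu_k(i)/k!$ equals the $i$-th diagonal entry of the matrix exponential $e^W$, if $\mu_k(i)$ is read as the $i$-th diagonal entry of $W^k$ (the weighted count of closed walks of length $k$). That reading is an assumption for weighted graphs. The sketch below truncates the power series, and the number of terms is also an assumption: with edge weights up to 20 the largest eigenvalue can be large, and many more terms, or a dedicated routine such as `scipy.linalg.expm`, would be needed.

```python
def subgraph_centrality(W, terms=60):
    """Truncated Eq. (6): SC(i) = sum_{k=0}^{terms-1} (W^k)_ii / k!,
    i.e. an approximation of the diagonal of exp(W)."""
    n = len(W)
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # W^0 / 0! = I
    sc = [1.0] * n                                             # k = 0 term
    for k in range(1, terms):
        # P <- P * W / k, so that P holds W^k / k! at step k
        P = [[sum(P[i][m] * W[m][j] for m in range(n)) / k for j in range(n)]
             for i in range(n)]
        for i in range(n):
            sc[i] += P[i][i]
    return sc


# Single edge of weight 1: the diagonal of exp(W) is cosh(1) for both nodes.
sc = subgraph_centrality([[0, 1], [1, 0]])
```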

2.2.6 Eccentricity centrality

A node's eccentricity centrality is the inverse of the length of the longest geodesic path from that node to any other node. Hence, the node whose longest geodesic path is shortest is the most important node in the network. In social networks, this centrality shows how distant, at most, each person is from the others:

$$C_{Ecc}(i) = \frac{1}{\max_{j} d_{ij}} \tag{7}$$

where $d_{ij}$ is the length of the weighted shortest path between nodes $i$ and $j$.
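Eq. (7) can be sketched with the same all-pairs distances (edge length 1/w_ij). The function names and toy matrix are illustrative assumptions, and the graph is assumed to have at least two nodes.

```python
def all_pairs_distances(W):
    """Floyd-Warshall with edge length 1 / w_ij (0 in W means no edge)."""
    n = len(W)
    inf = float("inf")
    D = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def eccentricity_centrality(W):
    """Eq. (7): C_Ecc(i) = 1 / max_j d_ij (0.0 if some node is unreachable)."""
    return [1.0 / max(row) for row in all_pairs_distances(W)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
e = eccentricity_centrality(W)
```

Node 1 scores highest here: its farthest node (node 3, via 1-2-3) is only 1/3 + 1/4 away.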

2.2.7 Communication centrality

Communication centrality represents the capability of a node to communicate with other nodes. This measure depends on the node's degree, the communication ability of its neighbors and the weights of its edges. The communication centrality of a node is defined by multiplying the h-degree of each neighbor by the weight of the edge connecting the node to that neighbor.

Zhai et al. defined the communication centrality of node $i$ as "the largest integer $h$ such that the node $i$ has at least $h$ neighbor nodes satisfying the product of each node's h-degree and the weight of the edge linked with node $i$ is no fewer than $h$" (Zhai et al., 2013). If the h-degrees of the $m$ neighbors of node $i$ are denoted $h_1, h_2, \ldots, h_m$ and the weights of the corresponding edges are $w_1, w_2, \ldots, w_m$, then their product sequence is:

$$h_1 w_1,\; h_2 w_2,\; \ldots,\; h_m w_m$$

Relabeling these products in non-increasing order as

$$p_1 \ge p_2 \ge \cdots \ge p_m,$$

the communication centrality is

$$C_{Com}(i) = \max\{k : p_k \ge k\} \tag{8}$$
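Eq. (8) is an h-index taken over the products $h_j w_{ij}$. The section does not define the h-degree itself, so the sketch assumes the commonly used definition: the largest $h$ such that a node has at least $h$ neighbors joined by edges of weight at least $h$. Function names and the toy graph are illustrative.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for k, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= k:
            h = k
        else:
            break
    return h


def h_degree(W, i):
    """Assumed definition: largest h such that node i has at least h
    neighbours joined by edges of weight >= h."""
    return h_index([w for w in W[i] if w > 0])


def communication(W):
    """Eq. (8): h-index of the products h_degree(j) * w_ij over neighbours j."""
    n = len(W)
    hd = [h_degree(W, j) for j in range(n)]
    return [h_index([hd[j] * W[i][j] for j in range(n) if W[i][j] > 0])
            for i in range(n)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(communication(W))  # [2, 2, 2, 1]
```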

2.3 Spectral centralities

Spectral centrality measures are those whose definition and calculation rely on the eigenvalues and eigenvectors of the adjacency and Laplacian matrices of the graph. The two best known are:

2.3.1 Laplacian centrality


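Laplacian centrality is based on the Laplacian matrix, which contains practical information about the dynamics and geometry of the network (Pauls and Remondini, 2012). The Laplacian matrix is defined as:

$$L = D - W$$

where $D$ is the graph's degree matrix (for a weighted graph, the diagonal matrix of node strengths) and $W$ is the graph's weighted adjacency matrix. If the eigenvalues of $L$ are denoted $\lambda_1, \lambda_2, \ldots, \lambda_n$, the Laplacian energy of graph $G$ is:

$$E_L(G) = \sum_{i=1}^{n} \lambda_i^2$$

The Laplacian centrality of a node $v$ is the difference between the Laplacian energy of the network with and without that node (Qi et al., 2012):

$$C_L(v) = E_L(G) - E_L(G_v) \tag{9}$$

where $G_v$ is the graph obtained by removing node $v$ and its edges from $G$.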
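Eq. (9) does not require an eigendecomposition. Because $L$ is symmetric, $\sum_i \lambda_i^2 = \mathrm{trace}(L^2) = \sum_i s_i^2 + 2\sum_{i<j} w_{ij}^2$, where $s_i$ is the strength of node $i$. The sketch below uses that identity and assumes no self-loops. Function names and the toy matrix are illustrative assumptions.

```python
def laplacian_energy(W, removed=None):
    """E_L = sum of squared Laplacian eigenvalues = trace(L^2)
    = sum_i s_i^2 + 2 * sum_{i<j} w_ij^2, evaluated on the graph with node
    `removed` (if given) and its edges deleted."""
    keep = [i for i in range(len(W)) if i != removed]
    strength_sq = sum(sum(W[i][j] for j in keep) ** 2 for i in keep)
    edge_sq = sum(W[i][j] ** 2 for i in keep for j in keep if i < j)
    return strength_sq + 2 * edge_sq


def laplacian_centrality(W):
    """Eq. (9): drop in Laplacian energy when node v is removed."""
    total = laplacian_energy(W)
    return [total - laplacian_energy(W, removed=v) for v in range(len(W))]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(laplacian_centrality(W))  # [50, 98, 158, 96]
```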

2.3.2 PageRank centrality

This centrality measure represents a node's relative importance in the network (Sarma et al., 2011). A node's importance depends on the importance of its neighbors. PageRank centrality is based on link analysis:

$$\mathbf{PR} = d\, \tilde{W}\, \mathbf{PR} + (1 - d)\, \mathbf{v} \tag{10}$$

where $d$ is the damping factor, usually set to 0.85, $\tilde{W}$ is the normalized weighted adjacency matrix and $\mathbf{v}$ is the preference vector.
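A power-iteration sketch of Eq. (10). It assumes that $\tilde{W}$ distributes each node's score over its neighbors in proportion to edge weight, that the preference vector $\mathbf{v}$ is uniform, and that nodes with no outgoing weight (dangling nodes) spread their score uniformly. Other dangling-node conventions exist. The function name and tolerances are illustrative.

```python
def pagerank(W, d=0.85, tol=1e-12, max_iter=1000):
    """Eq. (10) by power iteration. W[j][i] is the weight of the edge j -> i
    (symmetric for undirected graphs); v is uniform."""
    n = len(W)
    out = [sum(row) for row in W]          # out-strength of each node
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(pr[j] for j in range(n) if out[j] == 0)
        new = [(1 - d) / n + d * dangling / n
               + d * sum(W[j][i] / out[j] * pr[j] for j in range(n) if out[j] > 0)
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
pr = pagerank(W)
```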

It is worth noting that Subgraph centrality can also be regarded as a spectral centrality.

2.4 Hybrid centralities

One of the major problems with pure centralities such as degree, betweenness and closeness is that they are not convenient for estimating node availability. For example, the degree centrality of a node with $k$ neighbors, each linked by an edge of weight 1, equals that of a node with a single neighbor linked by an edge of weight $k$. However, if one edge is removed from each node, the former remains connected to the network and the latter does not. Hence, hybrid centrality measures, applicable to both weighted and binary networks, have been proposed to better capture the importance and influence of nodes.

2.4.1 Degree-Degree centrality

This measure highlights nodes that have more connections to more important nodes. In other words, both the degree of a node and the degrees of its neighbors are taken into account:

$$C_{DD}(i) = \sum_{j} w_{ij} \cdot C_D(j) \tag{11}$$

where $C_D(j)$ is the degree centrality of node $j$ and $w_{ij}$ is the weight of the edge between nodes $i$ and $j$.
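Eqs. (11) to (13), and the measures proposed in Section 3, share one pattern: a neighbor-weighted sum of some base centrality. A single hypothetical helper, `hybrid`, can therefore express all of them. It is applied here to node strength to give Degree-Degree centrality; the toy matrix is an assumption.

```python
def strength(W):
    """Node strength (weighted degree), Eq. (1)."""
    return [sum(row) for row in W]


def hybrid(W, base):
    """Generic hybrid centrality: C(i) = sum_j w_ij * base[j], where `base`
    holds any per-node centrality (degree, closeness, betweenness, ...)."""
    n = len(W)
    return [sum(W[i][j] * base[j] for j in range(n)) for i in range(n)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(hybrid(W, strength(W)))  # Degree-Degree: [18, 30, 34, 32]
```

Node 3 has a single neighbor, yet it outranks node 1 because its one strong tie leads to the highest-strength node, which illustrates how the measure rewards strong ties to important nodes.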

2.4.2 Closeness-Degree centrality

This measure represents not only a node's influence on the control of data flow but also its performance in communicating with other nodes in the network:

$$C_{CD}(i) = \sum_{j} w_{ij} \cdot C_C(j) \tag{12}$$

where $C_C(j)$ is the closeness centrality of node $j$ and $w_{ij}$ is the weight of the edge between nodes $i$ and $j$.

2.4.3 Betweenness-Degree centrality


This centrality represents both the popularity of a node and its influence on the control of data flows in the network:

$$C_{BD}(i) = \sum_{j} w_{ij} \cdot C_B(j) \tag{13}$$

where $C_B(j)$ is the betweenness centrality of node $j$ and $w_{ij}$ is the weight of the edge between nodes $i$ and $j$.

3 The Proposed Centrality Measures

To detect the most effective nodes in the network and study their vaccination, we propose four new hybrid centrality measures following the pattern proposed in (Abbasi et al., 2011):

3.1 PageRank-Degree centrality

As mentioned above, PageRank centrality represents the importance of nodes based on their adjacent nodes. We introduce PageRank-Degree centrality as a combination of the PageRank measure with degree centrality:

$$C_{PRD}(i) = \sum_{j} w_{ij} \cdot PR(j), \qquad PR(j) = (1 - d) + d \sum_{k \in M(j)} \frac{PR(k)}{L(k)} \tag{14}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, $PR(j)$ is the PageRank centrality of node $j$, $d$ is the damping factor with a value of 0.85, $M(j)$ is the set of nodes that share an edge with node $j$, and $L(k)$ is the number of outlinks of node $k$.

3.2 Radiality-Degree centrality

Radiality centrality reflects a node's access to other nodes in the network through its neighbors. Thus, a node with more strongly weighted neighbors of high Radiality centrality obtains a high Radiality-Degree centrality. We define the Radiality-Degree centrality measure as:

$$C_{RD}(i) = \sum_{j} w_{ij} \cdot \frac{\sum_{k \in V,\, k \neq j} \left(\Delta_G + 1 - d_{jk}\right)}{n - 1} \tag{15}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, $\Delta_G$ is the diameter of graph $G$, $V$ is the set of nodes of $G$, $n$ is the number of nodes in the graph and $d_{jk}$ is the length of the geodesic path between nodes $j$ and $k$.
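Because Radiality-Degree is the measure the experiments single out, here is a self-contained sketch of Eq. (15): radiality per Eq. (4), with edge length 1/w_ij and a connected graph assumed, followed by the neighbor-weighted sum. Function names and the toy matrix are illustrative assumptions, not the authors' R code.

```python
def all_pairs_distances(W):
    """Floyd-Warshall with edge length 1 / w_ij (0 in W means no edge)."""
    n = len(W)
    inf = float("inf")
    D = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def radiality(W):
    """Eq. (4) for a connected graph: sum_{k != j} (diam + 1 - d_jk) / (n - 1)."""
    D = all_pairs_distances(W)
    n = len(W)
    diam = max(max(row) for row in D)
    return [sum(diam + 1 - D[j][k] for k in range(n) if k != j) / (n - 1)
            for j in range(n)]


def radiality_degree(W):
    """Eq. (15): C_RD(i) = sum_j w_ij * C_Rad(j)."""
    rad = radiality(W)
    n = len(W)
    return [sum(W[i][j] * rad[j] for j in range(n)) for i in range(n)]


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
rd = radiality_degree(W)
```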

3.3 Subgraph-Degree centrality

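We define Subgraph-Degree centrality as:

$$C_{SD}(i) = \sum_{j} w_{ij} \cdot \sum_{k=0}^{\infty} \frac{\mu_k(j)}{k!} \tag{16}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, and $\mu_k(j)$ is the number of closed walks of length $k$ starting and ending at node $j$.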

3.4 Katz-Degree centrality

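Hybrid Katz-Degree centrality is defined as follows:

$$C_{KD}(i) = \sum_{j} w_{ij} \cdot C_{Katz}(j), \qquad C_{Katz}(j) = \alpha \sum_{k} w_{jk}\, C_{Katz}(k) + \beta \tag{17}$$

where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, and $\alpha$ and $\beta$ are positive constants, with $\alpha$ the attenuation factor and $0 < \alpha < 1$.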


4 Evaluation Method

The purpose of this study is to vaccinate the nodes with the greatest effect on disease spreading and remove them from the network. Effective nodes are selected by the centrality measures discussed in the previous sections. Vaccinating these nodes removes links from the network, so the network breaks into smaller components. The framework of this model is the same as in (Shams and Khansari, 2014). We use two criteria to evaluate immunization algorithms: the size of the largest connected component (LCC) and the epidemic threshold.

4.1 Increment of epidemic threshold of network

The epidemic threshold of a network is the critical value of the effective infection rate (the ratio of the infection rate to the recovery rate) below which an infection dies out rather than becoming an epidemic; hence a higher epidemic threshold indicates a lower probability of reaching an epidemic (Chakrabarti et al., 2008; Kitchovitch and Lio, 2011; Masuda, 2009; Peng et al., 2010; Shams and Khansari, 2014). The epidemic threshold of a network is the inverse of the largest eigenvalue of the network adjacency matrix (Chakrabarti et al., 2008; Kitchovitch and Lio, 2011; Shams and Khansari, 2014). We study how removing selected nodes reduces the largest eigenvalue of the network adjacency matrix, in order to assess how well the considered centrality measures detect and immunize the most effective nodes. As described in (Shams and Khansari, 2014), we calculate $\lambda_1'/\lambda_1$, where $\lambda_1$ is the largest eigenvalue of the original network and $\lambda_1'$ is the largest eigenvalue of the vaccinated network.
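With threshold = $1/\lambda_1$, a sketch of the $\lambda_1'/\lambda_1$ criterion needs only the largest adjacency eigenvalue before and after the vaccinated nodes are removed. Power iteration is run on $W + I$ here. Shifting by the identity is an implementation choice: it stops the iteration from oscillating on bipartite graphs, whose largest and smallest eigenvalues have equal magnitude. Function names are illustrative assumptions, and the original network is assumed to have at least one edge.

```python
def largest_eigenvalue(W, iters=10_000, tol=1e-12):
    """Spectral radius of a symmetric non-negative matrix by power iteration
    on W + I (max-norm normalisation), minus the shift of 1."""
    n = len(W)
    if n == 0:
        return 0.0
    x = [1.0] * n
    prev = None
    for _ in range(iters):
        y = [x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(abs(v) for v in y)     # >= 1 because W is non-negative
        x = [v / norm for v in y]
        if prev is not None and abs(norm - prev) < tol:
            break
        prev = norm
    return norm - 1.0


def remove_nodes(W, nodes):
    """Adjacency matrix of the network after deleting the given nodes."""
    gone = set(nodes)
    keep = [i for i in range(len(W)) if i not in gone]
    return [[W[i][j] for j in keep] for i in keep]


def eigen_ratio(W, vaccinated):
    """lambda_1' / lambda_1: a smaller ratio means a higher epidemic threshold."""
    return largest_eigenvalue(remove_nodes(W, vaccinated)) / largest_eigenvalue(W)


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
ratio = eigen_ratio(W, [2])   # vaccinate node 2
```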

4.2 Decrement of Largest-Connected-Component size

A connected component is a subgraph in which there is at least one path between every pair of nodes. Because an infection can spread only along such paths, the largest possible epidemic size in a network is the size of its largest connected component (LCC) (Chen et al., 2008; Gallos et al., 2007; Schneider et al., 2012; Shams and Khansari, 2014). Hence, we calculate $S'/S$, where $S$ is the size of the largest connected component of the original network and $S'$ is the size of the largest connected component of the immunized network.
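A breadth-first-search sketch of the $S'/S$ criterion, assuming vaccinated nodes are simply deleted together with their edges. Function names and the toy matrix are illustrative.

```python
from collections import deque


def lcc_size(W, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)                  # removed nodes are never visited
    best = 0
    for s in range(len(W)):
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v, w in enumerate(W[u]):
                if w > 0 and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def lcc_ratio(W, vaccinated):
    """S' / S: LCC size after vaccination relative to the original LCC size."""
    return lcc_size(W, vaccinated) / lcc_size(W)


# Hypothetical toy graph: edges 0-1 (w=2), 0-2 (w=1), 1-2 (w=3), 2-3 (w=4).
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(lcc_ratio(W, [2]))  # 0.5: components {0, 1} and {3} remain
```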

5 Experimental Results

In this section we investigate how well the predefined and proposed centrality measures detect and immunize influential nodes in the network. We calculate the centrality measures in three artificial networks and one real network and evaluate their performance. We use R 3.1.1 to generate the artificial networks and to calculate the different centrality measures.

5.1 Datasets

To generate the artificial networks we use the three best-known models: scale-free, Erdős-Rényi (ER) and small-world. The scale-free network is generated based on (Albert and Barabási, 2002) with 500 nodes and parameter m = 3, which yields 1500 edges. The ER network is generated with 500 nodes and 1500 edges, which is greater than the threshold of connectedness of a random graph (Erdős and Rényi, 1961). The small-world network is produced with 500 nodes, 3 initial neighbors on each side (which results in 1500 edges) and a rewiring probability of 0.1 (Watts and Strogatz, 1998). Each edge weight is a random number between 1 and 20. For each network and each centrality measure, we generate five datasets and average the desired parameters, which are then used to evaluate immunization performance.

The real network is the Facebook-like (FBL) network (Opsahl and Panzarasa, 2009), which is frequently used in the immunization literature and consists of 1899 nodes and 13838 edges. The edge weights are uniformly distributed between 1 and 20.
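The authors generated their networks in R 3.1.1. The Python below is only an assumed stand-in for the ER case: it draws 1500 distinct edges among 500 nodes (the G(n, m) variant) and gives each edge a weight between 1 and 20. Whether the original weights were integers or real numbers is not stated, so integer weights are an assumption here, as are the function name and the `seed` parameter.

```python
import random


def weighted_er_graph(n=500, m=1500, w_min=1, w_max=20, seed=None):
    """G(n, m) Erdos-Renyi graph: m distinct undirected edges chosen uniformly
    at random, each with an integer weight drawn uniformly from [w_min, w_max].
    Returns a symmetric adjacency matrix (0 = no edge)."""
    rng = random.Random(seed)
    W = [[0] * n for _ in range(n)]
    edges = set()
    while len(edges) < m:
        i, j = rng.sample(range(n), 2)   # two distinct endpoints
        e = (min(i, j), max(i, j))
        if e not in edges:
            edges.add(e)
            W[i][j] = W[j][i] = rng.randint(w_min, w_max)
    return W


G = weighted_er_graph(seed=1)
```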


5.2 Diagnose and vaccination

Two methods have been proposed to identify and vaccinate individuals based on their centrality in networks. In the first, the initial method, the centrality measure is calculated on the initial network and the most central nodes are immunized according to the number of available vaccine doses. In the second, the adaptive method, the most central node is immunized in each iteration and the centrality measure is then recalculated (Schneider et al., 2012; Shams and Khansari, 2014). We identify and vaccinate individuals with the initial method. Our artificial networks consist of 500 nodes, and in each iteration we immunize (remove) 10 nodes. The real network consists of 1899 nodes, and in each iteration 40 individuals are vaccinated.
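A sketch of the initial (non-adaptive) method: the ranking is computed once on the intact network, and nodes are then removed in batches (10 at a time for the 500-node networks). After each batch the LCC size relative to the initial network size is recorded, which is the quantity plotted in the LCC figures. The helper names are hypothetical, and any centrality list can be passed as `scores`.

```python
from collections import deque


def lcc_size(W, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for s in range(len(W)):
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v, w in enumerate(W[u]):
                if w > 0 and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def initial_method_curve(W, scores, batch=10, steps=None):
    """Rank nodes once by `scores` (highest first), then remove them in
    batches, recording |LCC| / n after each batch."""
    n = len(W)
    ranking = sorted(range(n), key=lambda v: scores[v], reverse=True)
    steps = n // batch if steps is None else steps
    return [lcc_size(W, ranking[:k * batch]) / n for k in range(1, steps + 1)]


# Hypothetical toy graph ranked by node strength [3, 5, 8, 4], one node per step.
W = [[0, 2, 1, 0], [2, 0, 3, 0], [1, 3, 0, 4], [0, 0, 4, 0]]
print(initial_method_curve(W, [3, 5, 8, 4], batch=1))  # [0.5, 0.25, 0.25, 0.0]
```

The adaptive method would differ only in recomputing `scores` on the reduced network before each removal.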

5.3 LCC in network models

Fig. 1 illustrates the efficiency of targeted immunization, measured by LCC size, in scale-free (BA), small-world (SW) and random (ER) networks for different centrality measures. We plot the ratio of LCC size to network size versus the number of vaccinated (removed) nodes. For the BA model, shown by ● symbols, Communication centrality outperforms the others when 50 or 100 individuals are vaccinated. Radiality-Degree and PageRank centralities show the next best performance, at almost the same level. On the other hand, Betweenness-Degree centrality is the worst strategy for immunizing the weighted BA network, followed by Subgraph-Degree and Eccentricity. Our experiments show that for all centrality measures except Betweenness-Degree, the worst expected epidemic size (LCC) decreases substantially in the first 20 iterations (immunization of 40% of the population). With Communication centrality, the ratio of the worst expected epidemic size (LCC) to network size is 0.04 when 20% of the individuals in the network are immunized, which is the best result.

The use of different centrality measures to identify and immunize the most effective nodes in weighted ER networks is shown by ■ symbols. The worst expected epidemic size decreases considerably when Subgraph centrality is used. Communication and Radiality-Degree centralities have the next best performance. By contrast, Eccentricity and Degree-Degree centralities give the worst results.

The ratio of LCC size to network size versus the number of vaccinated nodes for different centrality measures in Watts-Strogatz networks is shown by ▲ symbols. Betweenness centrality outperforms the others in reducing LCC size: when it is used to detect and immunize the most important nodes, the worst expected epidemic size falls to one fifth of the network size after 40% of the population is vaccinated. Closeness-Degree and Radiality-Degree centralities are the next best measures for immunizing small-world networks. For example, to reduce the LCC to one fifth of the network, we have to immunize 40%, 44% and 48% of the population using Betweenness, Closeness-Degree and Radiality-Degree, respectively. Katz-Degree, Katz and Eccentricity centralities show the worst results. Table 1 summarizes our experiments on the network models.

5.4 Epidemic threshold in network models

Fig. 2 illustrates how effectively different centrality measures reduce the largest eigenvalue of the network adjacency matrix as the number of vaccinated individuals increases, for scale-free (BA), small-world (SW) and random (ER) networks.

Fig. 1 Proportion of LCC size to initial network size versus number of vaccinated nodes in BA, ER and WS networks.


Table 1 Centrality measures with the best and worst performance in decreasing LCC size in network models.

Network model | Outperform | Underperform
Barabási-Albert | Communication, Radiality-Degree, PageRank | Betweenness-Degree, Subgraph-Degree, Eccentricity
Erdős-Rényi | Subgraph, Communication, Radiality-Degree | Eccentricity, Degree-Degree
Watts-Strogatz | Betweenness, Closeness-Degree, Radiality-Degree | Katz-Degree, Katz, Eccentricity

The ratio of the largest eigenvalue of the adjacency matrix of the immunized network to the largest eigenvalue of the original weighted BA network ($\lambda_1'/\lambda_1$) is shown by ● symbols. Radiality-Degree, PageRank-Degree and Strength centralities show the best performance, in that order. On the other hand, Eccentricity, Subgraph-Degree and Betweenness-Degree perform worst.

For ER networks, the reduction of the largest eigenvalue of the immunized network versus the number of immunized nodes is shown by ■ symbols. Closeness-Degree, Degree-Degree and PageRank-Degree centralities perform best, while Eccentricity and Subgraph centralities underperform the others.

Table 2 Centrality measures with the best and worst performance in increasing the epidemic threshold in network models.

Network model | Outperform | Underperform
Barabási-Albert | Radiality-Degree, PageRank-Degree, Strength | Eccentricity, Subgraph-Degree, Betweenness-Degree
Erdős-Rényi | Closeness-Degree, Degree-Degree, PageRank-Degree | Eccentricity, Subgraph
Watts-Strogatz | Strength, Degree-Degree, PageRank-Degree | Betweenness, Radiality, Eccentricity

The same process was carried out for small-world (Watts-Strogatz) networks, shown by ▲ symbols. Our experiments show that when 47% of the population is vaccinated, the ratio of the largest eigenvalue of the network to that of the initial network falls to 0.3 with Strength centrality, which is the best result. Degree-Degree and PageRank-Degree centralities exhibit the next best performance. Betweenness, Radiality and Eccentricity show the worst results, in that order. Table 2 summarizes our experiments on increasing the epidemic threshold in the different network models.

Fig. 2 Proportion of largest eigenvalue of immunized network to largest eigenvalue of initial network versus number of vaccinated nodes in BA, ER and WS networks.


Fig. 3 Proportion of LCC size to initial network size versus number of vaccinated nodes in real network (FBL).


Fig. 4 Proportion of largest eigenvalue of immunized network to largest eigenvalue of initial network versus number of vaccinated nodes in real network (FBL).


5.5 Real network

As mentioned above, we use the Facebook-like (FBL) network (Opsahl and Panzarasa, 2009) as the real network. The major drawback of this dataset is that it records online interactions and does not reflect individuals' physical contacts. However, owing to the lack of available contact-network data, we used this dataset. Our experiments show that Radiality-Degree centrality is the best measure for detecting the most influential nodes, followed by Communication and PageRank centralities, as illustrated in Fig. 3.

To reduce the worst-case epidemic size to one fifth of the network size, we have to vaccinate 23%, 23.7% and 24.3% of the most important nodes using Radiality-Degree, Communication and PageRank centralities, respectively.

Fig. 4 illustrates the impact of targeted vaccination on the largest eigenvalue (and hence on the epidemic threshold) of the real network for different centrality measures. Our experiments show that Closeness-Degree centrality performs best in increasing the network epidemic threshold. Degree-Degree and Katz-Degree centralities show the next best results. By contrast, Eccentricity and Betweenness-Degree centralities exhibit the worst results, in that order.

6 Conclusions

Because weighted networks represent physical contacts in society more accurately, we employed them in this work. We used 13 predefined centrality measures for weighted networks to identify prominent nodes as central nodes. Furthermore, we proposed 4 new hybrid centrality measures that take link weights into account. These 17 centrality measures were compared for targeted immunization on well-known network models (BA, ER and WS) and on a real network (FBL), using two metrics: the epidemic threshold and the largest connected component of the network. In all three models (BA, ER and WS), Radiality-Degree centrality gave satisfying results in decreasing the size of the largest connected component, and PageRank-Degree was effective in increasing the epidemic threshold. On the other hand, Subgraph-Degree centrality performed poorly among the 4 proposed centrality measures. Moreover, Katz-Degree gave unacceptable outcomes in almost all networks, although it produced valuable results in increasing the epidemic threshold in the real network (FBL).

In brief, among pure centralities, Communication and Strength were effective in decreasing the LCC and increasing the epidemic threshold, respectively, whereas Eccentricity gave entirely unacceptable results. Among the predefined hybrid centrality measures (Degree-Degree, Betweenness-Degree and Closeness-Degree), we concluded that Betweenness-Degree centrality is impractical for decreasing the LCC; the results for the other two measures were less clear. Last but not least, for the weighted BA network, Radiality-Degree and Betweenness-Degree centralities showed the best and worst performance, respectively. No such unique conclusion could be drawn for the other two network models, i.e. Erdős-Rényi and Watts-Strogatz.

Acknowledgment

We are thankful for partial support from the Iran Telecommunication Research Center (ITRC).


References

Abbasi A, Hossain L. 2013. Hybrid centrality measures for binary and weighted networks. In: Complex Networks. 1-7, Springer, Berlin Heidelberg, Germany

Abbasi A, Altmann J, Hossain L. 2011. Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures. Journal of Informetrics, 5(4): 594-607

Albert R, Barabási AL. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1): 47

Chakrabarti D, Wang Y, Wang C, Leskovec J, Faloutsos C. 2008. Epidemic thresholds in real networks. ACM Transactions on Information and System Security, 10: 1-26

Chen Y, Paul G, Havlin S, et al. 2008. Finding a better immunization strategy. Physical Review Letters, 101: 2-5

Cornforth DM, Reluga TC, Shim E, et al. 2011. Erratic flu vaccination emerges from short-sighted behavior in contact networks. PLoS Computational Biology, 7: e1001062

Dijkstra EW. 1959. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1): 269-271

Eames KTD, Read JM, Edmunds WJ. 2009. Epidemic prediction and control in weighted networks. Epidemics, 1: 70-76

Erdős P, Rényi A. 1961. On the strength of connectedness of a random graph. Acta Mathematica Hungarica, 12(1-2): 261-267

Estrada E, Rodriguez-Velazquez JA. 2005. Subgraph centrality in complex networks. Physical Review E, 71(5): 056103

Freeman LC. 1978. Centrality in social networks: Conceptual clarification. Social Networks, 1: 215-239

Gallos L, Liljeros F, Argyrakis P, et al. 2007. Improving immunization strategies. Physical Review E, 75: 1-4

Hartvigsen G, Dresch JM, Zielinski AL, Macula AJ, et al. 2007. Network structure, and vaccination strategy and effort interact to affect the dynamics of influenza epidemics. Journal of Theoretical Biology, 246: 205-213

Jain S, Kamimoto L, Bramley AM, Schmitz AM, Benoit SR, Louie J, et al. 2009. Hospitalized patients with 2009 H1N1 influenza in the United States, April–June 2009. New England Journal of Medicine, 361(20): 1935-1944

Katz L. 1953. A new status index derived from sociometric analysis. Psychometrika, 18(1): 39-43

Kitchovitch S, Lio P. 2011. Community structure in social networks: applications for epidemiological modelling. PLoS ONE, 6: e22220

Leavitt HJ. 1951. Some effects of certain communication patterns on group performance. The Journal of Abnormal and Social Psychology, 46(1): 127-134

Masuda N. 2009. Immunization of networks with community structure. New Journal of Physics, 11: 123018

Miller JC, Hyman JM. 2007. Effective vaccination strategies for realistic social networks. Physica A: Statistical Mechanics and Its Applications, 386: 780-785

Newman ME. 2001. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1): 016132

Opsahl T, Panzarasa P. 2009. Clustering in weighted networks. Social Networks, 31: 155-163

Opsahl T, Agneessens F, Skvoretz J. 2010. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3): 245-251

Parthasarathy A. 2013. Textbook of Pediatric Infectious Diseases. JP Medical Ltd, London, UK


Pauls SD, Remondini D. 2012. Measures of centrality based on the spectrum of the Laplacian. Physical Review E, 85(6): 066127

Peng C, Jin X, Shi M. 2010. Epidemic threshold and immunization on generalized networks. Physica A: Statistical Mechanics and Its Applications, 389: 549-560

Qi X, Fuller E, Wu Q, Wu Y, Zhang CQ. 2012. Laplacian centrality: A new centrality measure for weighted networks. Information Sciences, 194: 240-253

Sarma AD, Gollapudi S, Panigrahy R. 2011. Estimating PageRank on graph streams. Journal of the ACM (JACM), 58(3): 13

Schneider CM, Mihaljev T, Herrmann HJ. 2012. Inverse targeting: An effective immunization strategy. Europhysics Letters, 98: 46002

Shams B, Khansari M. 2014. Using network properties to evaluate targeted immunization algorithms. Network Biology, 74: 21

Valente TW, Foreman RK. 1998. Integration and radiality: measuring the extent of an individual's connectedness and reachability in a network. Social Networks, 20(1): 89-105

Vidondo B, Schwehm M, Bühlmann A, et al. 2012. Finding and removing highly connected individuals using suboptimal vaccines. BMC Infectious Diseases, 12: 51

Wang H, Martin Hernandez J, Van Mieghem P. 2008. Betweenness centrality in a weighted network. Physical Review E, 77: 046105

Watts DJ, Strogatz SH. 1998. Collective dynamics of 'small-world' networks. Nature, 393(6684): 440-442

Zhai L, Yan X, Zhang G. 2013. A centrality measure for communication ability in weighted network. Physica A: Statistical Mechanics and Its Applications, 392(23): 6107-6117

Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific, Singapore

Zhang WJ, Zhan CY. 2011. An algorithm for calculation of degree distribution and detection of network type: with application in food webs. Network Biology, 1(3-4): 159-170


Network Biology, 2016, 6(1): 28-36 

  IAEES www.iaees.org

Article

Investigation of common disease regulatory network for metabolic disorders: A bioinformatics approach

Tasnuba Jesmin1, Sajjad Waheed1, Abdullah-Al-Emran2

1Department of Information & Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh

2Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh

Email: [email protected]

 

Received 27 June 2015; Accepted 5 August 2015; Published online 1 March 2016

 

Abstract

Metabolic disorders, which cause the metabolic process to fail, are a growing concern worldwide. This research predicts a common metabolic pathway shared by obesity, type 2 diabetes, hypertension and cardiovascular disease arising from metabolic disorder. A protein-protein interaction (PPI) network is created to show protein co-expression, co-regulation and interactions among genes and diseases. Genes associated with metabolic diseases were collected from different gene databases, verified and ‘mined’ to establish gene interaction network models that express the molecular linkages among genes and diseases affecting disease progression. The number of associated genes identified is 250 for type 2 diabetes (T2D), 156 for hypertension (HT), 185 for obesity (OB) and 178 for cardiovascular disease (CVD). By linkage filtering of the sorted candidate genes, 10 common genes directly or indirectly associated with all four diseases are identified. By analysing the gene network model and the PPI network, a common metabolic pathway among these metabolic diseases has been investigated.

Key words data mining; metabolic disorders; metabolic diseases; PPI network; gene regulatory network.

1 Introduction

According to the World Health Organization (WHO), overweight and obesity are among the five leading causes of death. Today, 65% of the world's population live in countries where overweight and obesity kill more people than underweight (World Health Organization, 2008). Overweight leads to obesity, which in turn leads to death through type 2 diabetes, hypertension, cardiovascular disease and other metabolic disorders.

Barness et al. (2007) reported that obesity is a leading preventable cause of death worldwide, increasingly affecting both adults and children, and that few public health problems of the 21st century are viewed as seriously as obesity. On average, obesity shortens life expectancy by six to seven years (Haslam and James, 2005; Peeters et al., 2003). Another study (Whitlock et al., 2009) reported that a BMI of 30-35 kg/m2 reduces life expectancy by two to four years, while severe obesity (BMI > 40 kg/m2) reduces it by ten years. There is also a strong correlation between obesity and type 2 diabetes (Ahmed et al., 2012).

Type 2 diabetes may remain undetected for many years. In 2014, 9% of adults aged 18 years and older had diabetes, and in 2012 diabetes was the direct cause of 1.5 million deaths. More than 80% of diabetes deaths occur in low- and middle-income countries such as Bangladesh. Both lifestyle and genetic factors play a role in the onset of type 2 diabetes (Ripsin et al., 2009; Risérus et al., 2009), so type 2 diabetes and metabolic disorder are correlated. Chobanian et al. (2003) described hypertension as a chronic metabolic disorder, and Kearney et al. (2005) reported that hypertension is common in both developed (333 million) and developing (639 million) countries. A prospective cohort study (Gress et al., 2000) found that type 2 diabetes mellitus was almost 2.5 times as likely to develop in subjects with hypertension as in those with normal blood pressure. Cheung (2010) showed a strong correlation between obesity and hypertension, which tend to increase or decrease together.

In a multinational study, 50% of people with diabetes died of cardiovascular disease, primarily heart disease and stroke (Morrish et al., 2001). Kvan et al. (2007) and Norhammar et al. (2004) also reported that metabolic syndrome is a major risk factor for cardiovascular disease. Obesity and type 2 diabetes contribute to the onset of metabolic syndrome (Isomaa et al., 2001), and cardiovascular diseases are now commonly associated with obesity and diabetes mellitus (Highlander and Shaw, 2010). Thus obesity, type 2 diabetes and CVD are correlated. Disease genes cause, or contribute genetically to, the development of most complex diseases. Diseases arise mainly from gene alterations or changes in gene mechanisms, but for a particular disease, a single gene or a set of genes plays a major role; these are called disease genes. Proteins maintain all cellular functions, and their production is governed by the genetic code, so a disease may result from a gene abnormality that changes protein function in some way. Therefore, to establish a network among genes, it is important to analyze the characteristics of proteins and to understand their functions.

Klingström and Plewczynski (2011) reviewed the different types of bioinformatics tools that help show interactions, predict pathways and represent predicted interactions among diseases, genes and proteins. Kanehisa et al. (2008) provided detailed information on using and analyzing the KEGG database. Carretero and Oparil (2000) and Aneja et al. (2004) proposed that hypertension and obesity are strongly interconnected, and even that obesity is the main cause of hypertension; the latter work also describes hypertension and its stages in detail. Type 2 diabetes and hypertension share a common metabolic pathway at the genetic level (Bernard et al., 2012). Microvascular and macrovascular complications of diabetes are major causes of coronary heart disease (CHD). The UniHI tool, used here to predict the PPI network and the common metabolic pathway, is described by Kalathur et al. (2014). In July 2014, a research paper published in Public Health showed that adult obesity causes type 2 diabetes. Dusonchet et al. (2014) have focused on type 2 diabetes as a leading cause of cardiovascular disease.

Over the last few years, there has been growing interest in studying biological interaction networks at the genetic level (Zeitoun et al., 2012; Rahman et al., 2013; Zhang, 2012; Zhang and Li, 2015). Identifying basic structural relationships among diseases, genes and proteins is the main goal of gene regulatory interaction network research. Meyer et al. (2014) used a community-based approach to establish gene regulatory networks with complete or incomplete topology. Wang et al. (2014) proposed that gene regulatory networks (GRNs) regulate critical events during cell development. Simões-Costa et al. (2014) identified new links in the gene regulatory network responsible for the development of the cranial neural crest (CNC), a critical cell population, and provided an unprecedented characterization of the migratory CNC transcriptome. Hayes and Dinkova-Kostova (2014) described an Nrf2 regulatory network that provides an interface between intermediary metabolism and redox. Dusonchet et al. (2014) designed a Parkinson's disease gene regulatory network and used it to identify a modulator of LRRK2 activity.

Mäkinen et al. (2014) used integrative genomics to identify molecular pathways and gene networks for coronary artery disease. Teichmann and Babu (2014) investigated the role of gene duplication in the growth and evolution of gene regulatory networks. Zaravinos et al. (2014) showed that deregulated genes in a cohort of clear cell renal cell carcinoma (ccRCC) tissues form an associated network, and suggested that these genes are candidate predictive markers of the disease. Meda et al. (2014) reported genetic associations of the resting default mode network in psychotic bipolar disorder and schizophrenia. Zhang et al. (2015) identified a miRNA-TF-gene regulatory pathway in obesity.

Biological functions arise from complex interactions among the cell's numerous constituents, such as proteins, DNA, RNA and other small molecules. It is therefore important to assess interactions at the gene-gene, gene-protein and metabolic levels. The present work applies a systems bioinformatics approach to develop a gene interaction network model from high-throughput genomic and PPI data for these diseases.

The discussion above suggests that obesity, type 2 diabetes, hypertension and cardiovascular disease may be directly or indirectly interconnected through metabolic disorder. But is there a common pathway shared by obesity, type 2 diabetes, hypertension and cardiovascular disease? The investigation procedure and results are presented in Sections 2 and 3, respectively.

2 Materials and Methods

Building a gene network topology that helps establish a common pathway involves several steps, described in Sections 2.1 to 2.2.7 below.

2.1 Data source

For this research, genes associated with type 2 diabetes, hypertension, obesity and cardiovascular disease are collected from PubMed, a reliable and authoritative source for many kinds of genetic data. PubMed is maintained by the NCBI (National Center for Biotechnology Information), whose resources are freely accessible and downloadable. The GenBank data warehouse and the OMIM database are also used to collect the gene lists for T2D, CVD, OB and HT.

2.2 Methods

2.2.1 Gene integration and processing

In this study, disease genes are defined as the reported genes provided by the NCBI, which offers a quality-controlled, literature-derived collection. The candidate genes are listed by disease, and the genes collected for each disease are merged.

The collected genes are also verified using the KEGG database. The KEGG resource consists of sixteen main databases, broadly categorized into systems information, genomic information and chemical information, and further subcategorized by the color coding of its web pages. KEGG PATHWAY contains around 300,000 entries for pathway maps built from around 500 manually drawn diagrams. After collection and integration, the merged gene list for each disease is processed individually to remove duplicates and unnecessary genomic data.
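The merge-and-deduplicate step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each source (PubMed, OMIM, GenBank) has already been exported as (disease, source, gene symbol) records, and that trimming and upper-casing HGNC-style human gene symbols is enough to detect duplicates. The function name `merge_gene_lists` and the toy records are hypothetical.

```python
from collections import defaultdict


def merge_gene_lists(records):
    """Merge (disease, source, gene_symbol) records into one
    deduplicated, sorted gene list per disease."""
    merged = defaultdict(set)
    for disease, _source, symbol in records:
        # Normalise symbols so that "il6 " and "IL6" count as one gene.
        merged[disease].add(symbol.strip().upper())
    return {disease: sorted(genes) for disease, genes in merged.items()}


# Toy records for illustration only; not the study's data.
records = [
    ("T2D", "PubMed", "IL6"),
    ("T2D", "OMIM", "il6 "),
    ("T2D", "GenBank", "TNF"),
    ("HT", "PubMed", "NR3C1"),
]
print(merge_gene_lists(records))
# {'T2D': ['IL6', 'TNF'], 'HT': ['NR3C1']}
```

Using a set per disease makes duplicate removal automatic; the source label is kept in the input only so that provenance could be tracked if needed.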


2.2.2 Gene mining

Data mining is mainly used to make data suitable for analysis and application. The candidate genes related to type 2 diabetes, obesity, hypertension and cardiovascular disease are mined for their relation to metabolic disorder using data mining techniques, and stored in the UniGene data warehouse. UniGene is a database at the NCBI; here it refers to clusters of genes that perform a particular function. Because data mining techniques are applied to genes, this step is called gene mining.

2.2.3 Gene identification according to disease

To identify interrelated genes among the metabolic diseases, the ExPASy database is used. It helps find genes that are not only related to the diseases but also act to disrupt the metabolic processes of specific diseases.

2.2.4 Gene sorting

Sorting is the most crucial part of this research, because even a tiny mistake can remove an important gene and lead to a wrong result. The sorting algorithm of the Taxonomy database is used to sort the identified genes that are correlated among obesity, T2D, hypertension and cardiovascular disease. The genes common to these four diseases, which directly or indirectly affect each disease, are then determined.
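Finding the genes common to all four diseases amounts to intersecting the four per-disease gene sets. The authors use the sorting algorithm of the Taxonomy database; the sketch below swaps in a plain set intersection, which is an assumption about what "common to these four diseases" means operationally. The function name `common_genes` and the toy gene sets are hypothetical.

```python
def common_genes(gene_sets):
    """Return the genes present in every disease's gene set, sorted."""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Toy gene sets for illustration only; not the study's data.
gene_sets = {
    "T2D": {"IL6", "TNF", "LPL", "INS"},
    "HT": {"IL6", "TNF", "AGT"},
    "OB": {"IL6", "TNF", "LEP"},
    "CVD": {"IL6", "TNF", "APOB"},
}
print(common_genes(gene_sets))  # ['IL6', 'TNF']
```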

2.2.5 Gene filtering

This critical step uses UniHI (Unified Human Interactome), an omics tool. A linkage network filtering technique is used to identify the genes common to the target diseases T2D, OB, HT and CVD. UniHI is applied to find the genes that have at least minimal binary and complex interactions among themselves. The common genes identified in this way are used to build a PPI network.
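The text does not define precisely what "minimal binary and complex interactions" requires. One hypothetical reading, assumed here for illustration only, is that a candidate gene is kept if it interacts with at least one other candidate. The function name `linkage_filter` is illustrative, `GENE_X` and `OTHER` are made-up placeholders, and the NFKB1 pairs echo connections described in Section 3.4.1.

```python
def linkage_filter(candidates, interactions):
    """Keep candidate genes that interact with at least one other candidate.

    candidates: iterable of gene symbols
    interactions: iterable of (gene_a, gene_b) pairs, treated as undirected
    """
    candidates = set(candidates)
    linked = set()
    for a, b in interactions:
        # Count only interactions between two distinct candidates.
        if a != b and a in candidates and b in candidates:
            linked.update((a, b))
    return sorted(linked)


candidates = ["NFKB1", "IL6", "CCL2", "GENE_X"]  # GENE_X is a placeholder
interactions = [("NFKB1", "IL6"), ("NFKB1", "CCL2"), ("GENE_X", "OTHER")]
print(linkage_filter(candidates, interactions))  # ['CCL2', 'IL6', 'NFKB1']
```

GENE_X is dropped because its only partner lies outside the candidate set; a stricter reading (for example, requiring membership in a protein complex) would need additional annotation data.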

2.2.6 PPI network creation

Protein-protein interaction networks play an important role in bioinformatics research. They help explain the molecular mechanisms of human disease signaling pathways and identify new modules of disease processes. UniHI is a popular and reliable bioinformatics tool for representing PPI maps among genes. UniHI 7 includes almost 350,000 molecular interactions between genes, proteins and drugs, as well as numerous other types of data such as gene expression and functional annotation. Finally, the PPI network for the common genes is created using UniHI.
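UniHI builds and displays the PPI map itself. As a tool-independent sketch (an assumption, not a description of UniHI's internals), a PPI network can be held as an undirected adjacency map, with hubs ranked by degree, that is, by their number of direct interaction partners. The `min_degree` threshold is a hypothetical parameter, since the text does not say how hubs were defined. The edge list reuses the direct connections described in Section 3.4.1 and is only a small, incomplete subset.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hubs(adj, min_degree):
    """Return (gene, degree) pairs with degree >= min_degree,
    highest degree first, ties broken alphabetically."""
    ranked = sorted(((gene, len(nbrs)) for gene, nbrs in adj.items()),
                    key=lambda item: (-item[1], item[0]))
    return [(gene, deg) for gene, deg in ranked if deg >= min_degree]


# Illustrative subset of the direct connections described in Section 3.4.1.
edges = [
    ("RELA", "STAT3"), ("RELA", "NFKB1"), ("RELA", "CCL2"),
    ("RELA", "NR3C1"), ("RELA", "IL6"),
    ("NFKB1", "SP1"), ("NFKB1", "CEBPB"),
    ("NFKB1", "IL6"), ("NFKB1", "CCL2"), ("NFKB1", "NR3C1"),
]
print(hubs(build_network(edges), min_degree=3))
# [('NFKB1', 6), ('RELA', 5)]
```

Degree is the simplest hub criterion; other centrality measures could rank genes differently.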

2.2.7 Common regulatory pathway

The eighth and final step is the construction of the common gene regulatory pathway. The genes common to these diseases are checked against the KEGG database across different biological pathways, and the selected genes are cross-validated and clustered using the KEGG mining tools (Kanehisa et al., 2008). From the pathway information on the selected genes, the common regulatory pathway is predicted using UniHI.

3 Results

A disease is rarely the direct consequence of an abnormality in a single gene; rather, it reflects the effects of several genes acting directly, indirectly or even at the proteomic level. Bioinformatics tools offer new ways to analyze disease processes using gene and protein structures and their interactions. Trusted gene databases provide free access to genetic data related to specific diseases, and these tools improve on older approaches by revealing genetic abnormalities in depth.

3.1 Gene collection, integration, mining and sorting

The responsible genes for the target diseases (T2D, HT, OB and CVD) collected from the PubMed, OMIM and GenBank databases are merged and processed. This yields 2794 responsible genes for T2D, 1520 for HT, 2531 for OB and 4713 for CVD. The UniGene database is then used to mine the processed lists, which reduces the responsible genes to 250 for T2D, 156 for HT, 185 for OB and 178 for CVD. Table 1 gives the full breakdown of responsible genes at each stage. From these, the candidate genes connected with any one of the above-mentioned diseases are selected and validated against KEGG pathways. This analysis picks out 125 genes for T2D, 121 for HT, 108 for OB and 110 for CVD. After the sorting stage, 62 genes are identified for type 2 diabetes, hypertension, obesity and cardiovascular disease.

Table 1 Gene collection chart for metabolic disorder and target diseases in Homo sapiens

Name of disease | Primary gene collection | Gene no. for metabolic disorder | Gene no. for human and metabolic disorder
Cardiovascular disease | 4713 | 208 | 178
Obesity | 2531 | 222 | 185
Type 2 diabetes | 2794 | 298 | 250
Hypertension | 1520 | 191 | 156

3.2 Gene linkage filtering among T2D, OB, HT and CVD

Cross linkage is used to investigate the molecular cross-talk among the mechanisms of the four interrelated diseases (T2D, HT, OB and CVD). The results of every cross linkage are shown in Table 2. The ‘hubs’ of each individual disease network were examined thoroughly, and the gene patterns and their relatedness across diseases were analyzed to collect the genes of the four diseases. Investigating these connections and cross-talk generates a gene list containing all types of connections among the four diseases. In this process, 10 genes common to all four diseases are found: NR3C1, APOA1, APOB, CCL2, IL6, STAT3, NFKB1, LPL, PPARGC1A and TNF.

Table 2 Cross-linkage gene chart for metabolic disorder and target diseases in Homo sapiens

Cross linkage between | No. of genes | Cross linkage among | No. of genes
CVD and OB | 110 | CVD, OB and HT | 108
CVD and T2D | 125 | CVD, OB and T2D | 92
CVD and HT | 108 | OB, T2D and HT | 88
OB and T2D | 136 | CVD, T2D and HT | 90
OB and HT | 98 | CVD, OB, T2D and HT | 70
T2D and HT | 121 | |

Common gene no.: 62
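Cross-linkage counts of the kind reported in Table 2 (genes shared by each pair, each triple and all four diseases) can be produced by intersecting gene sets over every combination of diseases. The paper does not state exactly how its counts were computed, so this sketch assumes plain set intersections; `cross_linkage` and the toy gene sets are hypothetical and do not reproduce the study's numbers.

```python
from itertools import combinations


def cross_linkage(gene_sets, k):
    """Count the genes shared by every combination of k diseases.

    gene_sets: dict mapping a disease label to a set of gene symbols
    Returns a dict mapping each k-tuple of disease labels to a count.
    """
    counts = {}
    for combo in combinations(sorted(gene_sets), k):
        shared = set.intersection(*(set(gene_sets[d]) for d in combo))
        counts[combo] = len(shared)
    return counts


# Toy gene sets for illustration only; not the data behind Table 2.
gene_sets = {
    "CVD": {"IL6", "TNF", "APOB"},
    "OB": {"IL6", "TNF", "LEP"},
    "T2D": {"IL6", "TNF", "APOB", "INS"},
    "HT": {"IL6", "AGT"},
}

for k in (2, 3, 4):  # pairs, triples, all four diseases
    for combo, n in cross_linkage(gene_sets, k).items():
        print(" and ".join(combo), n)
```

With these toy sets, CVD and T2D share three genes, and all four diseases share one (IL6).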

3.3 PPI network among common genes

After analyzing the background studies, one omics bioinformatics tool, UniHI, is selected for this research. UniHI can represent the protein-protein interaction network pattern among the candidate genes. The PPI network around the common genes is given in Fig. 1. It represents the relationships among genes and hub proteins: some genes are connected with each other directly, and some are connected via another gene or hub protein. The gene regulatory model of each common gene is also given in Fig. 1.

Fig. 1 PPI network among common genes, representing the common regulatory pathway.

3.4 Common regulatory pathway

To identify the common regulatory pathway among the common candidate genes, three confirmation steps are used. The regulatory pathways of each hub gene are determined, and any pathway common to at least two hub genes is taken as a common regulatory pathway.
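The selection rule for common regulatory pathways (a pathway counts as common when it is shared by at least two hub genes) can be sketched as a simple count over hub-to-pathway annotations. This is a hypothetical illustration: the pathway labels are placeholders, and in practice the annotations would come from a resource such as KEGG. The function name `shared_pathways` and its `min_hubs` parameter are assumptions.

```python
from collections import Counter


def shared_pathways(hub_pathways, min_hubs=2):
    """Return the pathways annotated to at least `min_hubs` hub genes,
    each mapped to the sorted list of hubs that share it."""
    counts = Counter(p for paths in hub_pathways.values() for p in set(paths))
    shared = {}
    for pathway, n in counts.items():
        if n >= min_hubs:
            shared[pathway] = sorted(
                hub for hub, paths in hub_pathways.items() if pathway in paths)
    return shared


# Hypothetical annotations for illustration only.
hub_pathways = {
    "NFKB1": {"pathway_A", "pathway_B"},
    "STAT3": {"pathway_A", "pathway_C"},
    "PPARGC1A": {"pathway_D"},
}
print(shared_pathways(hub_pathways))  # {'pathway_A': ['NFKB1', 'STAT3']}
```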

3.4.1 Step-1: relationship among common genes

The common regulatory pathway around the common genes, as outlined by UniHI, is shown in Fig. 1. RELA connects the candidate genes STAT3, NFKB1, CCL2, NR3C1 and IL6. NFKB1 is also directly connected with SP1 and CEBPB, as well as with IL6, CCL2 and NR3C1. NFKB1 and TNF are connected indirectly through TAB2, IKBKB and IKBKG. PPARGC1A is connected with APOB through HNF4A, and APOB is indirectly connected with both APOA1 and LPL. LCP1 communicates indirectly, through UBC, with NR3C1, NFKB1, STAT3, APOB and PPARGC1A.
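This step distinguishes direct connections from indirect ones that pass through intermediaries such as HNF4A. A breadth-first search over an undirected adjacency map makes the distinction mechanical: a shortest path of two nodes is a direct link, and a longer path names the intermediaries. The helper names and the three-edge toy network (taken from connections described in the text, and deliberately incomplete) are illustrative assumptions, not the authors' procedure.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; return the shortest path as a list, or None."""
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj.get(node, ())):  # sorted for reproducible output
            if nbr not in previous:
                previous[nbr] = node
                if nbr == target:
                    # Walk back through the predecessors to rebuild the path.
                    path = [nbr]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nbr)
    return None


def classify(adj, a, b):
    """Label the link between a and b as direct, indirect or absent."""
    path = shortest_path(adj, a, b)
    if path is None:
        return "not connected"
    if len(path) == 2:
        return "direct"
    return "indirect via " + ", ".join(path[1:-1])


edges = [("PPARGC1A", "HNF4A"), ("HNF4A", "APOB"), ("NFKB1", "IL6")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(classify(adj, "NFKB1", "IL6"))      # direct
print(classify(adj, "PPARGC1A", "APOB"))  # indirect via HNF4A
print(classify(adj, "NFKB1", "APOB"))     # not connected
```

"Not connected" here only reflects the toy subset; in the full network the two genes may well be linked.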

3.4.2 Step-2: result validation of step-1 using GeneMANIA

The pathway among the candidate genes identified with the GeneMANIA tool gives almost the same result as Step 1. GeneMANIA also generates a network containing the common metabolic pathway: NFKB1 connects five genes (CCL2, NR3C1, TNF, STAT3 and IL6), STAT3 is connected with LPL, LPL is connected with APOB, and APOB and APOA1 are interrelated. The common pathway among NR3C1, APOA1, APOB, CCL2, IL6, STAT3, NFKB1, LPL, PPARGC1A and TNF is therefore maintained in a cycle: APOA1-APOB-LPL-PPARGC1A-STAT3-TNF-NFKB1-CCL2-NR3C1-IL6.
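Steps 2 and 3 validate the Step-1 network by checking that other tools give similar results. The text does not say how similarity was judged; one simple, hypothetical measure is the overlap of the two undirected edge sets, summarised by the number of shared edges and the Jaccard index (shared edges divided by all distinct edges). The function names are illustrative, and G1 to G4 are placeholder labels, not genes from this study.

```python
def normalise(edges):
    """Represent undirected edges as frozensets so (A, B) equals (B, A);
    self-loops are dropped."""
    return {frozenset(e) for e in edges if len(set(e)) == 2}


def edge_overlap(edges_a, edges_b):
    """Return (shared edge count, Jaccard index) for two undirected edge lists."""
    a, b = normalise(edges_a), normalise(edges_b)
    union = a | b
    shared = a & b
    jaccard = len(shared) / len(union) if union else 1.0
    return len(shared), jaccard


# Placeholder networks from two hypothetical tools.
tool_1 = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
tool_2 = [("G2", "G1"), ("G2", "G3"), ("G1", "G4")]
print(edge_overlap(tool_1, tool_2))  # (2, 0.5)
```

A Jaccard index of 1.0 would mean both tools report exactly the same interactions; treating two empty networks as identical is a design choice of this sketch.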

3.4.3 Step-3: result validation of step-1 using UniHI

The UniHI result is more specific. Among NR3C1, APOA1, APOB, CCL2, IL6, STAT3, NFKB1, LPL, PPARGC1A and TNF, the common metabolic pathway cycle covers the genes from NFKB1 to PPARGC1A. Each of these hub genes is connected to other candidate genes and affects the expression of the responsible genes. Every step in Section 3.4 yields the same metabolic pathway, which validates the proposed common metabolic pathway among T2D, OB, HT and CVD.

4 Discussion

Studies of the functional cross-links between gene-associated diseases and specific diseases are still at an early stage and remain poorly understood. To understand the genetic mechanisms of diseases, it is important to know and analyze the connections between genes and diseases. The candidate genes associated with type 2 diabetes, hypertension, cardiovascular disease and obesity, together with the metabolic diseases themselves, are topologically important for constructing a metabolic disease network. By mapping the shared genes onto the PPI network to show the association among the selected diseases at the genetic level, a cross-talking sub-pathway was constructed. Network analysis of cross-talking sub-pathways performs well at capturing higher-level relationships between genes and diseases, and provides a promising insight into a common metabolic path between genes and diseases.

Type 2 diabetes, obesity, hypertension and cardiovascular disease are caused by abnormal metabolism. To identify the interrelationships among these metabolic diseases, genes with a clear biological relation to each specific disease were selected. Cross linkage among the metabolic diseases shows their relationship at the gene level. Selecting a good set of genes allows an accurate protein-protein interaction network among diseases and disease genes to be built, and a common metabolic path was investigated by mapping and analyzing this PPI network. From this common metabolic pathway, a metabolic disease network that can regulate gene expression was established. This research is mainly helpful for understanding the metabolic network among genes and for targeted drug design.

Abbreviations

T2D=Type-2 Diabetes; OB=Obesity; HT= Hypertension; CVD=Cardiovascular Disease


Acknowledgment

Financial support was provided by the Ministry of Information and Communication Technology, Bangladesh. I, Tasnuba Jesmin, thank all ICT ministry personnel and staff for their support, and give special thanks to my supervisors for their valuable guidance, insight and encouragement.

References

Ahmed K, Jesmin T, Fatima U, Moniruzzaman M, Emran AA, Rahman MZ. 2012. Intelligent and effective diabetes risk prediction system using data mining. Oriental Journal of Computer Science and Technology, 5(2): 215-221

Zaravinos A, et al. 2014. Altered metabolic pathways in clear cell renal cell carcinoma: A meta-analysis and validation study focused on the deregulated genes and their associated networks. Oncoscience, 1(2): 117-131

Aneja A, et al. 2004. Hypertension and obesity. The Endocrine Society, 169-205

Barness LA, Opitz JM, Gilbert-Barness E. 2007. Obesity: genetic, molecular, and environmental aspects. American Journal of Medical Genetics, 143: 3016-3034

Bernard MY, Cheung, Li C. 2012. Diabetes and hypertension: is there a common metabolic pathway? Current Atherosclerosis Reports, 14: 160-166

Carretero OA, Oparil S. 2000. Essential hypertension: Part I: Definition and etiology. Circulation, 101: 329-335

Cheung BM. 2010. The hypertension-diabetes continuum. Journal of Cardiovascular Pharmacology, 55: 333-339

Chobanian AV, Bakris GL, Black HR, et al. 2003. Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. Hypertension, 42: 1206-1252

Gress TW, Nieto FJ, Shahar E, et al. 2000. Hypertension and antihypertensive therapy as risk factors for type 2 diabetes mellitus. Atherosclerosis Risk in Communities Study. The New England Journal of Medicine, 342: 905-912

Haslam DW, James WP. 2005. Obesity. Lancet, 366: 1197-1209

Hayes JD, Dinkova-Kostova AT. 2014. The Nrf2 regulatory network provides an interface between redox and intermediary metabolism. Trends in Biochemical Sciences, 39(4): 199-218

Highlander P, Shaw GP. 2010. Current pharmacotherapeutic concepts for the treatment of cardiovascular disease in diabetics. Therapeutic Advances in Cardiovascular Disease, 4: 43-54

Isomaa B, Almgren P, Tuomi T, et al. 2001. Cardiovascular morbidity and mortality associated with the metabolic syndrome. Diabetes Care, 24: 683-689

Dusonchet J, et al. 2014. A Parkinson's disease gene regulatory network identifies the signaling protein RGS2 as a modulator of LRRK2 activity and neuronal toxicity. Human Molecular Genetics, 23(18): 4887-4905

Kalathur RK, Pinto JP, Hernández-Prieto MA, et al. 2014. UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Research, 42: D408-D414

Kanehisa M, Araki M, Goto S, et al. 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Research, 36: D480-D484

Kearney PM, Whelton M, Reynolds K, et al. 2005. Global burden of hypertension: analysis of worldwide data. Lancet, 365: 217-223

Klingström T, Plewczynski D. 2011. Protein-protein interaction and pathway databases, a graphical review. Briefings in Bioinformatics, 12(6): 702-713

Kvan E, Pettersen KI, Sandvik L, et al. 2007. High mortality in diabetic patient with acute myocardial infarction: cardiovascular co-morbidities contribute most to the high risk. International Journal of Cardiology, 121: 184-188


Meda SA, et al. 2014. Multivariate analysis reveals genetic associations of the resting default mode network in psychotic bipolar disorder and schizophrenia. PNAS, E2066-E2075

Meyer P, et al. 2014. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Systems Biology, 8: 1-13

Morrish NJ, Wang SL, Stevens LK, Fuller JH, Keen H. 2001. Mortality and causes of death in the WHO multinational study of vascular disease in diabetes. Diabetologia, 44(2): S14-S21

Norhammar A, Malmberg K, Diderholm E, et al. 2004. Diabetes mellitus: the major risk factor in unstable coronary artery disease even after consideration of the extent of coronary artery disease and benefits of revascularization. Journal of the American College of Cardiology, 43: 585-591

Rahman KMT, Islam MdF, Banik RS, et al. 2013. Changes in protein interaction networks between normal and cancer conditions: Total chaos or ordered disorder? Network Biology, 3(1): 15-28

Peeters A, Barendregt JJ, Willekens F, et al. 2003. Obesity in adulthood and its consequences for life expectancy: A life-table analysis. Annals of Internal Medicine, 138: 24-32

Ripsin CM, Kang H, Urban RJ. 2009. Management of blood glucose in type 2 diabetes mellitus. American Family Physician, 79: 29-36

Risérus U, Willett WC, Hu FB. 2009. Dietary fats and prevention of type 2 diabetes. Progress in Lipid Research, 48: 44-51

Simões-Costa M, Tan-Cabugao J, Antoshechkin I, et al. 2014. Transcriptome analysis reveals novel players in the cranial neural crest gene regulatory network. Genome Research, 24: 281-290

Teichmann SA, Babu MM. 2014. Gene regulatory network growth by duplication. Nature Genetics, 36(5): 492-496

Mäkinen VP, Civelek M, Meng QY, et al. 2014. Integrative Genomics Reveals Novel Molecular Pathways and Gene Networks for Coronary Artery Disease. PLOS Genetics, 10(7): 1-14

Wang S, Sengel C, Emerson MM, et al. 2014. A gene regulatory network controls the binary fate decision of rod and bipolar cells in the vertebrate retina. Developmental Cell, 30: 513-527

Whitlock G, Lewington S, Sherliker P, et al. 2009. Body-mass index and cause-specific mortality in 900 000 adults: collaborative analyses of 57 prospective studies. Lancet, 373: 1083-1096

World Health Organization. 2008. World Diabetes. Fact sheet N312. WHO

Zeitoun AH, Ibrahim SS, Bagowski CP. 2012. Identifying the common interaction networks of amoeboid motility and cancer cell metastasis. Network Biology, 2(2): 45-56

Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific, Singapore

Zhang WJ, Li X. 2015. Linear correlation analysis in finding interactions: Half of predicted interactions are undeterministic and one-third of candidate direct interactions are missed. Selforganizology, 2(3): 39-45

Zhang XM, Guo L, Chi MH, et al. 2015. Identification of active miRNA and transcription factor regulatory pathways in human obesity-related inflammation. BMC Bioinformatics, 16(76): 1-7


Network Biology, 2016, 6(1): 37-39

IAEES www.iaees.org

Short Communication

Network chemistry, network toxicology, network informatics, and network behavioristics: A scientific outline

WenJun Zhang

School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China; International Academy of Ecology and Environmental Sciences, Hong Kong

E-mail: [email protected], [email protected]

Received 18 July 2015; Accepted 26 August 2015; Published online 1 March 2016

Abstract

In the present study, I propose some new sciences: network chemistry, network toxicology, network informatics, and network behavioristics. The aims, scope and scientific foundation of these sciences are outlined.

Keywords network chemistry; network toxicology; network informatics; network behavioristics; scientific foundation; aims and scope; new sciences.

1 Introduction

Network science has been successfully applied in several areas of science, as in network biology (Barabasi and Oltvai, 2004; Zhang, 2011, 2012; find details on network biology at http://www.iaees.org/publications/journals/nb/nb.asp) and network pharmacology (Hopkins, 2007, 2008). Using network theory and methodology to improve traditional sciences has proved to be an effective approach. In the present study, I propose some new sciences, i.e., network chemistry, network toxicology, network informatics, and network behavioristics, and outline their aims, scope and scientific foundation, in order to lay the foundation for further studies.

2 Proposed New Sciences

2.1 Network chemistry

The aims of network chemistry are to analyze the interactions between chemicals/molecules/compounds in complex chemical networks, to study the evolution of chemical networks, and to control these networks, etc. At the level of molecular biology and biochemistry, part of the research scope of network chemistry overlaps with that of network biology (Goemann et al., 2011; Huang and Zhang, 2012; Li and

Zhang, 2013; Rahman et al., 2013). Graph theory, network science, systematics, chemistry, computational science, and other fields form the scientific foundation of network chemistry.
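As a minimal illustration of the kind of interaction analysis described above, the sketch below computes the degree distribution P(k) of a chemical network stored as an undirected edge list. This is the statistic examined for tumor signaling networks by Huang and Zhang (2012). The toy network, the compound names, and the function name `degree_distribution` are hypothetical. The code assumes a simple graph with no duplicate edges and is not a method proposed in this outline.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes with degree k, for an undirected edge list.

    Assumes a simple graph: each undirected edge appears once in `edges`.
    """
    degree = Counter()
    for u, v in edges:
        # An undirected edge adds one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())  # how many nodes have each degree k
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical toy network: nodes are compounds, and an edge means the two
# compounds take part in a common reaction (illustrative data only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(degree_distribution(edges))  # → {1: 0.2, 2: 0.6, 3: 0.2}
```

In this toy network one compound has degree 1, three have degree 2, and one has degree 3. For a real chemical network the edge list would come from reaction or interaction data.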

2.2 Network toxicology

In a sense, network toxicology is a branch of network chemistry. However, network toxicology also takes environmental factors into account. It aims to study the mechanisms by which toxicants flow through networks such as ecosystems, and to control this flow based on the findings about these mechanisms. Graph theory, network science, systematics, ecology/environmental sciences, health science, computational science, and other fields form the scientific foundation of network toxicology.

2.3 Network informatics

Network informatics aims to investigate the mechanisms of information dissemination in networks such as social networks, ecosystems, and the human brain, and to improve the efficacy of information dissemination based on these findings. Graph theory, network science, systematics, information science, computational science, and other fields form the scientific foundation of network informatics.
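To make "mechanisms of information dissemination" more concrete, the following sketch simulates the independent cascade model. This is a standard textbook model of spreading on networks and is used here only as a stand-in, since the outline does not prescribe any particular model. Each newly informed node gets one chance to pass the information to each still-uninformed neighbor, succeeding with probability `p`. The social network, the parameter values, and the function names are hypothetical.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return the set of informed nodes.

    graph: dict mapping each node to the list of nodes it can pass information to.
    seeds: nodes that hold the information at the start.
    p:     probability that a newly informed node informs a given uninformed neighbor.
    rng:   a random.Random instance, so that runs are reproducible.
    """
    informed = set(seeds)
    frontier = list(informed)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # A newly informed node tries each still-uninformed neighbor once.
                if v not in informed and rng.random() < p:
                    informed.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # only nodes informed in this step spread next
    return informed


def mean_spread(graph, seeds, p, runs=1000, seed=0):
    """Average number of informed nodes over many independent runs."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


# Hypothetical directed social network: an edge u -> v means u can tell v.
social = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"], "e": ["f"]}
for p in (0.2, 0.5, 0.8):
    print(p, mean_spread(social, ["a"], p))
```

Varying `p` or the choice of `seeds` and comparing the mean spread is one simple way to ask which changes would improve the efficacy of dissemination in a given network.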

2.4 Network behavioristics

Network behavioristics is closely related to selforganizology and agent-based modeling (Zhang, 2012, 2013, 2014a, 2014b, 2016; Zhang and Liu, 2015). It aims to investigate the mechanisms by which the behavioral rules of nodes co-evolve in networks such as human/animal communities and ecosystems. Graph theory, network science, systematics, agent-based modeling, selforganizology, computational science, and other fields form the scientific foundation of network behavioristics.
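A very small agent-based sketch of co-evolving behavioral rules follows. For illustration only, it assumes that each node follows one of two rules, cooperate ("C") or defect ("D"). Each node scores payoffs from a prisoner's-dilemma-style game against its neighbors and then copies the rule of the best-scoring node in its neighborhood, itself included. This imitation rule is a common choice in spatial evolutionary games and is swapped in here as an example. The payoff values, the path network, and all names are hypothetical choices that are not part of the original outline.

```python
# Hypothetical payoffs for a weak prisoner's dilemma.
# Key: (rule of the focal node, rule of its neighbor) -> payoff to the focal node.
PAYOFF = {("C", "C"): 1.0, ("C", "D"): 0.0, ("D", "C"): 1.5, ("D", "D"): 0.0}


def play_round(graph, rules, payoff=PAYOFF):
    """Score each node by summing its payoffs against all of its neighbors."""
    return {u: sum(payoff[(rules[u], rules[v])] for v in graph[u]) for u in graph}


def imitate_best(graph, rules, scores):
    """Each node adopts the rule of the highest-scoring node in its neighborhood.

    The node itself is listed first, so on a tie it keeps its current rule.
    """
    new_rules = {}
    for u in graph:
        candidates = [u] + list(graph[u])
        best = max(candidates, key=lambda w: scores[w])  # first maximum wins ties
        new_rules[u] = rules[best]
    return new_rules


def simulate(graph, rules, steps, payoff=PAYOFF):
    """Return the rule assignments from the initial state through `steps` updates."""
    history = [dict(rules)]
    for _ in range(steps):
        scores = play_round(graph, rules, payoff)
        rules = imitate_best(graph, rules, scores)  # synchronous update of all nodes
        history.append(rules)
    return history


# Hypothetical network: five agents on a path, with one defector in the middle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {0: "C", 1: "C", 2: "D", 3: "C", 4: "C"}
for step, rules in enumerate(simulate(path, start, steps=3)):
    print(step, "".join(rules[u] for u in sorted(rules)))
```

With these payoffs the printed states are CCDCC, CDDDC, DDDDD and DDDDD. The single defector first spreads its rule to its neighbors, and then defection takes over the whole path. Changing the payoff table or the network topology changes which rules persist.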

3 Summary

The present study only proposes the basic concepts of these new sciences and gives a brief outline of each. Their aims, scope, methodology, etc., are expected to be further revised, improved and developed in the future.

Acknowledgment

We are thankful for the support of the High-Quality Textbook Network Biology Project for Engineering of Teaching Quality and Teaching Reform of Undergraduate Universities of Guangdong Province (2015.6-2018.6), from the Department of Education of Guangdong Province, and of the project Discovery and Crucial Node Analysis of Important Biological and Social Networks (2015.6-2020.6), from the Yangling Institute of Modern Agricultural Standardization.

References

Barabasi AL, Oltvai ZN. 2004. Network biology: understanding the cell's functional organization. Nature

Reviews Genetics, 5: 101-113

Goemann B, Wingender E, Potapov AP. 2011. Topological peculiarities of mammalian networks with

different functionalities: transcription, signal transduction and metabolic networks. Network Biology,

1(3-4): 134-148

Hopkins AL. 2007. Network pharmacology. Nature Biotechnology, 25(10): 1110-1111

Hopkins AL. 2008. Network pharmacology: the next paradigm in drug discovery. Nature Chemical Biology, 4(11): 682-690

Huang JQ, Zhang WJ. 2012. Analysis on degree distribution of tumor signaling networks. Network Biology,

2(3): 95-109

Li JR, Zhang WJ. 2013. Identification of crucial metabolites/reactions in tumor signaling networks. Network

Biology, 3(4): 121-132

Rahman KMT, Md. Islam F, Banik RS, et al. 2013. Changes in protein interaction networks between normal

and cancer conditions: Total chaos or ordered disorder? Network Biology, 3(1): 15-28

Zhang WJ. 2011. Network Biology: an exciting frontier science. Network Biology, 1(1): 79-80

Zhang WJ. 2012. Computational Ecology: Graphs, Networks and Agent-based Modeling. World Scientific,

Singapore

Zhang WJ. 2013. Selforganizology: A science that deals with self-organization. Network Biology, 3(1): 1-14

Zhang WJ. 2014a. A framework for agent-based modeling of community assembly and succession.

Selforganizology, 1(1): 16-22

Zhang WJ. 2014b. Selforganizology: A more detailed description. Selforganizology, 1(1): 31-46

Zhang WJ. 2016. Selforganizology: The Science of Self-Organization. World Scientific, Singapore

Zhang WJ, Liu GH. 2015. Coevolution: A synergy in biology and ecology. Selforganizology, 2(2): 35-38

Network Biology

Network Biology (ISSN 2220-8879; CODEN NBEICS) is an open access (BOAI definition), peer/open reviewed online journal that considers scientific articles in all different areas of network biology. It is the transactions of the International Society of Network Biology. It is dedicated to the latest advances in network biology. The goal of this journal is to keep a record of state-of-the-art research and to promote research work in these fast-moving areas. The topics to be covered by Network Biology include, but are not limited to:

Theories, algorithms and programs of network analysis

Innovations and applications of biological networks

Dynamics, optimization and control of biological networks

Ecological networks, food webs and natural equilibrium

Co-evolution, co-extinction, biodiversity conservation

Metabolic networks, protein-protein interaction networks, biochemical reaction networks, gene networks, transcriptional regulatory networks, cell cycle networks, phylogenetic networks, network motifs

Physiological networks

Network regulation of metabolic processes, human diseases and ecological systems

Social networks, epidemiological networks

System complexity, self-organized systems, emergence of biological systems, agent-based modeling, individual-based modeling, neural network modeling, and other network-based modeling.

Big data analytics of biological networks, etc.

We are also interested in short communications that clearly address a specific issue or completely

present a new ecological network, food web, or metabolic or gene network, etc.

Authors can submit their work to this journal's e-mail address, [email protected] and/or [email protected]. All manuscripts submitted to Network Biology must be previously unpublished and may not be considered for publication elsewhere at any time during the review period of this journal.

In addition to free submissions from authors around the world, special issues are also accepted. The organizer of a special issue can collect submissions on a specific topic (arising from a research project, a research group, etc.), or submissions from a conference, for publication as a special issue.

Editorial Office: [email protected]

Publisher: International Academy of Ecology and Environmental Sciences

Address: Unit 3, 6/F., Kam Hon Industrial Building, 8 Wang Kwun Road, Kowloon Bay, Hong Kong

Tel: 00852-2138 6086

Fax: 00852-3069 1955

E-mail: [email protected]

Network Biology ISSN 2220-8879 ∣ CODEN NBEICS

Volume 6, Number 1, 1 March 2016

Articles

A node degree dependent random perturbation method for prediction of missing links in the network

WenJun Zhang 1-11

Centrality measures for immunization of weighted networks

Mohammad Khansari, Amin Kaveh, Zainabolhoda Heshmati, et al. 12-27

Investigation of common disease regulatory network for metabolic disorders: A bioinformatics approach

Tasnuba Jesmin, Sajjad Waheed, Abdullah-Al-Emran 28-36

Short Communication

Network chemistry, network toxicology, network informatics, and network behavioristics: A scientific outline

WenJun Zhang 37-39

IAEES http://www.iaees.org/