
An Evaluation Algorithm of the Importance of Network Node Based on Community Influence

Gongzhen He (✉), Junyong Luo, and Meijuan Yin

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
[email protected]

Abstract. Identifying the nodes in social networks that have great influence on information dissemination is of great significance for monitoring and guiding information dissemination. Few existing node importance evaluation algorithms consider the influence of communities in social networks, so it is difficult for them to find the nodes that promote information dissemination between communities. To address this, this paper proposes a node importance evaluation algorithm based on community influence (abbreviated as the IEBoCI algorithm). It evaluates the importance of a node from the node's degree of influence on communities and from the information dissemination ability of the communities the node is connected to. The algorithm first calculates the activation probabilities of nodes on other nodes, which are used to divide communities and to evaluate influence. Second, the network is divided into communities with the LPA algorithm. Finally, the importance of each node is calculated by combining the influence of the communities themselves with the influence of the node on those communities. Experiments on real social network data, with comparisons against other community-based methods, verify the effectiveness of the algorithm.

Keywords: Complex network · Social network · Node importance · Community detection · Diffusion model

1 Introduction

With the rapid development of the Internet and information technology, social networks such as Weibo, Facebook, Flickr and Twitter have grown rapidly and have become one of the main platforms through which people spread information. Identifying the nodes with great influence on information dissemination supports in-depth analysis of how information spreads and evolves in social networks. Finding the leaders or promoters of online public opinion is of great significance for controlling and guiding online public opinion, cracking down on online information crimes, and enabling viral marketing and word-of-mouth communication.

There are two main classes of methods for evaluating node importance: methods based on centrality and methods based on information dissemination scale. Centrality-based methods evaluate, from the network structure, how central a node is, i.e. how close it lies to the center of the network. These methods mainly include Degree Centrality [1], Betweenness Centrality [2], Closeness Centrality [3] and Eigenvector Centrality [4]. They are suitable for finding nodes that are important in the network structure, but not for evaluating the influence of nodes.

© Springer Nature Singapore Pte Ltd. 2020
Y. Tan et al. (Eds.): DMBD 2020, CCIS 1234, pp. 57–70, 2020.
https://doi.org/10.1007/978-981-15-7205-0_6

The method based on information dissemination scale is to use information dissem-ination model to simulate the information dissemination process, calculate the infor-mation dissemination scale of nodes, and find out the nodes with great influence. Agreedy algorithm for calculating the propagation scale of nodes was first proposed byKempe et al. [5], which is very time consuming and only suitable for small networks.In order to reduce the computational complexity, Leskovec et al. [6] proposed CELFalgorithm according to the submodules of influence diffusion, avoiding redundant cal-culation of activation range. References [7–11] used heuristic strategy instead of MonteCarlo simulation to estimate propagation scale for improving time efficiency. This kindof method finds influential nodes by directly measuring the information disseminationscale of nodes, but it is not suitable for evaluating the importance of nodes in the networkthat indirectly disseminate influence.

The methods of Cao [15], Wang [16], Shang [17], Zhang [18] and other teams assume that communities are independent of each other. After dividing the network into communities, they find the node with the greatest local influence in each community and then the node with the greatest influence in the whole network. Tulu et al. [19] divided the network into communities and then computed each node's Shannon entropy from the number of nodes outside and inside its community, using it as the node's importance. Zhao [20] measured node importance by the number of communities a node connects to after the network is divided into communities. These methods either ignore the associations between communities, ignore the relationships between nodes and different communities, or ignore the influence of the communities themselves.

To address these problems, we propose a node importance evaluation algorithm based on community influence (abbreviated as the IEBoCI algorithm). Its basic assumption is that the stronger the information dissemination ability of the communities a node is connected to, and the greater the node's influence on those communities, the more important the node. The algorithm first calculates the activation probabilities of nodes on other nodes. Second, it divides the network into communities with the LPA algorithm. Third, it calculates the influence of each community and the influence degree of each node on the communities it is connected to. Finally, it calculates node importance by combining the influence of the communities themselves with the influence of the node on those communities.

2 IEBoCI Algorithm Framework

If a person in a social network has many friends in different communities, this person does not necessarily disseminate much important information directly, but can disseminate information indirectly into those communities through contacts with their members. Such a person therefore has a wide influence on information dissemination and plays an important role in the network.


Based on this assumption, we believe that the influence of a node is related to the number and quality of the communities it is connected to. The more communities a node is connected to, the greater its influence and the higher its importance. The node's influence also depends on the influence of the connected communities. Moreover, different nodes influence the same community to different degrees. If a node has little influence on a community, it is difficult for the node to activate the nodes in that community, and hence difficult to spread information further through it. Therefore, evaluating node importance requires combining the influence of the community itself with the node's degree of influence on that community.

3 Algorithm Steps

A social network is a complex network, denoted in this paper as a directed network G = (N, E), where N is the set of nodes and E is the set of directed edges. The IEBoCI algorithm proposed in this paper operates on directed networks. The algorithm flow is shown in Fig. 1, and the steps are as follows:

(1) Based on the information propagation model, calculate the probability that each node activates each of its reachable nodes. These probabilities are used to divide communities and to calculate the influence ranges of communities and nodes.
(2) Divide the network with the label propagation algorithm to obtain its community structure.
(3) Calculate the influence range of each community from the activation probabilities of the nodes.
(4) Calculate the expected number of nodes in each community that a node activates, and from it the influence degree of the node on the community.
(5) Calculate node importance by combining the results of steps (3) and (4).

Fig. 1. Algorithm flow chart. [Figure: G(N, E) → Node Activation Probability Calculation (activation probability from a node to its non-adjacent reachable nodes) → Community Detection Based on LPA Algorithm (initialize labels, update labels in a loop, finish updating; Community = {C1, C2, C3, …}) → Influence of Communities Calculation (activation probability Ps of communities to nodes, then influence scale expectation EXPs) and Influence of Nodes on Communities Calculation (EXPn(v_i, C_l), then influence degree Inf(v_i, C_l)) → Node Importance Calculation (I(v_i) = Σ_{C_l ∈ Com(v_i)} Inf(v_i, C_l) EXPs(C_l)).]

3.1 Node Activation Probability Calculation

This paper calculates the information dissemination scale of nodes and communities based on the independent cascade model (IC model). The IC model is a probabilistic model: every pair of neighboring nodes v_i and v_j in network G has a probability p(v_i, v_j) ∈ [0, 1], assigned randomly, which indicates the probability that the active node v_i successfully activates its neighbor v_j directly. For non-adjacent nodes v_i and v_j, if v_j is unreachable from v_i, the probability of v_i activating v_j is 0, that is, p(v_i, v_j) = 0. If v_j is reachable from v_i, the probability that v_i activates v_j along a path is the product of the direct activation probabilities of the edges on that path [22]. If there are m paths between v_i and v_j and one of them is Path(v_i, v_j)_x = <v_i = v_1, v_2, …, v_k = v_j>, the probability Pp(v_i, v_j)_x that node v_i activates node v_j along this path is calculated as follows:

$$P_p(v_i, v_j)_x = \prod_{u=1}^{k-1} p(v_u, v_{u+1}) \tag{1}$$

where Pp(v_i, v_j)_x is the probability that v_i activates v_j through Path(v_i, v_j)_x, and p(v_u, v_{u+1}) is the probability that node v_u activates its neighbor v_{u+1}. When the m paths give different activation probabilities, the maximum is taken as the probability p(v_i, v_j) that node v_i activates the non-adjacent reachable node v_j.
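As an illustration, the maximum-probability rule above can be computed without enumerating all m paths. Every factor p(v_u, v_{u+1}) lies in [0, 1], so the path product never increases as a path is extended, and a Dijkstra-style best-first search therefore finds the maximum product. The Python sketch below assumes the network is stored as a hypothetical dictionary mapping each node to {out-neighbor: p(u, v)}. Following the text, it keeps the direct edge probability for adjacent nodes. The paper does not say how the paths are searched, so the search strategy and the function name `activation_probabilities` are assumptions of this sketch.

```python
import heapq


def activation_probabilities(edges, source):
    """Sketch of Sect. 3.1: p(source, v) for every v reachable from source.

    edges: dict mapping node -> {out-neighbour: direct probability p(u, v)}.
    Non-adjacent reachable nodes get the maximum path product (Eq. 1
    maximised over paths); adjacent nodes keep their direct edge probability.
    Unreachable nodes are omitted, i.e. p(source, v) = 0.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]          # max-heap via negated probabilities
    settled = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in settled:
            continue                 # stale heap entry
        settled.add(u)
        for v, p_uv in edges.get(u, {}).items():
            cand = -neg_p * p_uv     # product of edge probabilities so far
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    best.pop(source, None)
    # The text applies the maximum-path rule only to non-adjacent nodes.
    for v, p_uv in edges.get(source, {}).items():
        if v != source:
            best[v] = p_uv
    return best
```

For example, with edges `{1: {2: 0.5, 3: 0.2}, 2: {3: 0.8, 4: 0.5}, 3: {4: 0.9}}` and source 1, node 4 gets 0.5 × 0.8 × 0.9 ≈ 0.36 via 1→2→3→4. Node 3 keeps its direct probability 0.2, even though the path 1→2→3 would give 0.4.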

3.2 Community Detection

We divide the network into communities with the label propagation algorithm (LPA) to obtain the set of communities of the network. The standard LPA is designed for undirected, unweighted networks, whereas the social network constructed in this paper is directed. Therefore, when computing the new label of node v_j, only the in-neighbors of v_j are considered, and the label whose in-neighbors have the largest total activation probability to v_j is selected. The steps are as follows:

Step 1: Label initialization. Each node in the network is randomly assigned a unique label l, which represents the community the node belongs to.
Step 2: Determining the order of asynchronous label updates. Calculate the degree of each node and sort the nodes in descending order of degree. Labels are updated asynchronously in this order.
Step 3: Updating node labels. Following the order from Step 2, the labels are updated one by one. The label of node v_j is updated to the label with the maximum sum of activation probabilities among its in-neighbors. The label update formula is:

$$l_{v_j} = \arg\max_{l} \sum_{v_i \in \mathit{IN}(v_j)} p(v_i, v_j)\, \delta(l_{v_i}, l) \tag{2}$$

where l_{v_j} is the new label of node v_j, l_{v_i} is the label of node v_i, IN(v_j) is the set of nodes with out-edges to v_j (the in-neighbors of v_j), p(v_i, v_j) is the activation probability from v_i to v_j, and δ(l_{v_i}, l) is the Kronecker delta, equal to 1 if l_{v_i} = l and 0 otherwise.


When more than one label attains the maximum sum of activation probabilities, one of them is selected at random as the node's new label.

Step 4: Termination check. Check whether the label of every node in the network is a label with the largest sum of activation probabilities among its in-neighbors. If not, repeat Step 3. If so, terminate. Nodes with the same label belong to the same community.
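A minimal Python sketch of Steps 1–4 follows. It assumes the network is stored as a hypothetical dictionary mapping each node to {out-neighbor: p(u, v)}. Only direct edge probabilities are needed, because in-neighbors are adjacent. Two choices are assumptions rather than details from the paper. First, the degree in Step 2 is taken as in-degree plus out-degree. Second, when a node's current label is already among the tied maxima, it is kept instead of being redrawn at random. This is a common LPA convention that helps convergence. A `max_iter` cap (hypothetical parameter) guards against non-convergence.

```python
import random
from collections import defaultdict


def lpa_communities(edges, seed=0, max_iter=100):
    """Sketch of the directed, probability-weighted LPA of Sect. 3.2.

    edges: dict node -> {out-neighbour: p(u, v)}.
    Returns dict label -> set of member nodes.
    """
    rng = random.Random(seed)
    in_nbrs = defaultdict(dict)                 # v -> {u: p(u, v)}
    nodes = set(edges)
    for u, nbrs in edges.items():
        for v, p in nbrs.items():
            nodes.add(v)
            in_nbrs[v][u] = p
    # Step 1: a unique initial label per node (here, the node id itself).
    label = {v: v for v in nodes}
    # Step 2: update order by degree, largest first (in + out degree assumed).
    degree = {v: len(edges.get(v, {})) + len(in_nbrs[v]) for v in nodes}
    order = sorted(nodes, key=lambda v: -degree[v])
    for _ in range(max_iter):
        changed = False
        for v in order:
            if not in_nbrs[v]:
                continue                        # no in-neighbours: keep label
            # Step 3 / Eq. (2): total p(u, v) per label among in-neighbours.
            score = defaultdict(float)
            for u, p in in_nbrs[v].items():
                score[label[u]] += p
            top = max(score.values())
            tied = [l for l, s in score.items() if s == top]
            if label[v] not in tied:            # keep current label on ties
                label[v] = rng.choice(tied)
                changed = True
        # Step 4: a full pass without changes means every label is a maximum.
        if not changed:
            break
    communities = defaultdict(set)
    for v, l in label.items():
        communities[l].add(v)
    return dict(communities)
```

On two strongly connected triangles joined by weak edges (probability 0.1 against 0.9 inside each triangle), the sketch separates the two triangles into two communities.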

3.3 Evaluation of Influence of Communities

In this paper, the expected number of network nodes activated by a community is taken as the influence of the community. It is computed in two steps. First, the activation probabilities between nodes computed in Sect. 3.1 are used to calculate the joint activation probability of all nodes in the community on each node in the network. Second, the expected number of nodes activated by the community is calculated from these joint activation probabilities, which gives the influence of the community.

In the independent cascade model, whether a node activates another node and whether other nodes activate that node are independent events. The joint activation probability Ps(C_l, v_j) [23] of community C_l on a node v_j is therefore one minus the probability that no member of C_l activates v_j. By the multiplication rule for independent events, that probability is the product of the members' individual failure probabilities:

$$P_s(C_l, v_j) = 1 - \prod_{v_i \in C_l} \bigl(1 - p(v_i, v_j)\bigr) \tag{3}$$

With the joint activation probability Ps(C_l, v_j), the expected influence scale EXPs(C_l) of the community is calculated as follows:

$$\mathit{EXP}_s(C_l) = \sum_{v_j \in N} P_s(C_l, v_j) \tag{4}$$

Where N is the set of all nodes in the network.
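Equations (3) and (4) translate directly into code. In this sketch, `prob` is a hypothetical dictionary holding the pairwise activation probabilities of Sect. 3.1 (u -> {v: p(u, v)}), and missing entries mean p = 0. The paper does not state whether p(v_i, v_i) is defined for a community member itself. The sketch treats it as 0, so a member contributes only through its chance of being activated by other members.

```python
def community_influence(prob, community, nodes):
    """EXPs(C_l) via Eqs. (3)-(4).

    prob: dict u -> {v: p(u, v)}; missing entries (including p(u, u)) are 0.
    community: iterable of member nodes; nodes: all nodes N of the network.
    """
    members = list(community)
    total = 0.0
    for v in nodes:
        miss = 1.0                                   # P(no member activates v)
        for u in members:
            miss *= 1.0 - prob.get(u, {}).get(v, 0.0)
        total += 1.0 - miss                          # Ps(C_l, v), Eq. (3)
    return total                                     # sum over N, Eq. (4)
```

For instance, with p(1, 2) = 0.5, p(1, 3) = 0.2 and p(2, 3) = 0.5, the community {1, 2} gives Ps = 0, 0.5 and 1 − 0.8 × 0.5 = 0.6 for nodes 1, 2 and 3. Its influence is therefore 1.1.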

3.4 Evaluation of Influence of Nodes on Communities

The expected number of nodes in a community activated by a node indicates how many of that community's nodes the node can successfully activate. The number of nodes in the community indicates its total size. The larger the share of a community's nodes that a node can activate, the greater the node's influence on that community. Therefore, this paper takes the ratio of the expected number of nodes in a community activated by a node to the number of nodes in that community as the influence degree of the node on the community.

Restricting the nodes activated by node v_i to community C_l gives the expected number EXPn(v_i, C_l) of nodes in C_l activated by v_i. It equals the sum of the probabilities that each node in C_l is successfully activated by v_i:

$$\mathit{EXP}_n(v_i, C_l) = \sum_{v_j \in C_l} p(v_i, v_j) \tag{5}$$


where C_l denotes the set of all nodes in community C_l. EXPn(v_i, C_l) is the expected number of nodes that v_i can activate in C_l. Its ratio to the total number of nodes n(C_l) in C_l is the influence degree Inf(v_i, C_l) of node v_i on community C_l:

$$\mathit{Inf}(v_i, C_l) = \frac{\mathit{EXP}_n(v_i, C_l)}{n(C_l)} \tag{6}$$
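Equations (5) and (6) reduce to a row sum over the community's members divided by the community's size. The sketch below assumes a hypothetical dictionary `prob` (u -> {v: p(u, v)}) in which missing entries, including p(v_i, v_i), are 0. The paper does not define p(v_i, v_i), so this convention is an assumption.

```python
def node_influence_on_community(prob, node, community):
    """Inf(v_i, C_l) via Eqs. (5)-(6).

    prob: dict u -> {v: p(u, v)}; missing entries (including p(u, u)) are 0.
    community: non-empty collection of member nodes of C_l.
    """
    row = prob.get(node, {})
    exp_n = sum(row.get(v, 0.0) for v in community)  # EXPn(v_i, C_l), Eq. (5)
    return exp_n / len(community)                     # divide by n(C_l), Eq. (6)
```

With p(1, 2) = 0.5 and p(1, 3) = 0.2, node 1's influence degree on the community {2, 3} is (0.5 + 0.2) / 2 = 0.35.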

3.5 Node Importance Evaluation

Node v_i can use the influence of the communities it is directly connected to in order to spread influence indirectly. Summing the influence of the communities that v_i can use indirectly, each weighted by v_i's influence degree on that community, gives the importance I(v_i) of node v_i:

$$I(v_i) = \sum_{C_l \in \mathit{Com}(v_i)} \mathit{Inf}(v_i, C_l)\, \mathit{EXP}_s(C_l) \tag{7}$$

where Com(v_i) is the set of communities in which node v_i and its out-neighbors are located.
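Putting Eqs. (3)–(7) together, the sketch below computes I(v_i) for every node from precomputed activation probabilities and a community assignment. The input layout (`prob`, `out_edges`, `label_of`, `communities`) and the function name are hypothetical. p(v_i, v_i) is treated as 0, an assumption because the paper does not define it. Com(v_i) is built, as defined above, from the communities of v_i and of its out-neighbors.

```python
def node_importance(prob, out_edges, label_of, communities, nodes):
    """I(v_i) from Eq. (7), with Eqs. (3)-(6) computed inline.

    prob:        dict u -> {v: p(u, v)}; missing entries are 0
    out_edges:   dict u -> iterable of out-neighbours (used for Com(v_i))
    label_of:    dict node -> community label
    communities: dict label -> non-empty set of member nodes
    nodes:       all nodes N
    """
    def exp_s(members):                        # EXPs(C_l), Eqs. (3)-(4)
        total = 0.0
        for v in nodes:
            miss = 1.0
            for u in members:
                miss *= 1.0 - prob.get(u, {}).get(v, 0.0)
            total += 1.0 - miss
        return total

    exps = {l: exp_s(m) for l, m in communities.items()}
    importance = {}
    for v in nodes:
        # Com(v_i): communities of v_i and of its out-neighbours.
        com = {label_of[v]} | {label_of[w] for w in out_edges.get(v, ())}
        row = prob.get(v, {})
        score = 0.0
        for l in com:
            members = communities[l]
            inf = sum(row.get(w, 0.0) for w in members) / len(members)  # Eqs. (5)-(6)
            score += inf * exps[l]                                       # Eq. (7)
        importance[v] = score
    return importance
```

In a three-node example where node 2 only points into a one-member community of low reach, its importance comes entirely from that neighboring community. This reflects the "indirect" influence the text describes.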

4 Experimental Results and Discussions

4.1 Experimental Data and Initial Setup

The experimental data are two commonly used public social network data sets downloaded from the Internet. The name, scale and description of each network are shown in Table 1.

Table 1. Basic information of datasets

| Data set | Number of nodes | Number of edges | Network density | Average out-degree | Minimum out-degree | Maximum out-degree | Description |
|---|---|---|---|---|---|---|---|
| Facebook^a | 4039 | 176468 | 0.0108200 | 43.691 | 1 | 1045 | Friends on Facebook |
| E-Mail^b | 1866 | 5517 | 0.0015853 | 2.9566 | 0 | 330 | Communication between users |

Note: ^a https://snap.stanford.edu/data/ego-Facebook.html; ^b http://konect.uni-koblenz.de/networks/dnc-temporalGraph

In the Facebook data, edges represent friendships, so the network is undirected. In this experiment each undirected edge is converted into two directed edges, one in each direction. In the E-Mail data, edges represent communication: each message forms a directed edge from the sending node to the receiving node.

To simulate information dissemination on the network, this paper takes node v_i as the initially activated node. The number of nodes that v_i can affect is used as a measure of its information dissemination capability and is referred to as the Influence scale. The influence propagation process is simulated, and the influence scale computed, with the commonly used independent cascade model (IC model) [21].

When the network is constructed from each data set, the activation probability p between adjacent nodes is assigned a random value between 0 and 1. Because the independent cascade model is stochastic, repeated runs can give different influence scales for the same node. Therefore, each node is taken as the initially activated node for 50 simulation runs, and the arithmetic mean of these runs is taken as its influence scale.
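The Monte Carlo estimate described above can be sketched as follows. The sketch assumes a hypothetical dictionary mapping each node to {out-neighbor: p(u, v)}. Each newly activated node gets one chance to activate each still-inactive out-neighbor with probability p(u, v), which is the standard IC rule. Two details are assumptions of this sketch, not details from the paper: the seed node itself is counted in the influence scale, and a fixed random seed is used.

```python
import random


def ic_influence_scale(edges, seed_node, runs=50, rng=None):
    """Average number of nodes activated from seed_node under the IC model.

    edges: dict u -> {v: p(u, v)}. The seed node is counted as activated.
    """
    if rng is None:
        rng = random.Random(0)               # fixed seed for reproducibility
    total = 0
    for _ in range(runs):
        active = {seed_node}
        frontier = [seed_node]
        while frontier:                      # one IC round per loop iteration
            newly_active = []
            for u in frontier:
                for v, p in edges.get(u, {}).items():
                    # Each edge is tried once, when u first becomes active.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)
    return total / runs                      # arithmetic mean over the runs
```

With `runs=50` this mirrors the averaging described in the text. With all probabilities equal to 1, the estimate is simply the number of nodes reachable from the seed, the seed included.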

4.2 The Division and Influence of Communities

After the networks are constructed, both data sets are divided into communities with the LPA variant described in Sect. 3.2, using the activation probabilities between nodes. The results are shown in Tables 2 and 3.

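Table 2. Communities of the Facebook data set

| Community size | Number of communities | Community size | Number of communities |
|---|---|---|---|
| 2 | 10 | 40 | 1 |
| 3 | 1 | 43 | 1 |
| 4 | 2 | 54 | 1 |
| 6 | 3 | 72 | 1 |
| 7 | 1 | 84 | 1 |
| 8 | 4 | 106 | 1 |
| 10 | 1 | 189 | 1 |
| 12 | 1 | 225 | 1 |
| 14 | 1 | 226 | 1 |
| 19 | 2 | 266 | 1 |
| 24 | 1 | 344 | 1 |
| 25 | 1 | 347 | 1 |
| 27 | 1 | 467 | 1 |
| 33 | 1 | 514 | 1 |
| 38 | 1 | 753 | 1 |

Total number of communities: 46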

Table 3. Communities of the E-Mail data set

| Community size | Number of communities |
|---|---|
| 1 | 490 |
| 2 | 27 |
| 3 | 5 |
| 4 | 3 |
| 6 | 1 |
| 7 | 1 |
| 20 | 1 |
| 35 | 2 |
| 537 | 1 |
| 655 | 1 |

Total number of communities: 532


Fig. 2. Influence of communities

The community detection results show that the structural characteristics of the Facebook and E-Mail networks differ considerably. The 4039 nodes of the Facebook network are divided into only 46 communities, and most nodes belong to large communities: the 15 communities with 40 or more members contain 3730 of the 4039 nodes. This indicates that the network is densely connected. The 1866 nodes of the E-Mail network are divided into 532 communities. There are many communities but few large ones, and most communities contain only one or two nodes (490 and 27 communities respectively), which shows that the network is sparse.

After community detection, the influence of each community is calculated with the method proposed in this paper. The results are shown in Fig. 2. Figure 2(a) shows the influence of communities in the Facebook data. There are few communities, and their sizes (numbers of members) vary greatly. Overall, the larger a community, the greater its influence. Once a community's influence reaches a certain level (more than 90% of all nodes affected), further increases become less and less pronounced. This is expected, because influence cannot exceed the total number of nodes. Figure 2(b) shows the influence of communities in the E-Mail data. Here too, larger communities generally have greater influence. Because the network contains many 1-node and 2-node communities, the points in the lower-left corner of the plot are dense, and communities of the same size do not necessarily have the same influence.

4.3 The Importance of Node

Using the proposed method, the importance I and the Influence scale of every node in the two data sets are calculated. The results are compared with a method that measures node importance indirectly through communities, namely the number of directly connected communities (V-community) [20]. Out-degree (Degreeout) and betweenness are also reported for reference.

Distribution of Node Importance. After the importance of the nodes is calculated, the number of nodes at each importance level is counted in the Facebook and E-Mail data sets, using bins of width 100 for Facebook and width 20 for E-Mail. The distributions are shown in Fig. 3 and differ between the two data sets: the Facebook distribution is positively skewed, while the E-Mail distribution follows a power law. The difference stems from the different characteristics of the two data sets. The Facebook network is a directed network converted from an undirected one, so every node has an out-degree greater than 0 and can transmit information to other nodes. In the E-Mail data, the directed network is built from communication relationships. Many nodes in this data set receive mail but never send any. These nodes have an out-degree of 0 and do not spread information outward. The large number of zero-out-degree nodes produces many nodes with low importance, so the importance distribution follows a power law.

Fig. 3. Node importance statistics

Comparison of Node Importance and Number of Directly Connected Communities. The distribution of node Influence scales in the two data sets is shown in Fig. 4.

Fig. 4. Node influence scale statistics

In Fig. 4, nodes are counted by Influence scale in bins of width 100 for (a) the Facebook data and of width 20 for (b) the E-Mail data.
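The bin counting used for Figs. 3 and 4 amounts to grouping values into half-open intervals [k·w, (k+1)·w), where w = 100 or 20 as in the text. A small Python sketch follows. `bin_counts` is a hypothetical helper, not something described in the paper.

```python
from collections import Counter


def bin_counts(values, width):
    """Count values per half-open interval [k*width, (k+1)*width).

    Returns {lower bound of interval: count}, sorted by lower bound.
    """
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))
```

For example, influence scales 3874.9, 3905.0, 3622.1 and 3950.0 with width 100 fall into the bins starting at 3600 (one node), 3800 (one node) and 3900 (two nodes).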


Most nodes in the Facebook data set have a large influence scale: 1746 nodes fall in the interval [3900, 4000) and 1649 in [3800, 3900). Of the 4039 nodes, 3395 (about 84%, i.e. more than three-quarters) have an influence scale of at least 3800, so each of them can affect roughly 94% or more of the network. This result reflects the close connections between nodes in this data set: the network density is high, the average out-degree is large, and most nodes spread influence over a wide range.

In the E-Mail data set, most nodes have a very small influence scale: 921 nodes fall in the interval [0, 20). This reflects the loose connections between nodes in this data set, and influence scales vary across nodes. Of the 1866 nodes, 800 have an out-degree of 0. These nodes cannot transmit information outward, so nodes with a low Influence scale are the majority.

The relationship between node out-degree, betweenness and node influence scale is shown in Fig. 5. The X axis is the node out-degree or betweenness for the corresponding data set, and the Y axis is the node Influence scale. Figure 5(a) and (b) show almost no correlation between node influence scale and node out-degree. Figure 5(c) and (d) likewise show almost no correlation between node influence scale and node betweenness.

Fig. 5. Relationship between outdegree, betweenness and node influence scale

The relationship between the number of directly connected communities (V-community), node importance and node influence scale is shown in Fig. 6. The X axis is V-community or node importance for the corresponding data set, and the Y axis is the node Influence scale. Figure 6(a) and (b) show some correlation between the influence scale of nodes and the number of communities they are directly connected to. Nodes connected to many communities have a large influence scale, but nodes with a large influence scale are not necessarily connected to many communities. Figure 6(c) and (d) show a strong correlation between node Influence scale and node importance. In Fig. 6(c) the X axis spans a wide range and the points are concentrated on the left. For easier observation, the part of the distribution with importance between 0 and 5000 is shown in Fig. 7. As Fig. 7 shows, some nodes with low importance still have a high Influence scale, while nodes with high importance consistently have a high Influence scale. In Fig. 6(d) the correlation between node influence scale and node importance is even more evident: the points lie essentially between two oblique lines through the origin (marked in the figure). This strong correlation shows that the nodes that this method identifies as highly important have a high Influence scale.

Fig. 6. Relationship between V-community, node importance and node influence scale

Fig. 7. Node importance and node influence scale in Facebook dataset


The data for the ten nodes with the highest importance are shown in Table 4 for the Facebook data set and in Table 5 for the E-Mail data set.

Tables 4 and 5 show that nodes with high I have a very high Influence scale. Their Degreeout, Betweenness and V-community, however, are not necessarily high, and some are even very low.

In conclusion, the influence of nodes in the network is not determined by structural features such as node degree and betweenness. The method in this paper evaluates node importance based on community influence. Nodes with higher importance have greater influence and can promote information dissemination, and the method can find nodes whose structural characteristics are not prominent but whose actual influence is large.

Table 4. Top ten nodes by node importance in the Facebook data set

| Node number | Degreeout | Betweenness | V-community | I | Influence scale |
|---|---|---|---|---|---|
| 107 | 1045 | 0.480518 | 11 | 12227.468 | 3874.939 |
| 3437 | 547 | 0.236115 | 14 | 11849.017 | 3622.140 |
| 563 | 91 | 0.062780 | 7 | 8008.597 | 3791.743 |
| 1593 | 32 | 0.000553 | 6 | 7922.979 | 3495.210 |
| 0 | 347 | 0.146305 | 11 | 7714.364 | 3700.363 |
| 1173 | 115 | 0.000942 | 6 | 7627.214 | 3602.769 |
| 606 | 91 | 0.000997 | 5 | 7343.892 | 3440.250 |
| 1687 | 43 | 0.000907 | 5 | 7064.857 | 3796.674 |
| 1684 | 792 | 0.337797 | 8 | 6935.021 | 3478.205 |
| 428 | 115 | 0.064309 | 7 | 6397.974 | 3582.737 |

Table 5. Top ten nodes by node importance in the E-Mail data set

| Node number | Degreeout | Betweenness | V-community | I | Influence scale |
|---|---|---|---|---|---|
| 1957 | 3 | 0 | 3 | 803.4425465 | 693.26 |
| 1159 | 155 | 0.050850 | 5 | 778.5791855 | 703.88 |
| 1312 | 6 | 0 | 5 | 756.1487419 | 647.72 |
| 993 | 44 | 0.015875 | 3 | 718.0338654 | 704.08 |
| 1882 | 4 | 0 | 4 | 714.651491 | 645.12 |
| 1669 | 241 | 0.128871 | 3 | 693.0661212 | 701.62 |
| 869 | 36 | 0.003789 | 3 | 683.0508954 | 702.42 |
| 1 | 96 | 0.025603 | 3 | 674.3651542 | 706.34 |
| 1618 | 17 | 0 | 5 | 667.307672 | 704.86 |
| 585 | 75 | 0.008709 | 2 | 653.1068147 | 708.86 |

5 Conclusion

Identifying the nodes that have the greatest influence on information dissemination is an active research topic in social network analysis. However, few existing node importance evaluation algorithms consider how communities influence information dissemination in social networks. This paper proposes a node importance evaluation algorithm based on community influence. After the network is divided into communities, the influence of the communities and the influence of nodes on the communities are evaluated, and the importance of each node is then evaluated by combining the two. The experimental results show that the nodes rated highly important by the algorithm have high influence in the network and can promote the dissemination of information. They also show that the algorithm can find nodes with high influence but inconspicuous structural characteristics.

In future work, we plan to incorporate content features and user behavior features of social networks into the computation of node importance.

References

1. Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86(14), 3200–3203 (2001)

2. Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977)

3. Sabidussi, G.: The centrality index of a graph. Psychometrika 31(4), 581–603 (1966)

4. Bonacich, P.F.: Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 2(1), 113–120 (1972)

5. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, USA, pp. 137–146 (2003)

6. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, USA, pp. 420–429 (2007)

7. Kimura, M., Saito, K.: Tractable models for information diffusion in social networks. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 259–271. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_27

8. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, pp. 199–208 (2009)

9. Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of IEEE 10th International Conference on Data Mining (ICDM 2010), Sydney, Australia, pp. 88–97 (2010)

10. Goyal, A., Lu, W., Lakshmanan, L.V.: SIMPATH: an efficient algorithm for influence maximization under the linear threshold model. In: 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), Vancouver, Canada, pp. 211–220 (2011)


11. Kimura, M., Saito, K., Nakano, R., et al.: Extracting influential nodes on a social network for information diffusion. Data Min. Knowl. Discov. 20(1), 70–97 (2010)

12. Wu, X., Liu, Z.: How community structure influences epidemic spread in social networks. Phys. A: Stat. Mech. Appl. 387(2–3), 623–630 (2008)

13. Huang, W., Li, C.: Epidemic spreading in scale-free networks with community structure. J. Stat. Mech.: Theory Exp. 2007(01), P01014 (2007)

14. Chu, X., Guan, J., Zhang, Z., et al.: Epidemic spreading in weighted scale-free networks with community structure. J. Stat. Mech.: Theory Exp. 2009(7), P07043 (18 pp.) (2009)

15. Cao, T., Wu, X., Wang, S., Hu, X.: OASNET: an optimal allocation approach to influence maximization in modular social networks. In: Proceedings of the ACM Symposium on Applied Computing, Sierre, Switzerland, pp. 1088–1094 (2010)

16. Wang, Y., Cong, G., Song, G., Xie, K.: Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, pp. 1039–1048 (2010)

17. Shang, J., Zhou, S., Li, X., et al.: CoFIM: a community-based framework for influence maximization on large-scale networks. Knowl.-Based Syst. 117(FEB), 88–100 (2017)

18. Zhang, X., Zhu, J., Wang, Q., et al.: Identifying influential nodes in complex networks with community structure. Knowl.-Based Syst. 42, 74–84 (2013)

19. Tulu, M.M., Hou, R., Younas, T.: Identifying influential nodes based on community structure to speed up the dissemination of information in complex network. IEEE Access 6, 7390–7401 (2018)

20. Zhao, Z.Y., Yu, H., Zhu, Z.L., et al.: Identifying influential spreaders based on network community structure. Chin. J. Comput. 37, 753–766 (2014)

21. Goldenberg, J., Libai, B., Muller, E.: Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad. Market. Sci. Rev. 9(3), 1–18 (2001)

22. Huang, H., Shen, H., Meng, Z.: Community-based influence maximization in attributed networks. Appl. Intell. 50(2), 354–364 (2020)

23. Li, J., Wang, X., Deng, K., et al.: Most influential community search over large social networks. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 871–882. IEEE Computer Society (2017)