

CYB-E-2018-10-2129.R1

Abstract—To find nodes with strong propagation ability, a large body of work has studied the influence maximization problem. Several influence spreading models and corresponding optimization algorithms have been proposed and have successfully identified influential seeds in single isolated networks. However, as recent studies and online materials indicate, modern networked systems tend to have more complicated structures and multiple layers, which makes it difficult for existing seed determination techniques to handle them. Finding influential nodes in realistic multiplex networks therefore remains an open problem. This paper designs an extended influence spreading model to simulate the influence diffusion process in multiplex networks, based on which a memetic algorithm is developed to find seeds that are influential across all network layers. Experimental results on synthetic and real-world networks validate the effectiveness of the proposed algorithm. These results help identify potential propagators in multiplex social networks and provide tools for analyzing and gaining deeper insight into networked systems.

Index Terms—Influence maximization; Optimization; Multiplex networks

I. INTRODUCTION

Networked systems exist broadly in daily life, and great attention has been paid to analyzing the dynamics and properties of networks [1 - 4]. Social networks, as a popular application of graph theory, are of great significance to modern society. Online social networks such as Facebook, Twitter, WeChat, and Microblog make it convenient for people to access the latest news and communicate with other users; furthermore, such close relations among people make it easy to spread information, which companies and advertisers have adopted as a powerful promotion technique [5, 6]. People tend to pay more attention to recommendations from their social friends than to traditional advertising channels such as broadcast, TV, and newspapers; word-of-mouth or viral marketing has therefore become an efficient way to propagate information. As indicated in [7, 8], a small set of spreaders is enough to cause cascading influence over a large area, so selecting valuable individuals to reach optimal information spreading is crucial to companies and advertisers. This selection is the core of the influence maximization problem.

Dating back to the study in [9], influence maximization was modeled as a probabilistic diffusion process among the users of a market network and was formalized as an NP-hard optimization problem [10]. Several spreading models have also been proposed to simulate the diffusion process in networked systems, including the independent cascade (IC) model [10], the weighted cascade (WC) model [11], and the linear threshold (LT) model [12]. With these models, optimization methods can determine the initial seed set that best spreads information over a given network structure. One option is the greedy algorithm designed in [10], which has been shown to be effective in finding influential seeds. Because the method in [10] has a high computational cost and cannot handle large networks, several improved heuristic optimization methods have been designed in [11, 13, 14] to improve the performance of seed selection. In addition, the structural properties of networks provide another way to find influential nodes. As indicated in [15 - 17], the degree and other centrality measures of nodes reflect their potential for diffusing information. The methods in [15 - 17] can generate seed candidates quickly but have limitations such as influence overlap. From a network-wide perspective, the community structure [18], which describes the distribution of functional clusters or social groups, also contributes to the selection of spreading seeds; for example, in [19, 20], nodes located in the core of communities are likely to have better influence ability.

Although heuristic-based methods are computationally efficient, their search capability is limited. Therefore, some population-based optimization algorithms have been designed to solve the influence maximization problem. The main difference between these two kinds of methods lies in how they use candidate information. Existing heuristic optimization methods, such as those proposed in [10, 11, 14, 21, 22], try to find the optimum using the local information of a randomly generated candidate, which reduces space complexity but leads to mediocre results; in contrast, population-based optimization methods search the solution space through exploration and exploitation of information among a set of generated candidates, which improves the optimization results at the cost of more computation. For example, in [23], a memetic algorithm (MA) is devised to find influential seed sets in social networks, and the community partition information of the network is used as an extra threshold to select possible seed candidates. Particle swarm optimization has also been applied to find influential seeds in [24]: with a degree-based initialization operator, particles update their chromosomes based on both local and global optimal information to speed up the convergence of the search. Tested on several influence models, the algorithm proposed in [24] showed good performance on the seed selection task. Based on different kinds of influence models, these population-based optimization techniques provide solutions for finding valuable seeds in social networks and contribute to high-speed information spreading in modern society.

This work was supported in part by the General Program of National Natural Science Foundation of China (NSFC) under Grant 61773300 and in part by the Key Program of Fundamental Research Project of Natural Science of Shaanxi Province, China under Grant 2017JZ017.

S. Wang is with School of Artificial Intelligence, Xidian University, Xi'an 710071, China. E-mail: [email protected]. J. Liu is with School of Artificial Intelligence, Xidian University, Xi'an 710071, China. E-mail: [email protected], [email protected]. For additional information regarding this paper, please contact Jing Liu (corresponding author). Y. Jin is with the Department of Computer Science, University of Surrey, Guildford, GU2 7XH, UK. E-mail: [email protected].

Finding Influential Nodes in Multiplex Networks using a Memetic Algorithm

Shuai Wang, Jing Liu, Senior Member, IEEE, and Yaochu Jin, Fellow, IEEE

However, most previous studies [8 - 17, 19 - 24] focus on selecting seeds in single isolated networked systems and cannot deal with the influence maximization problem in networks with more complicated structures, such as those with multiple layers. Although single networks have been greatly emphasized in previous studies, real networked systems tend to be composed of two or more interdependent networks, as shown in [25, 26]. Some studies and online materials have revealed the urgency of analyzing the influence spreading process in multiplex networks [27 - 29] and have shown the potential value of this topic to human society. Take social networking platforms as an example [30]: most of us use more than one platform, and we can have different social relations in different social services such as Facebook, Twitter, and WeChat. For advertisers, combined propagation across diverse social networks is more valuable for promoting products, since cultural or regional features lead users to emphasize different social media. For instance, WeChat is popular among Chinese users, whereas Facebook tends to be the first communication choice for Americans. Meanwhile, seed users may have different influence abilities in different network layers. An intuitive example is that we may be unfamiliar with celebrities from other regions but know local activists well. In this way, a seed with good influence ability in one network can be merely ordinary in others. From the perspective of advertisers, resources are limited and only a few seed members can be selected; thus, an acceptable strategy for selecting seeds from the members of multiplex networks is important. However, the existing influence models and the corresponding seed selection methods mainly concentrate on influence spread in single networks and cannot deal with this challenge. To the best of our knowledge, few studies have focused on the influence maximization problem on multiplex networks, and seed selection from multiplex networks remains an open issue.

As shown in previous studies [19 - 24], an influence diffusion model should be designed first to reduce the computational cost and guide the seed selection process when solving the influence maximization problem on networked systems. Meanwhile, the determination of influential seeds tends to be modeled as an optimization problem [10, 11, 13, 14], and an effective optimization algorithm, such as those reported in [23, 24], is decisive in completing the seed selection task. Therefore, how to model the influence diffusion process in systems with multiple layers and how to design an efficient optimization algorithm to determine seeds in multiplex networks are the two challenges we address in this work.

In this paper, focusing on the influence maximization problem in multiplex networks, we extend the existing IC model for single-layer networks to multiplex networks in order to estimate the influence range of a selected seed set among the members of different network layers. The model also provides a numerical evaluation of the influence ability of a given seed. Furthermore, the seed selection strategy is studied to find the optimal combination of seeds. Since memetic algorithms (MAs) have proven effective in solving different optimization problems [23, 31, 32], a problem-directed MA with several operators, termed MA-IMmulti, is devised to find the most influential seed sets with respect to the whole multiple-network system. The effectiveness of the proposed algorithm has been validated on both synthetic and real-world network data. Aiming at a deeper understanding of multiplex networks, MA-IMmulti distinguishes itself from many existing methods for single networks [10, 11, 14, 21 - 24] in the following aspects. First, in the initialization stage, MA-IMmulti combines the random selection strategy used in [10, 11, 14, 21] and the high-degree-preferred selection strategy used in [22 - 24] to generate better starting points for the subsequent operators. Second, the synthetic degree of a node and an XOR-based distance measure are considered in MA-IMmulti to select candidates with better influence potential in all layers, which previous studies do not address. Third, the designed 2-step local search operator considers both the overall structural information across network layers and the local connection information inside a specific layer, to address the challenge of finding influential nodes in multiplex networks. Existing heuristic-based or population-based optimization algorithms tend to search with the help of local connection information, such as the community partition in [22, 23] or the degree distribution in [14, 21, 24]. However, such local information is not sufficient to find influential nodes in multiplex networks, as revealed by our empirical studies presented in Section IV. By fully exploiting graph information with several powerful search operators, MA-IMmulti provides effective solutions to the influence maximization problem in multiplex networks.

The major contributions of this paper are summarized as follows. First and foremost, an extended IC model suitable for multiplex networks is designed to estimate the comprehensive influence ability of seeds across different network layers. In the approximation process of this model, several realistic factors (the active percentage in the whole networked system, an alterable propagation probability, and network layer weights) are taken into consideration to make the model better fit real application scenarios. Then, to determine effective seeds in networked systems with multiple layers, an optimization algorithm, MA-IMmulti, is developed together with several problem-directed operators. Experimental results demonstrate the competitive performance of MA-IMmulti.

The rest of this paper is organized as follows. Section II presents the related work on influence spreading models in single networks, together with their extension to multiplex networks. The details of MA-IMmulti are given in Section III. The experimental results on synthetic and real-world network data are reported in Section IV. Finally, Section V summarizes the work of this paper.

II. INFLUENCE SPREADING MODEL AND ITS EXTENSION

A. Spreading Model in Single Networks

A social network can be modeled as a graph $G = (V, E)$, where $V = \{1, 2, \dots, N\}$ is the set of $N$ users and $E = \{e_{ij} \mid i, j \in V\}$ is the set of $M$ relations between users. Based on the social network structure, $k$ nodes can be selected as the seed set $S$ to spread influence over the whole network, where the generated influence is denoted as $\sigma(S)$. Several models have been proposed to simulate the influence spreading process in different scenarios. For example, in the IC model proposed in [10], a node that becomes active at time $t$ has only one chance to activate each of its neighbors, with probability $p$, at time $t + 1$; the WC model proposed in [11] adjusts the activation probability according to the degree difference between nodes; and in the LT model proposed in [12], a node is activated when the total influence rate of its neighboring nodes reaches a pre-defined threshold. The IC model has been greatly emphasized in previous studies [10, 20 - 24], so we select it to simulate the influence diffusion process in this paper.

In the IC model, a node has only two possible states, active or inactive; active nodes diffuse influence along their links, and inactive nodes may change state with a certain probability. The spreading process works as follows. In the initial step, $k$ nodes are selected as seeds and saved in the active set $S$. At time step $t$, every newly activated node in $S$ independently attempts to activate each of its inactive neighbors with probability $p$, and the successfully activated nodes are saved in a temporary set $s_t$; the active set is then updated as $S = S \cup s_t$. The process stops when all active nodes have finished spreading and $s_t$ is empty. Here $\sigma(S)$ is the number of nodes activated over the whole propagation process, i.e., $|s_0| + |s_1| + \dots + |s_T|$, where $T$ is the termination step.
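The IC process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `simulate_ic` and `estimate_spread`, the adjacency-dictionary input, and the uniform probability `p` are assumptions made for the example, and only nodes activated in the previous step attempt to spread, so that each node gets a single chance.

```python
import random


def simulate_ic(adj, seeds, p, rng):
    """Run one IC cascade and return the set of all activated nodes.

    adj   : dict mapping a node to an iterable of its neighbors (assumed format)
    seeds : iterable of initially active nodes
    p     : uniform activation probability (assumption for this sketch)
    rng   : random.Random instance, passed in for reproducibility
    """
    active = set(seeds)
    frontier = list(active)              # nodes activated in the previous step
    while frontier:
        newly = []                       # plays the role of the temporary set s_t
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly.append(v)
        frontier = newly                 # each node spreads exactly once
    return active


def estimate_spread(adj, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the spread: average cascade size over `runs`."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(adj, seeds, p, rng)) for _ in range(runs))
    return total / runs
```

Averaging over many runs is what makes the Monte Carlo approach expensive on large networks, since every run may traverse a large part of the graph.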

Given a seed set $S$ containing the initial active nodes in the network, the estimation of $\sigma(S)$ can be modeled as a discrete optimization problem, which has been proved to be NP-hard [10, 23]. The Monte Carlo process provides a solution to the estimation problem by running a large number of simulations. Since such simulations are extremely time-consuming on large-scale networks, a fast approximation method for influence spreading has been proposed in [33]. Instead of considering all nodes in the network, the influence ability of a seed set is estimated within its 2-hop range, which means that only the activation states of the seeds' neighbors ($N_s$) and the neighbors of $N_s$ are considered in the evaluation. Details are defined as follows,

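$$\hat{\sigma}(S) = \sum_{s \in S} \hat{\sigma}(\{s\}) - \sum_{s \in S} \sum_{c \in H_s \cap S} p(s, c)\,\bigl(\sigma_c^1 - p(c, s)\bigr) - \chi \tag{1}$$

where $H_s$ is the set of 1-hop neighbors of seed $s$ (i.e., the connected neighbors of $s$), $p(s, c)$ is the propagation probability between an active node $s$ and an inactive node $c$, and $\sigma_c^1$ is the 1-hop range of node $c$. $\chi$ denotes the overlapping influence from active nodes to the original seeds, as $\chi = \sum_{s \in S} \sum_{c \in H_s \setminus S} \sum_{d \in H_c \cap S \setminus \{s\}} p(s, c)\, p(c, d)$. $\hat{\sigma}(S)$ first sums the 2-hop influence of the selected seeds in $S$ (the first term) and then deducts the potential redundant influence generated from one seed to other seeds (the second and third terms).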

The fast estimation approach has been shown to be effective in estimating the influence ability of selected seeds [33]; furthermore, this measure, which has a much lower computational cost than the traditional Monte Carlo process, also provides reliable guidance for the seed selection search in [20, 23, 24]. However, because it concentrates on the influence maximization problem in single isolated networks, the existing model cannot handle the seed determination problem in multiplex networks.
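To make the three terms of the 2-hop estimate concrete, the sketch below implements one reading of it in Python. Several things here are assumptions and not taken from the text: a uniform probability `p` on every edge, a symmetric adjacency dictionary, the 1-hop range of a node taken as `1 + p * degree`, and the single-seed 2-hop term taken as `1 + sum over neighbors c of p * (one_hop(c) - p)`, following the general form of the approximation in [33]. The function names are hypothetical.

```python
def one_hop(adj, c, p):
    """Assumed 1-hop range of node c: the node itself plus expected direct activations."""
    return 1.0 + p * len(adj[c])


def two_hop_single(adj, s, p):
    """Assumed 2-hop influence of a single seed s.

    The '- p' removes the influence that would flow back from c to s.
    """
    return 1.0 + sum(p * (one_hop(adj, c, p) - p) for c in adj[s])


def two_hop_estimate(adj, seeds, p):
    """Three-term estimate: single-seed sum minus the two overlap corrections."""
    S = set(seeds)
    # First term: sum of the 2-hop influence of every seed.
    total = sum(two_hop_single(adj, s, p) for s in S)
    # Second term: a seed's neighbor is itself a seed, so its range was counted twice.
    seed_overlap = sum(
        p * (one_hop(adj, c, p) - p) for s in S for c in adj[s] if c in S
    )
    # Third term (chi): paths seed -> non-seed -> other seed.
    chi = sum(
        p * p
        for s in S
        for c in adj[s] if c not in S
        for d in adj[c] if d in S and d != s
    )
    return total - seed_overlap - chi
```

Because only neighbors and neighbors of neighbors are visited, the cost depends on local degrees rather than on the size of the whole network, which is the reason the estimate is cheaper than repeated simulation.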

B. Model Extension into Multiplex Networks

As indicated by [25, 26], realistic networked systems tend to have multiple layers, and modern social networks also contain several layers [27, 28] that describe the relations between people or groups from different aspects. For networks with such complicated structures [25, 26, 34], estimating the influence diffusion ability of given seeds and finding influential seed sets are of great significance in both theory and practice [27 - 30]. Unlike a single-layer network, a multiplex network $G_m$ can be represented as a set of graphs composed of several network layers, $G_m = [G_1, G_2, \dots, G_L]$, where $L$ is the total number of layers in $G_m$. Because of the differences in structural features, an existing influence approximation method such as Eq. (1) cannot be directly applied to multiplex networks; modifications are required to deal with the dilemmas shown in [27 - 29].

There are several challenges in estimating the influence ability of a seed set in multiplex networks. First, the influence spreading mechanism should consider the impact of multiple layers. In a single network, an inactive node changes its status once it has been activated by a connected active node; in a multiplex network, however, activation in only one layer tends to be partial, since the node may remain inactive in other layers. The overall active percentage across all layers is therefore important in deciding whether a node is truly active in a multiplex networked system. Consider an example: we may see an advertisement for a new product when browsing the Internet, but this promotion alone tends to be inadequate to impress us; meanwhile, we may also receive intensive promotion of the product through other social media such as social networking services, TV programs, or recommendations from friends. Combining the influence from these different aspects of social life, we may become truly impressed by the new product and decide to purchase it. Second, the potential influence scope of a node is hard to estimate in multiplex networks: a node may be important in some layers but ordinary or negligible in others, which is another reason why the existing seed selection model cannot be applied to multiplex networks. Without a synthetic evaluation of nodal importance in all layers, a seed may work poorly in spreading information in the multiplex system. Third, since some selected seeds can be insignificant in some layers, the corresponding activation probability in the diffusion process should be adjusted to the specific situation instead of being kept constant as in [23, 24]. An intuitive example of a multiplex network can be found in Fig. 1.

Considering the above differences, the influence spreading mechanism should be modified based on the unique features of multiplex networks. As indicated in [25, 26], the network layers in multiplex systems are relatively independent of each other since they serve different functions. The influence spreading operation inside each network layer is also self-contained, i.e., active nodes have a chance to activate inactive ones with a certain probability, following the operations of the IC model. Assume node $a$ attempts to activate node $b$; the probability is $p(a, b) = p \cdot \mathrm{degree}(a) / \mathrm{degree}(b)$, where $p$ is the pre-defined basic propagation probability. Note, however, that the true activation of a node is conditional: activation in only one layer is insufficient to influence an inactive node, and the node should be activated in at least a couple of layers to meet the requirement for influence spread. Once each layer finishes the influence spread, a pre-defined minimal number of active layers $L_{min}$ is used to decide whether a node is successfully activated in this diffusion process. The determination of $L_{min}$ is problem-dependent. For example, for multiplex networks with fewer than 3 layers, we can set $L_{min}$ to the number of layers. For networks with more layers, we can set $L_{min}$ to 3, which was shown to be effective in influencing people's decisions [28, 36]. Then, the active nodes are updated and continue the spreading process until the termination criterion is satisfied. The generated influence in multiplex networks, $\sigma_{multi}(S)$, is also decided by the number of active nodes in the whole dissemination process.
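The layer-wise mechanism can be illustrated with a simplified Python sketch. It is one possible reading, not the authors' procedure: each layer runs a full, independent IC cascade from the seeds, and a node counts as confirmed only if it was reached in at least `l_min` layers. The helper names, the list-of-adjacency-dictionaries format (every node present in every layer, symmetric edges), and the cap of the degree-adjusted probability at 1 are all assumptions introduced for the example.

```python
import random


def choose_l_min(num_layers):
    """L_min rule from the text: all layers if fewer than 3, otherwise 3."""
    return num_layers if num_layers < 3 else 3


def edge_prob(adj, a, b, p):
    """Degree-adjusted probability p * degree(a) / degree(b), capped at 1 (assumed cap)."""
    return min(1.0, p * len(adj[a]) / len(adj[b]))


def layer_cascade(adj, seeds, p, rng):
    """IC cascade inside one layer using the degree-adjusted probability."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly = []
        for a in frontier:
            for b in adj[a]:
                if b not in active and rng.random() < edge_prob(adj, a, b, p):
                    active.add(b)
                    newly.append(b)
        frontier = newly
    return active


def multiplex_round(layers, seeds, p, rng):
    """Return (confirmed, attempted) non-seed nodes after one cascade per layer."""
    l_min = choose_l_min(len(layers))
    hits = {}                                  # node -> number of layers that reached it
    for adj in layers:
        for v in layer_cascade(adj, seeds, p, rng) - set(seeds):
            hits[v] = hits.get(v, 0) + 1
    attempted = set(hits)                      # activated in at least one layer
    confirmed = {v for v, n in hits.items() if n >= l_min}
    return confirmed, attempted
```

The ratio `len(confirmed) / len(attempted)` then corresponds to the active percentage used when the per-layer influence is combined.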

To summarize, given a social network $G_m$ with $L$ different layers $(G_1, G_2, \dots, G_L)$, the overall influence $\sigma_{multi}(S)$ generated by a seed set $S$ is a conditional weighted accumulation of the influence in each layer, as defined in Eq. (2):

$$\sigma_{multi}(S) = P \cdot \sum_{l=1}^{L} w_l\, \hat{\sigma}_l(S) \tag{2}$$

In Eq. (2), $P$ is the actual active percentage in the whole network, $w_l$ is the predefined weight of layer $l$, and $\hat{\sigma}_l(S)$ is the estimated influence of the seed set $S$ in layer $l$. Specifically, $P$ is the ratio of the number of successfully activated nodes (those activated in at least $L_{min}$ layers) to the number of tentatively activated nodes (those activated in at least one layer). In estimating $\hat{\sigma}_l(S)$, the influence generated by each seed $s$ in $S$ is estimated within its 2-hop range, which, in general, follows the mechanism in Eq. (1). The main difference is that the 1-hop neighbors should be determined independently for each network layer, since the layer structures may vary. Meanwhile, the overlapping influence of each seed in a specific layer is estimated and deducted from the result.

Fig. 1. An example extracted from the multiplex Padgett social network of Florentine families in the Renaissance [35]. Layer 1 represents marriage alliances, and layer 2 represents business relationships. If we try to select 2 seeds from the network members to reach maximum influence, nodes 9 and 2 are strong candidates in layer 1 because of their importance in the structural connection. In layer 2, node 2 remains significant, but node 9 is merely mediocre, and node 6 dominates with more connected neighbors; therefore, nodes 6 and 2 are potential seeds in layer 2. Selecting seeds to achieve maximal influence in the whole system is thus difficult when focusing only on the structural information of a single layer, and an overall selection strategy is required.

Based on the influence approximation method in Eq. (1), the extended version in Eq. (2) takes into account the active percentage in the whole system, and the weights of different layers can be set according to the features of the multiplex network or to personal preference. In addition, the activation probability is adaptive, to handle the situation in which a selected seed is highly significant in one layer but negligible in another, and thus to better fit the application scenarios of multiplex systems. Considering that inferior members can hardly influence superior ones, this probability adjustment mechanism reflects the diffusion ability of nodes in a balanced way. The effectiveness of the single-network estimate in Eq. (1) has been validated in [33], and the proposed multiplex measure in Eq. (2) evaluates the overall influence through a weighted summation of the influence estimates of all layers in the multiplex network. Since the functional operations of different layers are independent, as indicated in [25, 26], this superposition has little impact on the influence spreading process inside each layer, so its effectiveness in influence estimation can also be maintained. The proposed measure in Eq. (2), on the one hand, provides a numerical measure of the influence ability of candidate nodes; on the other hand, it can guide the optimization process in selecting appropriate seed sets.
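As a small worked illustration of the weighted accumulation, the Python sketch below combines per-layer estimates into a single score. It assumes the form "active percentage times the weighted sum of per-layer estimates" described in the text; the function name `sigma_multi` and the way the counts of confirmed and attempted nodes are passed in are hypothetical choices for the example.

```python
def sigma_multi(layer_estimates, weights, n_confirmed, n_attempted):
    """Combine per-layer influence estimates into one multiplex score.

    layer_estimates : estimated influence of the seed set in each layer
    weights         : predefined weight of each layer (same order)
    n_confirmed     : nodes activated in at least L_min layers
    n_attempted     : nodes activated in at least one layer
    """
    if len(layer_estimates) != len(weights):
        raise ValueError("one weight per layer is required")
    # Active percentage P; defined as 0 here when nothing was activated (assumption).
    active_pct = n_confirmed / n_attempted if n_attempted else 0.0
    return active_pct * sum(w * s for w, s in zip(weights, layer_estimates))
```

For instance, with two equally weighted layers whose estimates are 3.0 and 2.0, and one confirmed node out of two attempted, the score is 0.5 × (0.5 × 3.0 + 0.5 × 2.0) = 1.25.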

III. MA-IMMULTI

MAs embed an individual learning procedure into evolutionary search to improve the search ability and have found wide application in evolutionary optimization [37]. For example, in [38, 39], MAs have been applied to find optimal design scenarios for synchronous machines and to social behavior learning. In the field of complex networks, an MA was used to find critical nodes [40], and several MAs have also been designed to handle different kinds of optimization problems, such as detecting communities [41] and enhancing the robustness of networks [31, 32]. In this work, we focus on solving the influence maximization problem in multiplex networks with the help of MAs.

In MA-IMmulti, a problem-directed local improvement strategy has been designed, and several genetic operators, including crossover, mutation, and selection, are adopted to search for influential seeds in a given multiplex network. Candidates with better performance have a greater chance of being replicated in the search process. MA-IMmulti follows the procedure of a canonical MA as shown in [42 - 44], which belongs to the first generation of MAs. The algorithm starts with a carefully designed initialization strategy to produce a broadly distributed population, and the local search operator is divided into two steps to further improve the optimization ability. These operators are designed based on the features of multiplex networks. The obtained results reveal the effectiveness of memetic computation in solving problems on complex networked systems, and the designed operators show a potential way of using network structural information to guide the search process.

Several MAs have been devised recently to enhance performance in networked systems [23, 31, 32, 41]. Most of these studies consider only single isolated networks. Inspired by them, several improvements have been made in the proposed MA-IMmulti. First, the population initialization operation for single networks in [31, 32] has been improved to cater to networks with multiple layers. Second, the nodal neighborhood search technique in [23, 41] has been redesigned to handle connection information from different layers. In brief, building on several recent memetic algorithms, this work makes MAs capable of solving optimization problems in multiplex networks.

Given a social network, the selection of seed sets that spread influence efficiently can be solved by an optimization algorithm, as emphasized in previous studies [19-24]. To deal with this optimization problem in multiplex networks, MA-IMmulti searches for nodes from the perspective of the whole system, instead of just one layer. Taking a multiplex network Gm as the input, a series of candidate seed sets is generated first to provide potential solutions for the subsequent search. Then, genetic operators, including crossover, mutation, and local search, are conducted to find better seed compositions. The details of MA-IMmulti are given in this section.

A. Seed Initialization

By providing potential candidates, the initial population is important to the performance of MAs. To select seeds in a multiplex networked system, the population initialization should take several factors into consideration. First, unlike the selection mechanism in single networks, the seeds are expected to have good spreading ability in the whole system, and the structural information of only one layer, as used in [20-24], tends to be insufficient. Second, since the overlapping influence range of possible seeds is non-negligible, the overlapped regions between seeds are determined from the structural information of all the layers jointly. In addition, randomly selected nodes should be kept in the seed sets to avoid being trapped in local optima. The initialization process of MA-IMmulti is designed with these factors in mind.

In MA-IMmulti, each individual is represented as a seed vector S with k nodes selected from the N nodes in the network to work as seeds. These k seeds must be different from each other. For a clear representation, nodes in the seed set are sorted in increasing order of their labels. In short, S = {s1, s2, …, sk}, where s1 < s2 < … < sk are the labels of seeds selected from the N nodes in the network.

After a series of seed candidates is obtained, evaluating their influential similarity is crucial, since nodes with large structural similarity tend to limit the influence diffusion in the whole system. Focusing on multiplex networks, a distance measure is first designed to evaluate the similarity between nodes. For a seed candidate, the 2-hop range directly decides its influence range in a network layer. By integrating the 2-hop neighbors in all the layers, its synthetic influence range in the multiplex network can be obtained and stored in a vector with N dimensions (N is the number of nodes in the networked system). Once the influence range of each node is known, the distance between every pair of nodes in the network can be calculated.

The XOR operation (⊕) is used here to evaluate the distance between nodes, which is defined as follows,

$$\mathrm{Distance}(n, o) = \sum_{i=1}^{N} NBs(n)_i \oplus NBs(o)_i \qquad (3)$$

where NBs(n) and NBs(o) stand for the 2-hop neighbors in all the layers of nodes n and o, respectively, each stored as an N-dimensional vector. With the XOR operator, the distance between nodes with more diverse neighbors is higher than that between nodes with similar neighbors, and nodes separated by a large distance tend to avoid influential overlap and to have a larger joint influence range in the whole system.
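As a rough illustration of the distance measure, the sketch below computes the XOR-based distance on a toy multiplex network. It assumes that each layer is stored as a dictionary mapping a node to its set of neighbors, that the 2-hop range is collected layer by layer and then merged, and that a node is excluded from its own range; the function names and the toy data are hypothetical and not taken from the paper.

```python
def two_hop_range(layers, node):
    """Union, over all layers, of the nodes within two hops of `node`."""
    reach = set()
    for adj in layers:
        first = adj.get(node, set())
        reach |= first
        for u in first:
            reach |= adj.get(u, set())
    reach.discard(node)  # assumption: a node is not part of its own range
    return reach


def xor_distance(layers, n, o):
    """Count the nodes that lie in exactly one of the two 2-hop ranges."""
    # The symmetric difference of two sets equals the element-wise XOR
    # of their N-dimensional indicator vectors, summed over all entries.
    return len(two_hop_range(layers, n) ^ two_hop_range(layers, o))


# Toy two-layer network on nodes 0..5 (made up for illustration).
layer1 = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
layer2 = {0: {2}, 1: set(), 2: {0}, 3: {5}, 4: set(), 5: {3}}
layers = [layer1, layer2]

print(xor_distance(layers, 0, 1))  # nearby nodes with overlapping ranges
print(xor_distance(layers, 0, 3))  # nodes in separate parts of the network
```

In this toy case the pair (0, 3) receives a larger distance than the pair (0, 1), which matches the intent that seeds with little overlap in their influence ranges are preferred.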

The seed initialization process consists of the following operations. The whole population is divided into three parts, and each part adopts its own seed generation strategy. For the first part, random generation is used, and k seeds are selected at random from the N nodes. For the second part, since nodes with high degree centrality have been shown to be influential in [4, 13, 14, 24], a roulette operation is conducted so that nodes with a high synthetic degree across the layers have a greater chance of being selected as seeds. Here, the synthetic degree of a node is defined as the sum of its degrees in each layer. In the last part, distances between nodes are considered. The first seed is randomly selected from the top-k high-degree nodes; then, from the remaining top-k high-degree nodes, the one with the largest distance to the first seed is added to the seed set; the other seeds are generated at random. In this way, the initial population contains different kinds of compound seed sets, providing good starting points for the subsequent search.
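The three-part initialization described above can be sketched as follows. This is only a minimal illustration under stated assumptions: layers are dictionaries from node to neighbor set, the "top-k high-degree nodes" are ranked by synthetic degree, the distance is the XOR count of merged 2-hop ranges, and the function names and toy network are hypothetical.

```python
import random


def two_hop_range(layers, node):
    """Union, over all layers, of the nodes within two hops of `node`."""
    reach = set()
    for adj in layers:
        first = adj.get(node, set())
        reach |= first
        for u in first:
            reach |= adj.get(u, set())
    reach.discard(node)
    return reach


def init_population(layers, n_nodes, k, pop, rng):
    nodes = list(range(n_nodes))
    # Synthetic degree: sum of a node's degree over all layers.
    syn_deg = [sum(len(adj.get(v, ())) for adj in layers) for v in nodes]
    third = pop // 3
    population = []

    # Part A: k distinct seeds drawn uniformly at random.
    for _ in range(third):
        population.append(sorted(rng.sample(nodes, k)))

    # Part B: roulette selection weighted by synthetic degree
    # (assumes at least one edge, so the weights are not all zero).
    for _ in range(third):
        seeds = set()
        while len(seeds) < k:
            pick = rng.choices(nodes, weights=syn_deg, k=1)[0]
            if pick in seeds:  # repeated seed: fall back to a random node
                pick = rng.choice([v for v in nodes if v not in seeds])
            seeds.add(pick)
        population.append(sorted(seeds))

    # Part C: one random top-k node, the farthest other top-k node,
    # and randomly chosen nodes for the remaining positions.
    top_k = sorted(nodes, key=lambda v: syn_deg[v], reverse=True)[:k]
    for _ in range(pop - 2 * third):
        first = rng.choice(top_k)
        seeds = {first}
        rest = [v for v in top_k if v != first]
        if rest:
            r_first = two_hop_range(layers, first)
            seeds.add(max(rest, key=lambda v: len(r_first ^ two_hop_range(layers, v))))
        others = [v for v in nodes if v not in seeds]
        seeds.update(rng.sample(others, k - len(seeds)))
        population.append(sorted(seeds))
    return population


# Toy network: a ring in one layer and a star in the other.
n = 10
ring = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
star = {v: (set(range(1, n)) if v == 0 else {0}) for v in range(n)}
population = init_population([ring, star], n, k=3, pop=9, rng=random.Random(1))
for individual in population:
    print(individual)
```

Each individual is kept sorted by node label, following the representation S = {s1, s2, …, sk} used in the text.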

Fig. 2. Illustration of the seed initialization process. Three seeds are selected from the multiplex network in Fig. 1, and detailed graph information can be found on the left panel. Based on the graph information, the three equal parts (labelled as A, B, and C) of the initial population P0 are shown on the right panel. In part A, all candidates are randomly generated and re-arranged in an increasing order. In part B, those nodes with a higher synthetic degree have more chances to be selected. In part C, the top-k (k is equal to three here) high-degree nodes are determined first; then, the first seed is randomly selected from the top-k nodes, the second seed is the one with the largest distance to the first seed in the rest of the top-k nodes, and the last one is randomly selected from other nodes in the network. As shown in Algorithm 1, the proposed initialization process considers different combination types of potential seeds, which provides diversified starting points for the following search process and contributes to the convergence of MA-IMmulti.


The details of the seed initialization process are given in Algorithm 1, and an intuitive example based on the multiplex network in Fig. 1 can be found in Fig. 2.

B. Genetic Operators

After the initial population, composed of seeds selected by the different generation strategies, has been obtained, genetic operators, including crossover, mutation, and local search, are applied to it to search for better solutions.

In the crossover operator, seed information is exchanged between two selected individuals to generate more potential candidates. Since the information exchange should be efficient and have low computational complexity, two-point crossover is applied with probability pc. Given two seed sets Sa and Sb from the selected individuals in the population, crossover positions p1 and p2 (p1 < p2) in the range [1, k] are determined first; then the seed candidates between positions p1 and p2 of Sa and Sb are swapped to generate two new candidates Sa’ and Sb’, which are saved in a temporary population. The validity of these new candidates should also be checked. If Sa’ or Sb’ contains repeated nodes, each repeated node is replaced with a randomly generated one to keep the candidate valid. Fig. 3 gives an illustration of the crossover operator.
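The two-point crossover with repair can be sketched as below. The sketch uses 0-based positions instead of the range [1, k], treats the segment between p1 and p2 as inclusive, and draws replacement nodes only from nodes that do not yet appear in the child; these details, like the function names, are assumptions made for illustration.

```python
import random


def repair(child, n_nodes, rng):
    """Replace repeated seeds with random nodes that are not in the child."""
    seen = set()
    fixed = []
    for s in child:
        if s in seen:
            free = [v for v in range(n_nodes) if v not in seen and v not in child]
            s = rng.choice(free)
        seen.add(s)
        fixed.append(s)
    return sorted(fixed)


def two_point_crossover(sa, sb, n_nodes, rng):
    """Swap the segment between two positions p1 < p2 of two seed sets."""
    k = len(sa)
    p1, p2 = sorted(rng.sample(range(k), 2))
    child_a = sa[:p1] + sb[p1:p2 + 1] + sa[p2 + 1:]
    child_b = sb[:p1] + sa[p1:p2 + 1] + sb[p2 + 1:]
    return repair(child_a, n_nodes, rng), repair(child_b, n_nodes, rng)


rng = random.Random(0)
sa, sb = [1, 3, 5, 7], [3, 4, 6, 8]
child_a, child_b = two_point_crossover(sa, sb, n_nodes=10, rng=rng)
print(child_a, child_b)
```

Because both parents may share nodes, the swap can place the same node twice in one child; the repair step restores a valid seed set of k distinct nodes.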

In the mutate operator, each individual S in the population undergoes the following mutate operation with probability pm. A seed si (1 ≤ i ≤ k) in the individual is randomly selected, and si is then replaced by another node determined as follows. First, k candidate nodes are randomly selected from all nodes in the network, and the degree of each candidate and its distance to si are obtained. The product of degree and distance is computed for each candidate, and the one with the largest product replaces si, provided that the seed set remains valid with no repeated seeds. This operator utilizes the structural information of the network, and considers the potential diffusion range through degree and the overlapping influence through the distance measure in Eq. (3). Although the improvement in a candidate’s influence ability is not guaranteed, the mutate operator helps the search escape from local optima. An illustration of the mutate operator can be found in Fig. 4.

Algorithm 1: Seed initialization
Input:

    Gm: Multiplex network;
    k: The size of seed set;
    pop: The size of initial population;
Output:
    The initialized population P0;

1: Get the degree distribution of Gm;
2: for i from 1 to pop/3 do
3:     Randomly generate k different seeds from the nodes in Gm and save them in the i-th individual in P0;
4: end for;
5: for i from pop/3 to 2/3pop do
6:     Get the synthetic degree of each node in the multiplex network by summing up the degree of this node in each network layer;
7:     for j from 1 to k do
8:         Rj = U(0, 1); /* U(0, 1) is a uniformly distributed random number in [0, 1] */
9:         Perform the roulette selection according to the synthetic degree of nodes, select the first node which meets the criterion Rj as seed sj and save it in the i-th individual in P0, making sure no repetitive seeds exist in the individual; otherwise, replace the repetitive one with a randomly selected seed;
10:     end for;
11: end for;
12: Find the top-k high-degree nodes in Gm, and save them in a temporary set K;
13: for i from 2/3pop to pop do
14:     Randomly select a node v from K and save it as s0 of the i-th individual;
15:     Get the distance from v to other nodes in K based on Eq. (3), find the one with the largest distance and save it as s1 of the i-th individual;
16:     Generate other seeds randomly from all the remaining nodes in Gm if necessary, and save them in the i-th individual in P0;
17: end for;
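Returning to the mutate operator described before Algorithm 1, its scoring rule (degree multiplied by distance to the replaced seed) can be sketched as follows. The sketch assumes that "degree" means the synthetic degree, that the distance is the XOR count of merged 2-hop ranges, and that the k candidates are drawn from non-seed nodes so that the result is always valid; these choices and all names are hypothetical.

```python
import random


def two_hop_range(layers, node):
    """Union, over all layers, of the nodes within two hops of `node`."""
    reach = set()
    for adj in layers:
        first = adj.get(node, set())
        reach |= first
        for u in first:
            reach |= adj.get(u, set())
    reach.discard(node)
    return reach


def mutate(seeds, layers, n_nodes, rng):
    """Replace one random seed by the best of k random candidates."""
    k = len(seeds)
    i = rng.randrange(k)
    range_i = two_hop_range(layers, seeds[i])
    # Assumption: candidates come from non-seed nodes, so no repair is needed.
    pool = [v for v in range(n_nodes) if v not in seeds]
    candidates = rng.sample(pool, min(k, len(pool)))
    if not candidates:
        return sorted(seeds)

    def score(v):
        degree = sum(len(adj.get(v, ())) for adj in layers)
        distance = len(range_i ^ two_hop_range(layers, v))
        return degree * distance

    mutated = list(seeds)
    mutated[i] = max(candidates, key=score)
    return sorted(mutated)


# Toy network: a ring in one layer and a star in the other.
n = 10
ring = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
star = {v: (set(range(1, n)) if v == 0 else {0}) for v in range(n)}
original = [3, 4, 7]
mutated = mutate(original, [ring, star], n, random.Random(2))
print(original, "->", mutated)
```

A candidate with a high degree can still lose to a lower-degree one if its 2-hop range largely coincides with that of the replaced seed, which is the behavior illustrated in Fig. 4.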

In the local search operator, operations are divided into two stages and are conducted with probability pl. In the first stage, the local connection information of every node in the seed set of the individual is checked. In each layer, the seed under consideration is replaced by its highest-degree neighbor; for a multiplex network with L layers, L candidate seed compositions are obtained, and the one with the largest influence ability is kept if it improves on the original seed set. In the second stage, the global connection information is considered. The node in the seed set with the smallest synthetic degree is picked; then a network layer L’ is selected from the L layers, and the selected seed is tentatively replaced with the highest-degree node in layer L’ while keeping the seed set valid. If the replacement improves the candidate’s influence ability, the new seed set is kept in the candidate. By searching for better candidates in the local area, this local search operator intends to improve the fitness level of the whole population; meanwhile, by exploiting network structural information, it focuses on nodes with high potential to become influencers and improves the efficiency of the search process. The details of the local search operator are presented in Algorithm 2.

Fig. 3. Illustration of the two-point crossover operator. The two seed sets Sa and Sb exchange a part of their seed information between two randomly selected positions p1 and p2 and generate two new candidates Sa’ and Sb’. Note that Sa’ contains a repeated seed (Seed 3); thus, a rectification operation is performed to replace one of the repeated seeds with another node.

Algorithm 2: Local search
Input:
    p: Candidate in the population;
    Gm: Multiplex network;
    k: The size of seed set;
Output:
    The operated individual p’;

Stage 1:
1: for i from 1 to k do
2:     for l from 1 to L do
3:         Find the largest-degree neighbor n of seed si of p in network layer l of Gm, replace si with n while keeping its validity and save the seed set as Sl;
4:     end for;
5:     Find the seed set S’ with the best influence from [S1, S2, …, SL];
6:     if the influence of S’ is larger than that of S (S is the original seed set in p) then
7:         p ← S’;
8:     end if;
9: end for;
Stage 2:
10: Find the seed s whose synthetic degree is the smallest in S of p;
11: Randomly select a layer l from 1 to L, get the largest-degree node nl in network layer l;
12: Replace s with nl while keeping its validity, and preserve the new seed set in St;
13: if the influence of St is larger than that of S then
14:     p ← St;
15: end if;
16: p’ ← p;
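The two stages of the local search can be sketched as follows. The influence estimate is passed in as a function; the one-hop coverage used in the demo is only a hypothetical stand-in, not the influence estimation of the paper. The sketch also assumes that a replacement must be a node outside the current seed set, so validity is kept by construction.

```python
import random


def local_search(seeds, layers, influence, rng):
    current = list(seeds)

    # Stage 1: for each seed, try its highest-degree neighbour in every
    # layer and keep the best trial only if it improves the influence.
    for i in range(len(current)):
        trials = []
        for adj in layers:
            nbrs = [v for v in adj.get(current[i], ()) if v not in current]
            if nbrs:
                hub = max(nbrs, key=lambda v: len(adj.get(v, ())))
                trials.append(current[:i] + [hub] + current[i + 1:])
        if trials:
            best = max(trials, key=influence)
            if influence(best) > influence(current):
                current = best

    # Stage 2: swap the seed with the smallest synthetic degree for the
    # highest-degree non-seed node of one randomly chosen layer.
    weakest = min(current, key=lambda v: sum(len(a.get(v, ())) for a in layers))
    adj = rng.choice(layers)
    outside = [v for v in adj if v not in current]
    if outside:
        hub = max(outside, key=lambda v: len(adj[v]))
        trial = [hub if s == weakest else s for s in current]
        if influence(trial) > influence(current):
            current = trial
    return sorted(current)


def one_hop_coverage(layers):
    """Hypothetical stand-in for the influence estimate: covered node count."""
    def influence(seeds):
        covered = set(seeds)
        for adj in layers:
            for s in seeds:
                covered |= adj.get(s, set())
        return len(covered)
    return influence


n = 10
ring = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
star = {v: (set(range(1, n)) if v == 0 else {0}) for v in range(n)}
layers = [ring, star]
influence = one_hop_coverage(layers)
start = [3, 4, 5]
improved = local_search(start, layers, influence, random.Random(0))
print(start, influence(start), "->", improved, influence(improved))
```

Since a trial is accepted only when it strictly increases the influence estimate, the returned seed set is never worse than the input under the supplied measure.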

C. Framework of MA-IMmulti

In order to find the seed set with maximal influence, MA-IMmulti is initialized with a series of broadly distributed seed set candidates, which provide a good initial population for the subsequent search. Then, individuals in the population exchange their seed information with each other to generate more potential candidates. For each individual, the mutation and local search operators are also conducted. The former takes more nodes into consideration and helps individuals escape from local optima, and the latter aims at improving the influence ability of each individual. Guided by the influence approximation measure in Eq. (2), MA-IMmulti is designed to find the optimal combination of different seed candidates. The framework of MA-IMmulti is shown in Algorithm 3.

Algorithm 3: MA-IMmulti

Input:
    Gm: Multiplex network;
    k: The size of seed set;
    pc: Probability of conducting the crossover operator;
    pm: Probability of conducting the mutation operator;
    pl: Probability of conducting the local search operator;
    pop: Size of initial population;
    MaxGen: Maximum number of generations;
Output:
    S*: The best seed set found;

1: g ← 0, P0 ← Seed_initialization (k, Gm, pop);
2: Set Pt ← ∅;
3: while g < MaxGen do
4:     Label all individuals in Pg as not selected;
5:     while not all the individuals in Pg have been selected do
6:         Randomly select a pair of individuals Sa and Sb that have not been selected from Pg, and label Sa and Sb as selected;
7:         if U(0, 1) < pc then
8:             Perform Crossover (Sa, Sb), and add the generated individuals to Pt;
9:         end if;
10:     end while;
11:     for each individual p in Pg and Pt do
12:         if U(0, 1) < pm then
13:             Perform the mutate operator: Mutate (p, Gm, k);
14:         end if;
15:     end for;
16:     for each individual p in Pg and Pt do
17:         if U(0, 1) < pl then
18:             Perform the local search operator: Local_search (p, Gm, k);
19:         end if;
20:     end for;
21:     Pg+1 ← Select the superior individuals using the tournament selection from both Pg and Pt;
22:     g ← g + 1;
23: end while;
24: Output S* with the best influence ability in the current population.
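The control flow of Algorithm 3 can be sketched as a generic loop in which the operators are supplied as functions. This is an illustrative skeleton only: it assumes binary tournament selection and rebuilds the temporary population in every generation, neither of which is specified in the algorithm listing, and the demo plugs in trivial placeholder operators rather than the operators of MA-IMmulti.

```python
import random


def memetic_search(init_pop, crossover, mutate, local_search, influence,
                   pc, pm, pl, max_gen, rng):
    population = [list(s) for s in init_pop]
    size = len(population)
    for _ in range(max_gen):
        # Pair up individuals at random; each one is selected at most once.
        order = population[:]
        rng.shuffle(order)
        offspring = []  # assumption: the temporary population is rebuilt here
        for a, b in zip(order[0::2], order[1::2]):
            if rng.random() < pc:
                offspring.extend(crossover(a, b))
        pool = population + offspring
        pool = [mutate(p) if rng.random() < pm else p for p in pool]
        pool = [local_search(p) if rng.random() < pl else p for p in pool]
        # Assumption: binary tournament, one winner per population slot.
        population = [max(rng.sample(pool, 2), key=influence) for _ in range(size)]
    return max(population, key=influence)


# Placeholder operators that leave individuals unchanged (illustration only).
init_pop = [[1, 2], [3, 4], [0, 5], [2, 6]]
best = memetic_search(
    init_pop,
    crossover=lambda a, b: (a, b),
    mutate=lambda p: p,
    local_search=lambda p: p,
    influence=sum,  # hypothetical fitness, not the paper's estimate
    pc=0.6, pm=0.4, pl=0.5, max_gen=5, rng=random.Random(0),
)
print(best)
```

With real operators, each callable would wrap the corresponding routine together with the network, for example `lambda p: local_search_impl(p, layers, influence, rng)`, where `local_search_impl` is whatever implementation is available.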

Fig. 4. Illustration of the mutate operator. For the selected seed si, k candidates are randomly generated first; then, the degree of each candidate and its distance to si are recorded and multiplied, where the distance is calculated according to Eq. (3). The candidate with the largest product replaces si, making sure no repeated seeds arise in S. In the example, candidate 14 has the largest degree but still fails to be selected because of its relatively small distance to si. This operator reflects a comprehensive consideration of structural information (degree) and diffusion distance.

IV. EXPERIMENTAL RESULTS

To validate the effectiveness of the proposed influence maximization approximation method and the corresponding seed selection algorithm, experimental results on synthetic and real-world multiplex networks are presented in this section. All the experiments are conducted on a PC with a 3.3 GHz Intel Core i5 CPU and 8 GB RAM, and the code is implemented in C++.

A. Experiments on synthetic networks

First, the seed selection optimization algorithm MA-IMmulti is tested and compared with other methods. Several existing seed selection algorithms are tested for comparison, including CELF [14], SHIM [21], Hybrid-IM [22], DPSO [24], a traditional genetic algorithm (GA), and the selection strategies based on structural information, i.e., high-degree selection and random selection. The parameter set of MA-IMmulti is given in Table I. For DPSO, most parameters follow the settings in [24], with the learning factors c1 and c2 set to 2, the weight ω set to 0.8, the population size n set to 100, and the iteration number gmax set to 200. For GA, the population size is 100, the iteration number is 200, and the probabilities of crossover and mutation are 0.6 and 0.4, respectively. CELF and Hybrid-IM are parameter-free, and their only inputs are the network structure and the desired seed set size. In SHIM, the user preference ɑ is set to 0.5.

To test the performance of MA-IMmulti, several synthetic network generation models have been implemented, including scale-free (SF) networks proposed in [4] and random (ER) networks proposed in [2]. In the experiment, each synthetic multiplex network is composed of 5 layers generated by these models, and each layer has 500 nodes with an average degree of 4. The multiplex network with 5 SF layers is denoted as SFm, the one with 5 ER layers is denoted as ERm, and the one with 5 mixed layers, randomly selected from SF and ER networks, is denoted as Mixm. In the influence evaluation process, the weight of each network layer is set to 1. The corresponding experimental results on these networks are presented in Fig. 5. The properties of the tested synthetic networks are shown in Table II.
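For readers who want to reproduce networks of a similar shape, the sketch below builds 5-layer multiplex networks with 500 nodes and an average degree of about 4 per layer. The generators are assumptions: the ER layer is drawn with a fixed number of edges, and the SF layer uses a simple preferential attachment process with two links per new node; the generators actually used in the experiments may differ in detail, and all function names are hypothetical.

```python
import random


def er_layer(n, m, rng):
    """Random layer with exactly m edges placed uniformly at random."""
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj


def sf_layer(n, m_attach, rng):
    """Scale-free layer grown by preferential attachment."""
    adj = {v: set() for v in range(n)}
    targets = []  # each node appears once per incident edge
    core = range(m_attach + 1)  # small fully connected starting core
    for u in core:
        for v in core:
            if u < v:
                adj[u].add(v)
                adj[v].add(u)
                targets += [u, v]
    for new in range(m_attach + 1, n):
        chosen = set()
        while len(chosen) < m_attach:
            chosen.add(rng.choice(targets))  # degree-proportional choice
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets += [new, t]
    return adj


def synthetic_multiplex(kind, rng, n=500, n_layers=5):
    """kind is 'ER', 'SF', or 'Mix' (each layer drawn from ER or SF)."""
    makers = {"ER": lambda: er_layer(n, 2 * n, rng),
              "SF": lambda: sf_layer(n, 2, rng)}
    if kind == "Mix":
        return [makers[rng.choice(["ER", "SF"])]() for _ in range(n_layers)]
    return [makers[kind]() for _ in range(n_layers)]


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


rng = random.Random(0)
er_m = synthetic_multiplex("ER", rng)
sf_m = synthetic_multiplex("SF", rng)
mix_m = synthetic_multiplex("Mix", rng)
print([edge_count(a) for a in er_m], [edge_count(a) for a in sf_m])
```

With these settings each ER layer has 1000 links and each SF layer has slightly fewer, which gives an average degree close to 4 in both cases.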

TABLE I
THE PARAMETER SET OF MA-IMMULTI.

| Parameter | Meaning | Value |
|---|---|---|
| pop | The size of initial population | 100 |
| pc | Probability of conducting crossover operator | 0.6 |
| pm | Probability of conducting mutate operator | 0.4 |
| pl | Probability of conducting local search operator | 0.5 |
| MaxGen | The maximum number of generations | 200 |

TABLE II
SYNTHETIC NETWORK PROPERTIES. N STANDS FOR THE NUMBER OF NODES IN EACH LAYER, AND M STANDS FOR THE NUMBER OF LINKS IN EACH LAYER. AST STANDS FOR AVERAGED ASSORTATIVITY [3], H STANDS FOR AVERAGED HETEROGENEITY [35], AND CLU STANDS FOR AVERAGED CLUSTERING COEFFICIENT [36].

| Network | Feature | N | M | Ast | H | Clu |
|---|---|---|---|---|---|---|
| SF | Power-law degree distribution | 500 | 1000 | -0.075 | 0.379 | 0.038 |
| ER | Poisson degree distribution | 500 | 1000 | -0.005 | 0.271 | 0.006 |

As shown in Fig. 5, the seed selection algorithms perform differently on the generated synthetic networks. For the multiplex networked system composed of ER networks, only slight differences can be found in the influence achieved by seeds selected by different algorithms, especially when the seed set is small. As indicated in [2, 4], ER networks have a relatively smooth degree distribution, and few nodes possess an extremely high or low degree, which makes seed selection easier in such multiplex networks. Given these structural features, the advantage of the proposed algorithm is not pronounced, and even randomly selected seeds can achieve an acceptable influence. For the multiplex network composed of SF networks, the differences between the influence of seeds selected by these algorithms are more evident, and MA-IMmulti is superior to the other methods, especially as the required seed set size increases. SF networks have a power-law degree distribution [4, 46, 47], and only a few nodes have a much higher degree than the others, which explains the poor performance of random seed selection from all the nodes in the system. In the constituted multiplex network, nodes may have diverse importance in different network layers; a key node with high degree in one layer may act as an ordinary member with low degree in another, and vice versa. Since they mainly take the structural information of a specific network layer into consideration, the seed sets found by DPSO, CELF, SHIM, Hybrid-IM, and GA have lower influence diffusion ability. Still, SHIM, which considers structural hole nodes [21], and Hybrid-IM, which considers network paths together with communities [22], reach better performance than the traditional heuristic-based algorithm CELF. By focusing on the overall information of nodes in multiple layers, the seeds determined by MA-IMmulti tend to perform better in spreading influence.
For the multiplex network composed of mixed networks, the results lie between those of the previous two kinds of networks, since such a structure combines the features of ER and SF networks. In short, on the different kinds of synthetic multiplex networks, the proposed algorithm performs at least as well as the other tested selection algorithms and consistently selects effective seed candidates.

To study the structural features of the seeds selected by different algorithms, the degree information of the seeds is analyzed in Table III, taking the results on SFm networks with 10 seeds in Fig. 5 as an example. The results show that the seeds selected by MA-IMmulti tend to have more homogeneous degree centrality across the network layers, which yields balanced influence in each layer. In contrast, the seeds selected by the other five algorithms tend to have a higher maximal degree, but the variance of nodal degree across layers also increases dramatically. The seeds’ nodal degree information may reflect their differences in influence diffusion ability. In the proposed algorithm, the overall structural information is considered; in this way, nodes that have an extremely large degree in one layer but a low degree in others can be avoided. As shown in Fig. 5, this selection strategy helps detect valuable node candidates and reach good performance on influence spreading in multiplex networks.
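The per-seed degree statistics discussed here (maximal degree, minimal degree, and variance of nodal degree across layers, averaged over the selected seeds) can be computed along the following lines. The sketch assumes the population variance; whether the population or the sample variance was used in the experiments is not stated, and the function name and toy data are hypothetical.

```python
from statistics import mean, pvariance


def seed_degree_stats(layers, seeds):
    """Return (MaxDeg, MinDeg, Var), each averaged over the given seeds."""
    rows = []
    for s in seeds:
        degs = [len(adj.get(s, ())) for adj in layers]  # degree in each layer
        rows.append((max(degs), min(degs), pvariance(degs)))
    return tuple(mean(col) for col in zip(*rows))


# Toy two-layer network: node 0 is a hub in layer 1 only,
# while node 1 has the same degree in both layers.
layer1 = {0: {1, 2}, 1: {0}, 2: {0}}
layer2 = {0: set(), 1: {2}, 2: {1}}
stats = seed_degree_stats([layer1, layer2], seeds=[0, 1])
print(stats)
```

A seed set dominated by single-layer hubs produces a large averaged variance, whereas seeds with balanced degrees across layers keep it small.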

TABLE III
DEGREE OF 10 SELECTED SEEDS FROM DIFFERENT ALGORITHMS IN FIG. 5(B). FOR A SEED, THE DEGREE INFORMATION IN EVERY LAYER IS RECORDED; THEN THE ARISEN MAXIMAL DEGREE (MAXDEG), THE MINIMAL DEGREE (MINDEG), AND THE VARIANCE OF NODAL DEGREE IN ALL THE LAYERS (VAR) ARE DETERMINED. THE NUMERICAL RESULTS SHOWN HERE ARE AVERAGED OVER THE 10 SELECTED SEEDS.

| Algorithm | MaxDeg | MinDeg | Var |
|---|---|---|---|
| MA-IMmulti | 5… | 6… | 9… |
| DPSO | 6… | 3… | 1… |
| GA | 6… | 3… | 1… |
| CELF | 6… | 3… | 1… |
| SHIM | 5… | 3… | 1… |
| Hybrid-IM | 5… | 3… | 1… |

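TABLE IV
COMPARISON OF THE STATISTICAL RESULTS OF 10-SEED SET SELECTED BY DIFFERENT ALGORITHMS

| Network | Algorithm | Averaged influence (Wilcoxon test) | Std. |
|---|---|---|---|
| ER | MA-IMmulti | 11.768 | 0.2248 |
| ER | DPSO | 11.562(≈) | 0.176 |
| ER | GA | 10.696(+) | 0.1124 |
| ER | CELF | 10.718(+) | 0.0827 |
| ER | SHIM | 10.9012(+) | 0.0855 |
| ER | Hybrid-IM | 10.8955(+) | 0.087 |
| ER | Degree | 10.706(+) | 0.083 |
| ER | Random | 10.588(+) | 0.497 |
| SF | MA-IMmulti | 28.1275 | 0.1434 |
| SF | DPSO | 27.2018(+) | 0.1668 |
| SF | GA | 24.881(+) | 0.1967 |
| SF | CELF | 24.490(+) | 0.276 |
| SF | SHIM | 25.532(+) | 0.191 |
| SF | Hybrid-IM | 25.671(+) | 0.187 |
| SF | Degree | 23.491(+) | 0.157 |
| SF | Random | 11.298(+) | 1.1769 |
| Mix | MA-IMmulti | 16.615 | 0.1229 |
| Mix | DPSO | 16.5212(+) | 0.077 |
| Mix | GA | 15.8875(+) | 0.2206 |
| Mix | CELF | 15.4325(+) | 0.1534 |
| Mix | SHIM | 16.0215(+) | 0.176 |
| Mix | Hybrid-IM | 15.986(+) | 0.162 |
| Mix | Degree | 14.5025(+) | 0.257 |
| Mix | Random | 10.801(+) | 0.497 |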

Furthermore, the statistical results on the 10 selected seeds in Fig. 5 are analyzed in Table IV, where Wilcoxon rank sum tests with a significance level of ɑ = 0.05 are conducted. In this experiment, 5000 Monte Carlo simulations are performed to estimate the real influence ability of the selected seeds, and the basic active probability p is set to 0.3 for each network. The results of MA-IMmulti are taken as the baseline: “+” indicates that the compared algorithm is inferior to MA-IMmulti, “-” indicates that the compared algorithm outperforms MA-IMmulti, and “≈” means that the two algorithms show no statistically significant difference. The obtained results confirm the conclusion from Fig. 5 that the proposed algorithm achieves significantly better results than the other seed selection methods on the tested multiplex networks. Meanwhile, the selected seeds show good influence spreading ability in the Monte Carlo simulations, which also demonstrates the effectiveness of the proposed influence estimation technique described in Eq. (2).

[Figure 5: three line plots, (a) ERm, (b) SFm, (c) Mixm; x-axis: number of seeds (1 to 10); y-axis: 2-hop influence; curves: MA-IMmulti, DPSO, GA, CELF, SHIM, Hybrid-IM, Degree, Random.]

Fig. 5. The influence of different seed selection methods on several kinds of synthetic multiplex networks. The 2-hop influence of selected seeds is estimated by the method in Eq. (2). The original active probability p is set as 0.01 in the experiments. The results here are averaged over 20 independent realizations.

The runtime analyses on the tested algorithms are given in Fig. 6. It can be seen that the proposed MA-IMmulti can finish the whole searching process within an acceptable period of time.

B. Algorithm analysis

To evaluate the contribution of different operators in MA-IMmulti, the influence spreading processes of several pruned variants of MA-IMmulti are studied, obtained by removing either the initialization operator shown in Algorithm 1 or the local search operator shown in Algorithm 2. In the experiment, the pruned algorithm without the initialization stage is denoted as IMA-IMmulti, and the one without the local search stage is denoted as LMA-IMmulti. The maximal influence estimations obtained in each generation by these variants on different kinds of synthetic multiplex networks are shown in Fig. 7. The best-solution trends of DPSO and GA during the search are also shown for comparison. As can be seen from the figure, removing the operators designed in Section III indeed has a marked impact on the seed determination process. For the initialization operator, both randomly selected nodes and high-degree nodes are considered in the initial population to diversify the starting points for the subsequent search; the distance between seeds is also taken into account to find possible optimal combinations. For IMA-IMmulti, the maximal influence achieved by the initial population tends to be inferior to that of MA-IMmulti, and the final optimization results are also slightly worse, which reveals the effectiveness of the initialization operator in generating potential candidates. For the local search operator, the seeds’ local connection information together with the global structural information of all network layers is taken into account, and a two-step search strategy is implemented to improve the influence ability of the whole population. The performance of the algorithm without the local search operator degrades clearly on all the tested multiplex networks in Fig. 5, reflecting the vital contribution of the local search operator to the algorithm.

The convergence of the proposed algorithm can also be analyzed from the results in Fig. 7. From an arbitrary initial state, MA-IMmulti gradually reaches improved solutions within a limited number of search steps on different kinds of synthetic multiplex networks. As shown in [2, 4, 45, 47, 48], most realistic networked systems can be simplified as the tested network models, so the obtained results have broad application potential in practice. In addition, the results in Fig. 7 show that MA-IMmulti tends to outperform the other methods in finding influential seeds in multiple-layer networks. These results reveal the good convergence performance of the algorithm in solving the influence maximization problem in multiplex networked systems.

In MA-IMmulti, the three parameters pc, pm, and pl, which control the different genetic operators, may influence its performance; their values are presented in Table I. In this experiment, we take SFm multiplex networks as an example to test the sensitivity of the algorithm to different parameter settings, and detailed results are shown in Fig. 8. In Fig. 8 (a), we set pm and pl to 0.4 and 0.5, respectively, and pc ranges from 0.2 to 1 with a step size of 0.2; in Fig. 8 (b), we set pc and pl to 0.6 and 0.5, respectively, and pm ranges from 0.2 to 0.6 with a step size of 0.1; in Fig. 8 (c), we set pc and pm to 0.6 and 0.4, respectively, and pl ranges from 0.3 to 0.7 with a step size of 0.1. The results show that the proposed algorithm is relatively insensitive to changes in these parameters. Taking pc as an example, a higher pc leads to more information exchange between the candidates, which contributes to the convergence of the search process. However, the performance gain becomes small when pc is larger than 0.6. Parameters pm and pl influence the performance in a similar way to pc. The recommended parameter setting in Table I is based on our pilot studies and balances the computation budget against the optimization performance.

[Figure 6: running time in seconds (logarithmic axis, 10^-2 to 10^3) on ERm, SFm, and Mixm for CELF, SHIM, Hybrid-IM, MA-IMmulti, DPSO, GA, Degree, and Random.]

Fig. 6. Runtime analysis of the compared algorithms on synthetic multiplex networks.


C. Experiments on real-world networks

The performance of MA-IMmulti is also validated on two real-world multiplex networks. The first is a multiplex social network of a research department at Aarhus (NetAar), composed of five kinds of online and offline relationships (Facebook, Leisure, Work, Co-authorship, and Lunch) between 61 employees [47]. The second is a multiplex co-authorship network covering two different research fields (NetCau), extracted from the free scientific repository "arXiv" and containing 14489 authors [48]. The corresponding experimental results are shown in Fig. 9. The parameter settings of the algorithms are the same as those in Fig. 5. On the two tested real-world multiplex networks, MA-IMmulti finds valuable influential seed sets and also outperforms the other methods. The properties of the tested real-world networks are shown in Table V.

TABLE V
REAL-WORLD NETWORK PROPERTIES. N STANDS FOR THE NUMBER OF NODES IN EACH LAYER, AND M STANDS FOR THE AVERAGE NUMBER OF LINKS IN EACH LAYER. AST, H, AND CLU ARE THE SAME AS THOSE IN TABLE II.

| Network | Feature | N | M | Ast | H | Clu |
|---|---|---|---|---|---|---|
| NetAar | Social network | 61 | 609 | -0.355 | 0.733 | 0.309 |
| NetCau | Citation network | 14489 | 24786 | 0.2314 | 0.699 | 0.147 |

However, the margin of improvement on NetAar in Fig. 9 (a) is not pronounced. This may be caused by the structural properties of this network. In NetAar, the number of employees is relatively small and the connections between them are relatively sparse; meanwhile, some active members tend to play key roles in several layers, so it is not hard to determine seeds. As a result, the seeds selected by different methods do not show a clear distinction. For NetCau, however, the degree distributions of nodes in the two network layers differ dramatically, and some key hubs in layer 1 may become unimportant in layer 2 and vice versa. The seed selection strategy should consider this structural feature when finding influential members. As presented in Fig. 9 (b), the proposed algorithm achieves better results than all the other tested methods. At the same time, the members in NetCau are located in community-like structures [48], and the core nodes inside communities may have a better ability to spread influence, which simplifies the seed selection problem to some extent. Since DPSO considers the community allocation information, its results are also competitive, as reported in the figure. Random selection performs extremely poorly in this network, which reveals that it is hard to select meaningful members in a large-scale system without guidance.

Nevertheless, the results obtained here contribute to solving realistic problems. Advertisers can devise an efficient propagation strategy by introducing their products first to the potential influencers selected in Fig. 9 (a); readers can identify the key authors and quickly survey the research progress in a field based on the results in Fig. 9 (b). Meanwhile, some of the real dilemmas indicated in [27, 28] may also benefit from such an influencer selection process.

V. CONCLUSIONS

The influence maximization problem is of great significance in modern society and has drawn much attention in single


[Figure 7: panels (a) ERm, (b) SFm, (c) Mixm; x-axis: number of generations (0 to 200); y-axis: maximal 2-hop influence; curves: MA-IMmulti, IMA-IMmulti, LMA-IMmulti, DPSO, GA.]

Fig. 7. The maximum influence of the selected seeds on several kinds of multiplex networks. The results are extracted from the search processes of MA-IMmulti, its variants IMA-IMmulti and LMA-IMmulti, together with DPSO and GA.

[Figure 8: panels (a) test on pc (pc = 0.2, 0.4, 0.6, 0.8, 1), (b) test on pm (pm = 0.2, 0.3, 0.4, 0.5, 0.6), (c) test on pl (pl = 0.3, 0.4, 0.5, 0.6, 0.7); x-axis: number of generations (0 to 200); y-axis: maximal 2-hop influence.]

Fig. 8. Sensitivity analysis of MA-IMmulti on SFm multiplex networks. (a) represents the results of changing pc, (b) for those of changing pm, and (c) for those of changing pl.

[Figure 9: panels (a) NetAar, (b) NetCau; x-axis: number of seeds; y-axis: 2-hop influence; curves: MA-IMmulti, DPSO, GA, CELF, SHIM, Hybrid-IM, Degree, Random.]

Fig. 9. The results on real-world networks. In (a), p is set as 0.1, and for layers 1 and 5 is set as 1 because of a higher connectivity density, and for the rest layers is set as 0.5. In (b), p is set as 0.01, and is set as 1 for both layers.


isolated networks. Considering the limitations of single-network models in reality, this paper studies the problem on multiplex networks. Based on the existing influence spreading model for single networks, an extended version has been proposed to simulate the information diffusion process across multiple layers; further, an approximate estimate of the influence of seeds in multiplex networks is also given. Based on this approximation technique, a memetic algorithm, MA-IMmulti, with several operators is devised to find valuable seed candidates. In the experiments, the effectiveness of MA-IMmulti has been validated on synthetic and real-world multiplex networks. The results verify the feasibility of the proposed algorithm for solving the influence maximization problem on multiplex networks, providing valuable candidates for dealing with propagation problems in complicated multilayer networked systems.
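As a rough illustration of what simulating diffusion on a multiplex network can look like, the following sketch runs a Monte Carlo independent cascade over several layers. It is not the extended model or the approximate estimator of this paper: the function names, the uniform per-layer probability, and the rule that a node activated through any layer is active in all layers are assumptions made only for this example.

```python
import random

def simulate_ic_multiplex(layers, probs, seeds, rng):
    """One independent-cascade run on a multiplex network.

    layers: list of adjacency dicts {node: [neighbours]}, one per layer.
    probs:  one activation probability per layer (assumed uniform in a layer).
    Assumption: a node activated through any layer is active in all layers.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new_frontier = []
        for u in frontier:
            for adj, p in zip(layers, probs):
                for v in adj.get(u, []):
                    # A newly active node gets one attempt per edge per layer.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        new_frontier.append(v)
        frontier = new_frontier
    return len(active)

def estimate_spread(layers, probs, seeds, runs=200, seed=0):
    """Average number of active nodes over repeated cascade runs."""
    rng = random.Random(seed)
    total = sum(simulate_ic_multiplex(layers, probs, seeds, rng)
                for _ in range(runs))
    return total / runs

# Toy network (hypothetical): node 3 is reachable from node 0 only via layer_b.
layer_a = {0: [1], 1: [0, 2], 2: [1]}
layer_b = {2: [3], 3: [2]}

full = estimate_spread([layer_a, layer_b], [1.0, 1.0], [0])
none = estimate_spread([layer_a, layer_b], [0.0, 0.0], [0])
print(full, none)  # 4.0 1.0
```

With probability 1 in both layers the cascade reaches every node connected to the seed through the union of the layers, and with probability 0 only the seed itself is counted; intermediate probabilities give stochastic estimates whose accuracy depends on the number of runs.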

Besides the IC model studied in this paper, there are several other models for simulating the diffusion process in networks, such as the WC model and the LT model. Extensions of these models to multiplex networks and comparisons between the different models are worth studying as well. In addition, topological rewiring has been widely used to improve network performance [45, 46, 49, 50], and its application to influence promotion [51] may be of interest too.
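For readers unfamiliar with the two alternative models, the sketch below shows their standard single-layer rules: in the weighted cascade (WC) model the probability that u activates v is 1 / in-degree(v), and in the linear threshold (LT) model a node becomes active once the total weight of its active in-neighbours reaches its threshold. The function names are hypothetical, reusing the WC probabilities as LT edge weights is a choice made only for this illustration, and how these rules should be extended to multiplex networks is left open here, as in the text.

```python
def wc_probabilities(adj):
    """Weighted-cascade probabilities for one directed layer.

    adj maps each node to the list of nodes it points to.
    Returns {(u, v): 1 / in_degree(v)} for every edge u -> v.
    """
    in_deg = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            in_deg[v] = in_deg.get(v, 0) + 1
    return {(u, v): 1.0 / in_deg[v] for u, nbrs in adj.items() for v in nbrs}

def lt_activates(v, active, weights, threshold):
    """Linear-threshold rule for node v given the currently active set."""
    total = sum(w for (u, x), w in weights.items() if x == v and u in active)
    return total >= threshold

# Toy directed layer (hypothetical): nodes 0 and 1 both point to node 2.
adj = {0: [2], 1: [2], 2: [0]}
weights = wc_probabilities(adj)

print(weights[(0, 2)], weights[(2, 0)])        # 0.5 1.0
print(lt_activates(2, {0}, weights, 0.6))      # False
print(lt_activates(2, {0, 1}, weights, 0.6))   # True
```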

REFERENCES

[1] R. Albert and A. L. Barabási, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, no. 1, p. 47, 2002.

[2] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publications of the Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, pp. 17-61, 1960.

[3] M. E. J. Newman, “Assortative mixing in networks,” Physical Review Letters, vol. 89, no. 20, p. 208701, 2002.

[4] A. L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509-512, 1999.

[5] B. Liu, G. Cong, Y. Zeng, D. Xu, and Y. M. Chee, “Influence spreading path and its application to the time constrained social influence maximization problem and beyond,” IEEE Trans. on Knowledge and Data Engineering, vol. 26, no. 8, pp. 1904-1917, 2014.

[6] R. Bond, C. Fariss, J. Jones, A. Kramer, C. Marlow, J. Settle, and J. Fowler, “A 61-million-person experiment in social influence and political mobilization,” Nature, vol. 489, no. 7415, pp. 295-298, 2012.

[7] E. Cambria, M. Grassi, A. Hussain, and C. Havasi, “Sentic computing for social media marketing,” Multimedia Tools and Applications, vol. 59, pp. 557-577, 2012.

[8] Z. Yu, C. Wang, J. Bu, X. Wang, Y. Wu, and C. Chen, “Friend recommendation with content spread enhancement in social networks,” Information Sciences, vol. 309, pp. 102-118, 2015.

[9] P. Domingos and M. Richardson, “Mining the network value of customers,” in Proc. 7th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Francisco, CA, pp. 57-66, 2001.

[10] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proc. 9th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Washington, DC, pp. 137-146, 2003.

[11] W. Chen, Y. Wang, and S. Yang, “Efficient influence maximization in social networks,” in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Paris, pp. 199-208, 2009.

[12] K. Rahimkhani, A. Aleahmad, M. Rahgozar, and A. Moeini, “A fast algorithm for finding most influential people based on the linear threshold model,” Expert Systems with Applications, vol. 42, no. 3, pp. 1353-1361, 2015.

[13] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. Vanbriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proc. 13th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Jose, California, pp. 420-429, 2007.

[14] A. Goyal, W. Lu, and L. Lakshmanan, “CELF++: Optimizing the greedy algorithm for influence maximization in social networks,” in Proc. 20th ACM SIGKDD Int. Conf. Companion on World Wide Web, Hyderabad, India, pp. 47-48, 2011.

[15] S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, vol. 30, pp. 107-117, 1998.

[16] K. Saito, M. Kimura, K. Ohara, and H. Motoda, “Super mediator-a new centrality measure of node importance for information diffusion over social network,” Information Sciences, vol. 329, pp. 985-1000, 2016.

[17] Y. Wang, A. V. Vasilakos, Q. Jin, and J. Ma, “PPRank: Economically selecting initial users for influence maximization in social networks,” IEEE Systems Journal, vol. 11, no. 4, pp. 2279-2291, 2017.

[18] G. Mei, X. Wu, Y. Wang, M. Hu, J. Lu, and G. Chen, “Compressive-sensing-based structure identification for multilayer networks,” IEEE Trans. on Cybernetics, vol. 48, no. 2, pp. 754-764, 2018.

[19] Y. Chen, W. Zhu, W. Peng, W. Lee, and S. Lee, “CIM: Community-based influence maximization in social networks,” ACM Trans. Intelligent Systems and Technology, vol. 5, no. 2, p. 25, 2014.

[20] A. Bozorgi, S. Samet, J. Kwisthout, and T. Wareham, “Community-based influence maximization in social networks under a competitive linear threshold model,” Knowledge-Based Systems, vol. 134, pp. 149-158, 2017.

[21] J. Zhu, Y. Liu, and X. Yin, “A new structure-hole-based algorithm for influence maximization in large online social networks,” IEEE Access, vol. 5, pp. 23405-23412, 2017.

[22] Y. Ko, K. Cho, and S. Kim, “Efficient and effective influence maximization in social networks: a hybrid-approach,” Information Sciences, vol. 465, pp. 144-161, 2018.

[23] M. Gong, C. Song, C. Duan, L. Ma, and B. Shen, “An efficient memetic algorithm for influence maximization in social networks,” IEEE Computational Intelligence Magazine, vol. 11, no. 3, pp. 22-33, 2016.

[24] M. Gong, J. Yan, B. Shen, L. Ma, and Q. Cai, “Influence maximization in social networks based on discrete particle swarm optimization,” Information Sciences, vol. 367, pp. 600-614, 2016.

[25] S. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, and S. Havlin, “Catastrophic cascade of failures in interdependent networks,” Nature, vol. 464, pp. 1025-1028, 2010.

[26] G. Mei, X. Wu, Y. Wang, M. Hu, J. Lu, and G. Chen, “Compressive-sensing-based structure identification for multilayer networks,” IEEE Trans. on Cybernetics, vol. 48, no. 2, pp. 754-764, 2018.

[27] H. Zhang, D. T. Nguyen, H. Zhang, and M. T. Thai, “Least cost influence maximization across multiple social networks,” IEEE/ACM Trans. on Networking, vol. 24, no. 2, pp. 929-939, 2016.

[28] Wikipedia: Social influence, <https://en.wikipedia.org/wiki/Social_influence#Social_networks>.

[29] Influence Maximization for Social Good, <http://teamcore.usc.edu/people/SocialGood/index.html>.

[30] Wikipedia: Social networking service, <https://en.wikipedia.org/wiki/Social_networking_service>.

[31] X. Tang, J. Liu, and X. Hao, “Mitigate cascading failures on networks using a memetic algorithm,” Scientific Reports, vol. 6, p. 38713, 2016.

[32] M. Zhou and J. Liu, “A memetic algorithm for enhancing the robustness of scale-free networks against malicious attacks,” Physica A, vol. 410, pp. 131-143, 2014.

[33] J. Lee and C. Chung, “A fast approximation for influence maximization in large social networks,” in Proc. 23rd ACM SIGKDD Int. Conf. Companion on World Wide Web, Seoul, Korea, pp. 1157-1162, 2014.

[34] S. Gómez, A. Guilera, J. Gardeñes, C. Vicente, Y. Moreno, and A. Arenas, “Diffusion dynamics on multiplex networks,” Physical Review Letters, vol. 110, p. 028701, 2013.

[35] J. Padgett and C. Ansell, “Robust action and the rise of the Medici,” American Journal of Sociology, pp. 1259-1319, 1993.

[36] N. Christakis and J. Fowler, Connected: the Surprising Power of Our Social Networks and How They Shape Our Lives, Little Brown, 2009.

[37] X. Chen, Y. S. Ong, M. H. Lim, and K. C. Tan, “A multi-facet survey on memetic computation,” IEEE Trans. on Evolutionary Computation, vol. 15, no. 5, pp. 591-607, 2011.

[38] D. Lee, S. Lee, J. Kim, C. Lee, and S. Jung, “Intelligent memetic algorithm using GA and guided MADS for the optimal design of interior PM synchronous machine,” IEEE Trans. on Magnetics, vol. 47, no. 5, pp. 1230-1233, 2011.


[39] Y. Zeng, X. Chen, Y. S. Ong, J. Tang, and Y. Xiang, “Structured memetic automation for online human-like social behavior learning,” IEEE Trans. on Evolutionary Computation, vol. 21, no. 1, pp. 102-115, 2017.

[40] Y. Zhou, J. Hao, and F. Glover, “Memetic search for identifying critical nodes in sparse graphs,” IEEE Trans. on Cybernetics, online, 2018.

[41] M. Gong, B. Fu, L. Jiao, and H. Du, “Memetic algorithm for community detection in networks,” Physical Review E, vol. 84, no. 5, p. 056101, 2011.

[42] Y. S. Ong, M. Li, N. Zhu, and K. Wong, “Classification of adaptive memetic algorithms: a comparative study,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 36, no. 1, pp. 141-152, 2006.

[43] J. Smith, “Coevolving memetic algorithms: a review and progress report,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 37, no. 1, pp. 6-17, 2007.

[44] A. Gupta and Y. S. Ong, Memetic Computation: The Mainspring of Knowledge Transfer in a Data-Driven Optimization Era, Springer, 2019.

[45] S. Wang and J. Liu, “Constructing robust cooperative networks using a multi-objective evolutionary algorithm,” Scientific Reports, vol. 7, p. 41600, 2017.

[46] M. Zhou and J. Liu, “A two-phase multi-objective evolutionary algorithm for enhancing the robustness of scale-free networks against multiple malicious attacks,” IEEE Trans. on Cybernetics, vol. 47, no. 2, pp. 539-552, 2017.

[47] M. Magnani, B. Micenkova, and L. Rossi, “Combinatorial analysis of multiple networks,” arXiv: 1303.4986, 2013.

[48] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, “Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems,” Physical Review X, vol. 5, p. 011027, 2015.

[49] S. Wang and J. Liu, “A multi-objective evolutionary algorithm for promoting the emergence of cooperation and controllable robustness on directed networks,” IEEE Trans. on Network Science and Engineering, vol. 5, no. 2, pp. 92-100, 2018.

[50] J. Wu, X. Shen, and K. Jiao, “Game-based memetic algorithm to the vertex cover of networks,” IEEE Trans. on Cybernetics, vol. 49, no. 3, pp. 974-988, 2019.

[51] F. Morone and H. A. Makse, “Influence maximization in complex networks through optimal percolation,” Nature, vol. 524, pp. 65–68, 2015.

Shuai Wang received the B.S. degree in intelligent science and technology from Xidian University, Xi'an, China in 2015. Now, he is pursuing the Ph.D. degree in circuits and systems from School of Artificial Intelligence, Xidian University. His research interests include complex networks and evolutionary algorithms.

Jing Liu (SM’15) received the B.S. degree in computer science and technology and the Ph.D. degree in circuits and systems from Xidian University in 2000 and 2004, respectively. In 2005, she joined Xidian University as a lecturer, and was promoted to a full professor in 2009. From Apr. 2007 to Apr. 2008, she worked at The University of Queensland, Australia as a postdoctoral research fellow, and from Jul. 2009 to Jul. 2011, she worked at The University of New South Wales at the Australian Defence Force Academy as a research associate.

Now, she is a full professor in School of Artificial Intelligence, Xidian University. Her research interests include evolutionary computation, complex networks, fuzzy cognitive maps, multiagent systems, and data mining. She is an associate editor of IEEE Trans. Evolutionary Computation. She was the chair of the Emerging Technologies Technical Committee (ETTC) of the IEEE Computational Intelligence Society from 2017 to 2018. Please see her homepage (http://see.xidian.edu.cn/faculty/liujing/) for more information.

Yaochu Jin (M’98–SM’02–F’16) received the B.Sc., M.Sc., and Ph.D. degrees from Zhejiang University, Hangzhou, China, in 1988, 1991, and 1996, respectively, and the Dr.-Ing. degree from Ruhr University Bochum, Bochum, Germany, in 2001.

He is a Distinguished Chair Professor in Computational Intelligence with the Department of Computer Science, University of Surrey, Guildford, U.K., where he heads the Nature Inspired Computing and Engineering Group. He was also a Finland Distinguished Professor funded by the Finnish Funding Agency for Innovation (Tekes) and a Changjiang Distinguished Visiting Professor appointed by the Ministry of Education, China. He has (co)-authored over 300 peer-reviewed journal and conference papers and been granted eight patents on evolutionary optimization. He has delivered 30 invited keynote speeches at international conferences. His current research interests include computational intelligence, computational neuroscience, computational systems biology, and nature-inspired and real-world-driven problem-solving.

Dr. Jin is the recipient of the Best Paper Award of the 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, the 2015 and 2017 IEEE Computational Intelligence Magazine Outstanding Paper Award, and the 2018 IEEE Transactions on Evolutionary Computation Outstanding Paper Award. He is the Editor-in-Chief of the IEEE Transactions on Cognitive and Developmental Systems and the Co-Editor-in-Chief of Complex & Intelligent Systems. He is an IEEE Distinguished Lecturer for the period 2013–2015 and 2017–2019, and past Vice President for the Technical Activities of the IEEE Computational Intelligence Society from 2014 to 2015.
