ActiveDBC: learning Knowledge-based Information propagation in mobile social networks

Jiho Park1 • Jegwang Ryu1 • Sung-Bong Yang1

© Springer Science+Business Media, LLC 2017

Abstract Due to the fast-growing use of mobile devices such as smartphones and wearable devices, rapid information propagation in the Mobile Social Network (MSN) environment is very important. In particular, information

transmission of people with repeated daily patterns in

complex areas such as big cities requires a very meaningful

analysis. We address the problem of identifying a key

player who can quickly propagate the information to the

whole network. This problem is often referred to as the information propagation problem. In this research, we

selected the top-k influential nodes to learn the knowledge-

based movements of people by using a Markov chain

process in a real-life environment. Subsequently, their

movement probabilities according to virtual regions were

used to ensure appropriate clustering based on the Density-

based Spatial Clustering of Applications with Noise

(DBSCAN) algorithm. Since the movement patterns in university campus data involve a dense collection of people, the

DBSCAN algorithm was useful for producing very dense

groupings. After clustering, we also elected the top-k

influential nodes based on the results learned from the

score of each node according to groups. We determined the

rate at which information spreads by using trace data from

a real network. Our experiments were conducted in the

Opportunistic Network Environment simulator. The results

showed that the proposed method has outstanding perfor-

mance for the level of spreading time in comparison to

other methods such as Naïve, Degree, and K-means. Furthermore, we compared the performance of RandomDBC

with that of ActiveDBC, proving that the latter method was

important to extract the influential top-k nodes, and showed

superior performance.

Keywords DBSCAN · Markov chain · Machine learning · Real trace data · Opportunistic network environment simulator

1 Introduction

As the usage of smart devices is increasing, applications of Mobile Social Networks (MSNs) are becoming more popular. MSNs can be viewed as a type of socially-aware Delay Tolerant Network (DTN), and they promise to provide diverse communication services by interconnecting the usage of mobile devices and social

networks. With short range wireless communication tech-

niques such as Wi-Fi and Bluetooth, mobile users share

information via opportunistic links [1]. Since the mobile

users receive information by employing opportunistic

contacts, their mobility pattern can be a significant basis for

information propagation. However, balancing the social tie

and mobility based on intermittent and uncertain network

connectivity in MSNs is a challenging problem [2–4]. For

example, if the user changes his interests or social prefer-

ences, he would be willing to forward the information that

he is currently interested in, not the one that he is not.

Therefore, a model of analyzing the dissemination phe-

nomenon is necessary to efficiently capture the realistic

features of information propagation in MSNs. In some

Jiho Park · Jegwang Ryu · Sung-Bong Yang

1 Department of Computer Science, Yonsei University, Seoul, Korea

Wireless Netw

DOI 10.1007/s11276-017-1608-9

approaches [2, 5–10], data dissemination is similar to the information propagation model, which means that one or

more special mobile users are intentionally deployed to

spread the information. The actions of special users are

controllable to facilitate connectivity among other users.

Such special users take the burden of data dissemination away from general users, and thereby save the limited energy and storage resources of all users. Typically, the main focus of the influence maximization problem [3, 11–14] over the past decade has been on optimization, where the goal is to maximize the information propagation through a given network, or to find the top-k nodes with the highest expected degree of influence on the spread of social information. Finding such special users for information spreading becomes increasingly important as the simulation time gets longer. The top-k nodes that efficiently maximize the information spread are the k-influential nodes, and the method for selecting these nodes is the key issue in this research.

In this paper, we introduce a new method, named ActiveDBC, for propagating information based on the collected information of each node. During the initialization

period, the influential nodes are defined by the expected

number of active nodes, i.e., the extracted k-influential

nodes are in active status before spreading information to

other inactive nodes. Then, active nodes can affect other

inactive nodes until the maximum number of nodes is

active in given simulation time. The main objective of this

study is to explore the properties of each node according to

its movement patterns and to construct the groups by using

real trace data for selecting proper top-k influential nodes

based on their movement patterns. The detailed phases are condensed into the three main points described below.

First, ActiveDBC performs a Markov chain process [15] to analyze the movements of the nodes for prediction

purposes. Considering the nature of people in real life, we

analyzed the movements of the nodes and eventually

translated them into a pattern of inheritance. Therefore, the

Markov chain is best suited to obtain the convergence

probability of each node according to the movement pat-

terns [16].

Second, based on the movement patterns analyzed by

the Markov chain, we implement efficient grouping by

applying the Density-based Spatial Clustering of Applica-

tions with Noise (DBSCAN) algorithm [17, 18]. The

results we obtained by employing this algorithm to analyze

real data enabled us to verify the performance of the

algorithm compared to other clustering methods. Although

there may be outliers, DBSCAN is an effective method for

grouping in a concentrated area that we used for our

investigation. Therefore, we attempted to empirically

reduce the number of outliers with a variety of parameters.

After the clustering procedure, we also adopted a scoring rule based on the learned knowledge to elect the top-k influential nodes.

Third, we implemented this approach by using the

Opportunistic Network Environment (ONE) simulator [19],

which was designed to specifically evaluate MSN [20] or

Delay Tolerant Network (DTN) [21] protocols and focuses

on the network layer without considering the details of the

physical layer. We applied our methods to realistic envi-

ronments and based the real trace data on the set of human

geotagged traces that were experimentally obtained in an

area on and around the National Cheng-Chi University

(NCCU) campus [22]. The NCCU dataset has been

appropriately applied to our proposed idea by using

DBSCAN algorithm. Our detailed contributions are as

follows:

• The proposal of a Markov chain-based model to

analyze and predict node movement patterns. Based

on real world human behaviors, node movement

patterns can be translated into patterns of inheritance

with high convergence probability.

• The application of the DBSCAN algorithm, with parameters tuned to reduce outliers, to cluster the analyzed movement patterns efficiently. After node clustering, we select the top-k influential nodes based on the knowledge learned from the movement patterns.

The remainder of this paper is organized as follows. In

Sect. 2, we briefly introduce the Markov chain model and

DBSCAN algorithm with some of their characteristics. Our

ActiveDBC algorithm is presented in Sect. 3. Section 4

explains the real trace data and provides the analysis of our

experimental results. Finally, the conclusions in this paper

are presented in Sect. 5.

2 Background

2.1 Information propagation model in MSNs

In complex and time-varying networks, information propagation modeling is an important issue for mobile social networks. Many data dissemination methods in MSN environments have been proposed from different perspectives, such as influence maximization models [3, 6, 9, 11], information diffusion models [2, 8], and alarm dissemination models [23]. However, most related research in MSNs is based on data dissemination without learning the nodes’ movement patterns, which does not lead to optimal solutions. In everyday life, for example, the movement patterns of people correlate with their daily routines, and people live according to these patterns. Using these patterns is a much more efficient way of handling data dissemination.

Unlike the existing work, in this paper, we assume that

movement patterns with infrastructure can be trained dur-

ing an initialization period; i.e., if we choose the top-k in-

fluential nodes for effective data dissemination based on

the pre-trained movement patterns, it will not only spread

quickly but also maximize information propagation. Note that the use of social relations (e.g., community-based approaches) can also be effective, and such applications are considered in [4, 12, 13]. In contrast, we investigate application scenarios in which the top-k influential nodes are chosen from the pre-trained movement patterns and propagate information in a community-based, intermittently connected infrastructure.

2.2 Markov-Chain model

Commonly, a Markov chain model can be used to represent

a discrete stochastic process. Its defining property is that the future state of the system depends only on the system’s present state and is independent of the history of previous events. The model can be computed as a series of state transitions, in which each state passes to another at each time step according to fixed probabilities. A stochastic process whose transition probability to a future state depends only on the present state is defined as a first-order Markov process [15]. For a stochastic process $\{X(t),\ t \in T\}$, the Markov chain model can be expressed as in Eq. (1).

$$P(X_{t+1} = i_{t+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_t = i_t) = P(X_{t+1} = i_{t+1} \mid X_t = i_t) \tag{1}$$

where P is the conditional probability of a future event, and $i_t$ is the process state at time t. Suppose that a Markov chain process has n possible states. At the nth observation period, the probability of the system being in a particular state depends only on its state at the (n−1)th period. Define $a_{ij}$ to be the probability that the system is in state i after it was in state j at the preceding observation; with these $a_{ij}$ we form the matrix $P = [a_{ij}]$, called the transition matrix. The matrix P consists of transition probabilities, and the probabilities in each column should sum to 1. A typical transition probability matrix P is defined as (2).

$$P = \begin{bmatrix} P_{11} & \cdots & P_{1m} \\ \vdots & \ddots & \vdots \\ P_{n1} & \cdots & P_{nm} \end{bmatrix}, \quad P_{ij} \ge 0, \quad \sum_{i=1}^{n} P_{ij} = 1 \tag{2}$$

where n and m are the numbers of condition states, and $P_{ij}$ represents the probability that the condition will pass from state i to state j during a certain time step. In this way, if the initial distribution X(0) is known, the future condition after t time steps can be obtained by (3).

$$X(t) = X(0) \times P^{t} \tag{3}$$

If the Markov process is ergodic, there is a unique steady-state distribution X with positive entries.
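The convergence to a steady state can be illustrated with a minimal Python sketch that repeatedly applies a transition matrix to an initial distribution until the distribution stops changing. The two-state matrix, the function names, and the tolerance are assumptions made only for illustration. The matrix is written so that each column sums to 1, as in Eq. (2), and is therefore applied to a column vector (x ← Px).

```python
def mat_vec(P, x):
    """Multiply a square matrix P (given as a list of rows) by a column vector x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def steady_state(P, x0, tol=1e-12, max_steps=10_000):
    """Iterate x <- P x until successive distributions differ by less than tol."""
    x = list(x0)
    for _ in range(max_steps):
        nxt = mat_vec(P, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical two-state chain (not from the paper); each column sums to 1.
P = [[0.9, 0.5],
     [0.1, 0.5]]

# For an ergodic chain, both starting distributions approach the same limit.
q_from_first = steady_state(P, [1.0, 0.0])
q_from_second = steady_state(P, [0.0, 1.0])
```

For this small ergodic example, both runs approach the distribution (5/6, 1/6), which illustrates why the steady state does not depend on the starting distribution.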

2.3 Density-based clustering algorithm (DBSCAN)

The key function of the DBSCAN algorithm is to find clusters of arbitrary shape. It is a typical density-based clustering algorithm [17] with several useful characteristics. First, it can learn clusters of arbitrary shape; second, it can distinguish noise points from clusters; and last, it is efficient for large spatial networks. We assume a set O of n objects, in which an object belonging to a cluster has at least a certain number of neighbors (minPoints) within a specified range (epsilon ε), where minPoints and epsilon are the input parameters. The main idea is to find clusters by starting from each object. However, determining suitable values for both of these parameters might not be a trivial problem. The following definitions are used in the DBSCAN algorithm.

Definition 1 (ε-neighborhood) $N_\varepsilon(p) = \{q \in O \mid d(p, q) \le \varepsilon\}$. The ε-neighborhood of an object $p \in O$, denoted as $N_\varepsilon(p)$, is the set of objects q whose distance from p is less than or equal to ε.

Definition 2 (Core object) An object p is a Core object if $|N_\varepsilon(p)| \ge$ minPoints.

Definition 3 (Border object) An object q is a Border object if $|N_\varepsilon(q)| <$ minPoints and q is density-reachable from a Core object p.

Definition 4 (Noise) An object q is Noise if $|N_\varepsilon(q)| <$ minPoints and q is not density-reachable from any Core object.

Definition 5 (Density-reachability) An object q is directly density-reachable from an object p if $q \in N_\varepsilon(p)$ and $|N_\varepsilon(p)| \ge$ minPoints.

Definition 6 (Density-connectedness) Two objects p and q are density-connected if they are density-reachable through a chain of connected core objects $\{p_1, \ldots, p_n\}$, where $p_1 = p$ and $p_n = q$, such that $p_{i+1}$ is density-reachable from $p_i$ for all $i \in \{1, \ldots, n-1\}$.

The DBSCAN algorithm constructs clusters by randomly choosing an unlabeled object p and performing the epsilon query on p. If $|N_\varepsilon(p)| \ge$ minPoints, a new cluster C is created and the query is executed for all $q \in N_\varepsilon(p)$ to expand the cluster until no further core object is found. Then, an

object p and all of its density-connected objects are

assigned a cluster label. The algorithm terminates when all

unlabeled objects are processed to form new expanding

clusters.
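The procedure described above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the paper: the function names, the `NOISE` label, and the sample points are assumptions, and objects are scanned in index order instead of being chosen at random.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_points):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may later be relabeled as a border object
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border object reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:  # j is a core object: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample: two dense groups and one isolated point.
points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
          (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1),
          (10, 10)]
labels = dbscan(points, eps=0.2, min_points=3)
```

In this sample, the two dense groups receive the labels 0 and 1, and the isolated point is marked as noise.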

3 System overview

3.1 Overview

Generally, an information propagation or diffusion prob-

lem is focused on reducing the overall time and increasing

the propagation speed to other neighbors. This requires a

similar type of group to be clustered because information

only needs to be sent once to delegates who have properties

similar to neighboring nodes. We model the relationship

between nodes with undirected graphs, G = (V, E), where

V denotes a set of all nodes and E denotes a set of links

between nodes based on contact frequency.

Before spreading information (or messages), we assume that the whole process is divided into two phases. The first phase, called the initialization period, learns the movement information of each node, and the second phase efficiently propagates the information based on the learned data. In the initialization period, a server is concerned only with predicting the movement patterns based on a Markov chain process. Then, the DBSCAN

algorithm is applied to create a meaningful group and the

top-k influential nodes are identified through active learn-

ing. After all the above processes are completed in the

initialization period, the server distributes information of

each node to the top-k nodes. We describe and illustrate the

general information propagation model in Sect. 3.2. In

Sect. 3.3, we present the algorithms and methods behind

ActiveDBC in a step-by-step manner by utilizing a Markov

chain and DBSCAN clustering.

3.2 Information propagation model

Based on previous studies [4], the information propagation model is similar to the diffusion minimization problem and can be described as follows. We assume that each node is either active or inactive. Active nodes are the adopters of the information and are ready to propagate it to their inactive neighbors. When an active node contacts an inactive node, the state of the inactive node changes to active; the transition is possible in this direction only. The more frequently node u contacts its neighbor node v, the more likely node v is to become informed and enter the active state. From the social behavior point of view, people are most likely to share information with their best friends or with those they encounter frequently. First, an initial set of active nodes is selected. When an active node contacts inactive nodes, the inactive nodes become active with a certain probability, and this continues until all the nodes are active. Then, the information propagation process terminates. Given a weighted graph G = (V, E), let V be the set of all nodes, S be the initial set of active nodes, and k be the expected number of total active nodes; the information propagation time for the initially selected node set is denoted by $\tau(S, V)$. The information propagation model is formulated by the following equation:

$$\underset{S \subseteq V}{\arg\min}\ \tau(S, V), \quad |S| \le k. \tag{4}$$

Under the probabilistic information propagation model, the contact frequency, expressed as the edge weight, is an important factor because it determines the transition from the inactive to the active state.
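To make the objective in Eq. (4) concrete, the following Python sketch computes the spreading time of a seed set from a time-ordered contact list and then searches all seed sets of a given size by brute force. It is only an illustration under simplifying assumptions: an inactive node is assumed to become active with probability 1 on contact, the seed set has exactly k nodes, and the contact trace and function names are hypothetical.

```python
from itertools import combinations


def spreading_time(contacts, seeds, nodes):
    """Time at which every node is active, or None if some node is never reached.

    contacts: iterable of (time, u, v) tuples sorted by time.
    Assumption: an inactive node always becomes active when it meets an active one.
    """
    active = set(seeds)
    if active >= set(nodes):
        return 0
    for t, u, v in contacts:
        if (u in active) != (v in active):  # exactly one side is active
            active.update((u, v))
            if active >= set(nodes):
                return t
    return None


def best_seed_set(contacts, nodes, k):
    """Brute-force the seed set of size k with the smallest spreading time."""
    best, best_time = None, None
    for seeds in combinations(nodes, k):
        t = spreading_time(contacts, seeds, nodes)
        if t is not None and (best_time is None or t < best_time):
            best, best_time = seeds, t
    return best, best_time


# Hypothetical contact trace over four nodes named as in Fig. 1.
nodes = ["p", "q", "f", "g"]
contacts = [(1, "p", "q"), (2, "f", "g"), (3, "q", "f"), (4, "f", "g")]

time_from_p = spreading_time(contacts, ["p"], nodes)
best_pair = best_seed_set(contacts, nodes, 2)
```

The exhaustive search grows combinatorially with the number of nodes, which is one reason why a learned selection of the top-k nodes is attractive.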

Figure 1 illustrates an example of the general informa-

tion propagation model at time t0 and t1. Assume that we

select the top-k influential node (in this case node p) before time t0, and the communication range of all nodes is x. Here, nodes f, g, and q are in the inactive state at time t0,

and there is no contact among them at all. However, at time

t1, when p and q meet each other, node q changes its state

to active. On the other hand, even though f and g also meet each other, their states do not change and both remain inactive. In this way, the state of any node may change at any given time t.

3.3 Major steps of ActiveDBC

ActiveDBC utilizes a Markov chain process before it

applies DBSCAN clustering, because the patterns of each

node can be predicted and the same pattern fits the same

clustering group. Before propagating information to other

nodes, all these steps are performed during the initializa-

tion period. Ultimately, we select the top-k influential

nodes by using the Markov chain process and DBSCAN

clustering. Initially, we can use the Markov chain method

to learn the mobility pattern of each of the nodes to predict

Fig. 1 Information propagation model at time t0 and t1. Nodes are

labeled p, q, f, and g. The red and blue dots represent active and

inactive nodes, respectively

future movement paths. Generally, ActiveDBC consists of

the following major steps.

Step 1 Collecting contact information

In this step, each node sends a probing message and

timestamp ts at regular time intervals during the initial-

ization period, where s = 1…u and u is the time at which

the initialization period terminates. We assume that each

node knows the wall-clock time [24].

Figure 2 illustrates how node a creates a contact frequency vector as time proceeds. In the initial state, the vector of node a is set to zeros (0, 0, 0, 0, 0). At time t1, node a does not meet anyone, so the contact count in its own entry is automatically increased by 1. At time

t2 and t3, node a collects contact information from other

nodes. Similarly, other nodes build their own contact fre-

quency vector.

Step 2 Constructing a transition matrix P

At the given time u, the server accumulates the contact

information of all the nodes. As shown in Fig. 3, the server

generates the temporary matrix (b) from the contact frequency vectors (a) of each node; note that (b) is a symmetric matrix. The server normalizes the temporary matrix to obtain the transition probability matrix P (c) by dividing each element by the sum of the elements in the corresponding column. For example, the first column of (c) is (0.12 0.25 0.37 0.12 0.12), because the entries of each column, as a probability vector, should sum to 1. In this way, all columns are converted to probability values, which are multiplied by the steady-state vector in the Markov chain process of Sect. 2.2.
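The column normalization in this step can be sketched in Python as follows. The count matrix below is hypothetical and is not taken from Fig. 3; it is chosen to be symmetric and so that its first column (1, 2, 3, 1, 1) reproduces, up to rounding, the probabilities quoted above. The function name is likewise an assumption.

```python
def column_normalize(counts):
    """Divide every entry by its column sum so that each column sums to 1."""
    n = len(counts)
    col_sums = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric contact-count matrix for five nodes.
counts = [[1, 2, 3, 1, 1],
          [2, 1, 0, 1, 0],
          [3, 0, 2, 1, 1],
          [1, 1, 1, 3, 0],
          [1, 0, 1, 0, 2]]

P = column_normalize(counts)
first_column = [row[0] for row in P]
```

Here the first column sums to 8 before normalization, so its entries become 1/8, 2/8, 3/8, 1/8, and 1/8.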

Step 3 Selecting promising probability vectors

After constructing the transition probability matrix P, the server also generates the initial distribution X(0) of probability vectors to obtain steady-state vectors. The distribution X(0) consists of $\binom{N}{k}$ initial probability vectors whose elements are all possible combinations of (N − k) zeros and k ones. For example, the initial distribution contains $\binom{5}{2} = 10$ vectors when N = 5 and k = 2. Here, N is the number of nodes and k is the number of nodes assumed to be selected as candidate seed nodes. As time proceeds to t, the set $X(t) = \{x_i(t) \mid 1 \le i \le \binom{N}{k}\}$ of probability vectors is computed by Eq. (5).

$$x_i(t) = \begin{bmatrix} r^t_1 \\ r^t_2 \\ \vdots \\ r^t_N \end{bmatrix} \tag{5}$$

where $r^t_1, \ldots, r^t_N$ are the probability values of all nodes at time t. Then we use the stationary distribution of the Markov chain to select promising probability vectors because this depends on the initial distribution. The long-term behavior of the Markov chain with the transition matrix P results in a unique probability vector q such that Pq = q. The vector q is known as the steady-state vector, which can be found by solving the homogeneous linear system in Eq. (6).

$$(I - P)\,q = 0. \tag{6}$$

The vector q represents the probability that each node has the information in the long run (t → ∞). In Fig. 4, we can obtain the mobility pattern probability of a location from the vector q. For example, the matrix P has size 5 × 5 and holds, for each node, the probabilities of the locations that it has visited. Each node is then multiplied by the vector q, which gives the probability of a future visit. Since the first entry, 0.260, has the highest probability value in vector q, each node can predict the

Fig. 2 Contact frequency

vectors of node a at time t1, t2,

and t3

probability of moving forward in accordance with its highest probability values.
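The construction of the initial vectors in this step can be sketched with Python's standard library. The function name is an assumption; the example uses N = 5 and k = 2 as in the text.

```python
from itertools import combinations


def initial_vectors(n, k):
    """All 0/1 vectors of length n that contain exactly k ones."""
    vectors = []
    for ones in combinations(range(n), k):
        vectors.append([1 if i in ones else 0 for i in range(n)])
    return vectors


# N = 5 nodes and k = 2 candidate seed nodes, as in the example in the text.
X0 = initial_vectors(5, 2)
```

Each vector marks one candidate set of k seed nodes, so the list has one entry per combination.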

Step 4 Creating a graph for DBSCAN

In order to create clusters according to the mobility patterns of the nodes, we obtain the Euclidean distances by using the steady-state vector q. According to the probabilities in vector q, each node can move with the mobility patterns of its future locations. The server transforms vector q into a similarity matrix S using the Euclidean distances between the steady-state vectors $v_i^q$ and $v_j^q$. Our aim is to construct the similarity matrix S by minimizing the Euclidean distance in Eq. (7).

$$\text{minimize}\ D(x, y) = \sqrt{\sum_{i=1}^{n} \left(v_i^q - v_j^q\right)^2}. \tag{7}$$

The similarity matrix S computed in this way is a symmetric matrix. Then, we apply the DBSCAN algorithm to the similarity matrix S in step 5.
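As a small illustration of this step, the Python sketch below builds a symmetric matrix of pairwise Euclidean distances between per-node vectors. The vectors and the function name are hypothetical; only the distance computation itself follows Eq. (7).

```python
import math


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances between vectors."""
    n = len(vectors)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = math.dist(vectors[i], vectors[j])
    return S


# Hypothetical per-node vectors; the first two nodes share the same pattern.
vectors = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
S = distance_matrix(vectors)
```

Nodes with identical patterns are at distance 0, so a density-based method with a small epsilon would tend to place them in the same cluster.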

Step 5 Finding primitive clusters

The DBSCAN algorithm uses the Euclidean distances of the similarity of nodes to capture the relationships among the primitive clusters. This is why the core objects are determined according to epsilon, which is measured as a Euclidean distance. Thus, primitive clusters are created at a given time t, and there may be two primitive clusters with mutual density-connectedness. Generally, two connected primitive clusters may belong to the same cluster. Ultimately, depending on the values of minPoints and epsilon, a node that is density-reachable from another object of a cluster is also part of that cluster (by Definition 6 in Sect. 2.3). In other words, the probability of belonging to the same cluster increases.

However, the DBSCAN algorithm produces outliers because not all nodes are reachable from other nodes (by Definition 4 in Sect. 2.3). Hence, it is very important to determine epsilon appropriately when the primitive clusters are initially found. The final clustering graph is completed by repeating this procedure in the same way. Figure 5 shows the result of DBSCAN clustering in our environment, where the units represent nodes in the NCCU dataset. Figure 5(a) presents the results that were obtained by using a Naïve method with random

Fig. 3 Construct transition probability matrix P

Fig. 4 Example for selecting

promising probability vectors

q using the Markov chain

process

movements, and Fig. 5(b) presents the result after

DBSCAN clustering with minPoints 4 and epsilon 0.1.

Step 6 Selecting k-influential nodes

After all the steps have been carried out, it is important to extract the k-influential nodes in each cluster for efficient information propagation. The naïve approach, referred to as RandomDBC, randomly selects the k-influential nodes from the clusters. This method chooses nodes without using information such as the relationships between nodes. Thus, it is inefficient, because it uses no learned knowledge such as the strong edges of the current cluster graph. Preliminary knowledge learned from the strong edges and degrees of the nodes enables weak-tie nodes or unknown nodes to be eliminated faster. Therefore, the general idea is to learn actively among the nodes in the current cluster graph, which we consider the most effective way of choosing the k-influential nodes. ActiveDBC first evaluates the importance of the degree of each node. Then, the nodes with the highest scores, compared with the degrees of the surrounding nodes, are chosen as the top-k nodes. At any given time t, the knowledge-based information learned for each node, denoted as score(d), is defined as in Eq. (8).

$$\mathrm{score}(d) = 2\,\deg(u) \times \sum_{p, q \in C} \frac{1}{E(p, q)}, \tag{8}$$

where $\deg(u)$ is the degree when a node encountered another node in the same cluster, $p, q \in C$ signifies that the nodes belong to the same cluster, and E(p, q) is the number of degrees when an encounter occurs between nodes p and q at time t. The score(d) is calculated by dividing the sum of the degrees of all nodes by the current number of neighboring nodes. This scoring rule is formulated based on learning the knowledge-based information of all nodes. Hence, based on the information learned earlier, the server sprays the information to the top-k nodes with high scores. In contrast to RandomDBC, ActiveDBC can select the top-k nodes effectively. Thus, it is a better information propagation method with high accuracy.

4 Experiments

4.1 Real trace data

Unlike synthetic models, whose movements do not reflect the everyday lives of people, models based on real traces consider specific places such as the locations of the office, school, and home, and even the time of day. Real

trace models have been reported in the past, e.g., Info-

com06 [25], Cambridge [26], and MIT Reality [27].

Although these real trace models follow human movements

through the real environment, specific groups act similarly

in that they supposedly perform the same actions or have

similar interests in the same building [28]. Since we assume that people usually go to a certain location depending on their interests and schedules, the NCCU trace data [22] is well suited as a representation of real human

movements. The NCCU students are not restricted to any

specific place and are free to move around the campus. All

students in this university walk around according to their

class schedule or for some other purpose. In total, 115 students

with mobile devices running an Android app were traced

over a period of 2 weeks. Furthermore, since the NCCU

trace data was designed to be location-aware by recording

the GPS position once every 10 min, we utilized the

Fig. 5 Clustering results: a Naïve method and b DBSCAN clustering

location coordinates by dividing the entire map into nine

segments. Figure 6 shows a map of the area surrounding

NCCU with the location coordinates and the nine segments

into which we divided this area. These nine segments are

denoted as virtual regions and they enable us to implement

our proposed idea, because the server can determine the

position of each node according to its location coordinates.
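A mapping from location coordinates to virtual regions could be sketched in Python as below. The paper does not specify how the nine segments are laid out, so the equal 3 × 3 grid, the row-major numbering, the origin at one corner of the map, and the function name are all assumptions; the default map size is the one reported for the simulation in Sect. 4.3.

```python
def virtual_region(x, y, width=3764.0, height=3420.0, rows=3, cols=3):
    """Map a position in metres to a region index 0..rows*cols-1 (row-major).

    Assumes 0 <= x <= width and 0 <= y <= height; points on the far edges
    are assigned to the last column or row.
    """
    col = min(int(x / (width / cols)), cols - 1)
    row = min(int(y / (height / rows)), rows - 1)
    return row * cols + col


corner_region = virtual_region(0.0, 0.0)
center_region = virtual_region(1882.0, 1710.0)
far_corner_region = virtual_region(3764.0, 3420.0)
```

With such a mapping, each GPS sample of a node can be converted into a region index, and the sequence of indices forms the state sequence from which movement probabilities are learned.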

4.2 Simulation methods

We verified our proposed method, ActiveDBC, by comparing its performance with that of other comparable methods, namely Naïve, Degree, and K-means. The following summarizes the methods we used for the comparison given a simulation time.

• Naïve The top-k influential nodes are randomly selected without using social properties or learned knowledge. Information is disseminated to other nodes unconditionally. This method is referred to as Naïve because it is the simplest method.

• Degree This approach is very similar to the degree-centrality problem [29] of social networks. In order to select the top-k influential nodes, the Degree method compares the degree of a node with that of all the other nodes. The weighted node with the highest degree is chosen as an influential node. Although it is very simple to compute, the method is ineffective with respect to isolated nodes.

• K-means This is the approach most similar to our proposed method. K-means clustering [30] is effective for quickly identifying the top-k influential nodes. The method aims to partition the nodes into k clusters in which each node belongs to the cluster with the nearest mean. Initially, K-means randomly chooses k nodes from all the nodes and uses these as the initial means. Then, it iteratively assigns nodes to the nearest center and updates the clusters. It uses the Euclidean distance to measure the distance between nodes. For this method, it is also important to select the k nodes appropriately.
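As an illustration of the Degree baseline described above, the following Python sketch ranks nodes by weighted degree and returns the top k. The edge list, the use of contact counts as edge weights, and the function name are assumptions made for illustration only.

```python
def top_k_by_degree(edges, k):
    """Return the k nodes with the highest weighted degree.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    """
    degree = {}
    for u, v, w in edges:
        degree[u] = degree.get(u, 0) + w
        degree[v] = degree.get(v, 0) + w
    # Sort by descending degree; break ties by node name for reproducibility.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:k]


# Hypothetical contact graph; weights stand for contact counts.
edges = [("a", "b", 3), ("a", "c", 2), ("b", "c", 1), ("c", "d", 4)]
top_two = top_k_by_degree(edges, 2)
```

A node without any contact never appears in the ranking, which reflects the weakness of this baseline with respect to isolated nodes.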

4.3 Simulation setup

Our simulation used the ONE simulator [19] and the map

of the area surrounding NCCU [22] to validate our pro-

posed idea.

Table 1 summarizes the parameters of the simulation

environments. The map of NCCU covers an area of 3764 m × 3420 m, and the movement model is the NCCU trace data, consisting of movement data collected from the

mobile devices of 115 students. The total data collection

period included an initialization period of 1500 s during

which the mobility pattern of each node was learned. This

Fig. 6 Data acquisition and segmentation: a Area surrounding the National Cheng-Chi University and b the nine virtual regions

Table 1 Simulation setting

Parameter (unit)                Value (default)
Number of nodes                 115
Area (m²)                       3764 × 3420
Movement model                  NCCU trace data
Interval of message behavior    Student behavior
Top-k influential nodes         4, 5, 6, 7 (4)
Communication ranges (m)        1, 5, 10, 15, 20 (10)
Epsilon ε for DBSCAN            0.01, 0.1, 0.5 (0.1)
minPoints within ε              3, 4, 5 (4)
Initialization period (s)       1500
Simulation time (s)             15,000

Fig. 7 Results with variation of active nodes according to communication ranges

Fig. 8 Results with variation of active nodes according to top-k influential nodes

training process was conducted by processing the Markov

chain and obtaining the clustering groups. Subsequently,

the simulation was carried out for a period of 15,000 s,

which was the information propagation time defined above

in Sect. 3. When the total simulation time has elapsed, most of the nodes can be in the active state. The communication range indicates the distance within which a node can communicate with other nodes; a smaller communication range corresponds to a relatively sparse environment. However, we do not

consider the buffer size and message size of the nodes,

because our model focuses on the extent to which the

information becomes widespread.
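The training step described above can be illustrated with a minimal sketch. It assumes that each node's trace from the initialization period has already been mapped to a sequence of virtual-region indices (0 to 8); the function name `transition_matrix`, the input format, and the sample trace are hypothetical and are not taken from the paper. The sketch estimates first-order Markov transition probabilities by counting observed region-to-region moves.

```python
from collections import Counter

NUM_REGIONS = 9  # nine virtual regions, as in Fig. 6(b)


def transition_matrix(region_sequence, num_regions=NUM_REGIONS):
    """Estimate first-order Markov transition probabilities from one node's
    sequence of visited region indices (hypothetical input format)."""
    # Count each observed (current region, next region) pair.
    counts = Counter(zip(region_sequence, region_sequence[1:]))
    matrix = []
    for src in range(num_regions):
        row_total = sum(counts[(src, dst)] for dst in range(num_regions))
        if row_total == 0:
            # No departure from this region was observed: keep a zero row.
            matrix.append([0.0] * num_regions)
        else:
            matrix.append([counts[(src, dst)] / row_total
                           for dst in range(num_regions)])
    return matrix


# Made-up trace of a node moving among regions 0, 1 and 4.
sample_trace = [0, 0, 1, 4, 4, 1, 0, 1]
P = transition_matrix(sample_trace)
```

Each non-empty row of `P` sums to one, so a row can be read as the node's movement probabilities out of that region. Such per-node probabilities are the kind of feature that the clustering stage could consume.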

Extensive experiments were conducted to find the best parameter values for each environment. In particular, it is important to specify appropriate values for the DBSCAN parameters (minPoints and epsilon). Because establishing the influential nodes is critical, we iteratively learn the score of each node at the end of the initialization period for our proposed ActiveDBC. Additionally, we measured the percentage of active nodes within a given simulation time, and each simulation was run 20 times to obtain average results.
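As a minimal sketch of the evaluation metric, the percentage of active nodes can be computed per run and then averaged over repeated runs. The function names and the example data below are hypothetical; only the node count of 115 comes from Table 1.

```python
TOTAL_NODES = 115  # number of nodes in the NCCU trace (Table 1)


def active_percentage(active_nodes, total_nodes=TOTAL_NODES):
    """Percentage of nodes that are active at the measurement time."""
    return 100.0 * len(active_nodes) / total_nodes


def average_active_percentage(runs, total_nodes=TOTAL_NODES):
    """Average the active-node percentage over repeated simulation runs.

    `runs` is a list with one set of active node ids per run."""
    if not runs:
        raise ValueError("at least one run is required")
    return sum(active_percentage(r, total_nodes) for r in runs) / len(runs)


# Made-up results of three runs (the paper averages over 20 runs).
example_runs = [set(range(100)), set(range(110)), set(range(105))]
avg = average_active_percentage(example_runs)
```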

4.4 Simulation results

4.4.1 Communication ranges

In the experiments described in this section, we measured the number of active nodes for different communication ranges, because our aim was to determine the extent to which

Fig. 9 Comparison of results between RandomDBC and ActiveDBC with variation of number of active nodes


the information was spread among the nodes as x values, as mentioned in Sect. 3. Figure 7(a) compares the overall percentage of active nodes with communication ranges of 5, 10, 15 and 20 m. The smaller communication ranges can be regarded as relatively sparse environments, and in these settings our proposed ActiveDBC outperformed the other methods. Conversely, the 20 m communication range produces a high effective node density, because all nodes are within easy reach of each other without requiring a special movement pattern. Hence, all methods show similar tendencies. Moreover, because the NCCU trace contains isolated nodes as a property of its mobility, some nodes still remained inactive. As mentioned, Fig. 7(d) shows that the methods become increasingly indistinguishable.
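The effect of the communication range can be illustrated with a minimal sketch of a single contact step, under the stated assumption that buffer size and message size are ignored. The function `propagate_once` and the node positions are hypothetical; they are not the ONE simulator's API or the NCCU trace.

```python
import math


def propagate_once(positions, active, comm_range):
    """Return the active set after one contact step.

    positions: dict mapping node id -> (x, y) in metres
    active: set of node ids that already hold the information
    Any inactive node within comm_range of an active node becomes active."""
    newly_active = set()
    for node, (x, y) in positions.items():
        if node in active:
            continue
        for src in active:
            sx, sy = positions[src]
            if math.hypot(x - sx, y - sy) <= comm_range:
                newly_active.add(node)
                break
    return active | newly_active


# Made-up positions: "d" is an isolated node far from the others.
positions = {"a": (0.0, 0.0), "b": (8.0, 0.0),
             "c": (18.0, 0.0), "d": (200.0, 0.0)}
seed = {"a"}
after_10m = propagate_once(positions, seed, comm_range=10)
after_20m = propagate_once(positions, seed, comm_range=20)
```

With the larger range, more nodes are reached in one step regardless of which node is the seed, which is consistent with the observation that the methods behave similarly at 20 m. The isolated node stays inactive in both cases.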

4.4.2 Number of top-k influential nodes

Because we choose the top-k influential nodes from the k clusters produced by the DBSCAN algorithm, it is important to determine the size of the epsilon ε. In Fig. 8(a), we set ε to 0.205, 0.113, 0.024, and 0.014 for k = 4, 5, 6, and 7 influential nodes, respectively. For example, Fig. 8(b) shows the result with ε = 0.205, which yields eight outliers among the 115 nodes (noise ratio 8/115 ≈ 0.07) when there are four top-k influential nodes, whereas the setting with seven influential nodes yields 32 outliers (noise ratio 32/115 ≈ 0.28). Although the results in Fig. 8(e) include many outliers, the proposed method still performs better, because we measure how fast the active nodes spread information to others starting from the top-k nodes, without considering the number of outliers. Overall, when a larger number of influential nodes exist initially, information tends to spread faster.
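To make the roles of ε and minPoints concrete, the following is a minimal pure-Python sketch of DBSCAN together with a noise-ratio helper. It is an illustration only, not the authors' implementation: the 2-D points are made up, whereas the paper clusters per-node movement probabilities, and the names `dbscan` and `noise_ratio` are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_points):
    """Minimal DBSCAN: returns one label per point, NOISE (-1) for outliers.

    A point is a core point if at least min_points points (itself included)
    lie within distance eps of it."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_points:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


def noise_ratio(labels):
    """Fraction of points labeled as noise."""
    return labels.count(NOISE) / len(labels)


# Made-up data: two dense groups and one isolated point.
points = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.12, 0.12),
          (0.80, 0.80), (0.82, 0.80), (0.80, 0.82), (0.82, 0.82),
          (0.50, 0.10)]
labels = dbscan(points, eps=0.05, min_points=4)
ratio = noise_ratio(labels)
```

Shrinking `eps` or raising `min_points` tends to turn more points into noise, which matches the trend reported above: the smaller ε values used for larger k come with more outliers.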

4.4.3 Knowledge-learning based propagation

Our results show that the proposed method, ActiveDBC, is more effective than RandomDBC after DBSCAN clustering. Even though we predicted the mobility patterns of each node based on the Markov chain and clustered the nodes effectively by using the DBSCAN algorithm, randomly choosing the top-k influential nodes performed poorly, as shown in Fig. 9(c). In contrast, applying the scoring rule we suggested yields good performance, as shown in Fig. 9(a, b). However, with a communication range of 20 m, the two methods show a similar tendency in Fig. 9(a): because the coverage of each node is so large, the nodes encounter each other easily.
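The difference between the two selection strategies can be sketched as follows, assuming one influential node is taken from each of the k clusters. The scores, cluster memberships, and function names are hypothetical placeholders; the paper's actual scoring rule is defined elsewhere and is not reproduced here.

```python
import random


def select_by_score(clusters, scores):
    """Pick the highest-scoring node from each cluster
    (an ActiveDBC-style choice under the assumption above)."""
    return [max(members, key=lambda n: scores[n]) for members in clusters]


def select_randomly(clusters, rng):
    """Pick one node uniformly at random from each cluster
    (a RandomDBC-style baseline)."""
    return [rng.choice(members) for members in clusters]


# Made-up clusters (k = 4) and made-up learned scores.
clusters = [["n1", "n2", "n3"], ["n4", "n5"], ["n6", "n7", "n8"], ["n9"]]
scores = {"n1": 0.2, "n2": 0.9, "n3": 0.4, "n4": 0.1, "n5": 0.7,
          "n6": 0.8, "n7": 0.3, "n8": 0.5, "n9": 0.6}

top_k_scored = select_by_score(clusters, scores)
top_k_random = select_randomly(clusters, random.Random(0))
```

Both strategies start from the same clustering, so any gap in spreading speed between them is attributable to how the seed node within each cluster is chosen.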

5 Conclusion

In this paper, to provide effective information propagation, we proposed a knowledge-based information propagation method that identifies appropriate top-k influential nodes. Our idea is to learn the movement probabilities of each node over the virtual regions so as to reduce the number of isolated nodes as much as possible. In addition, the DBSCAN algorithm was applied to obtain effective results in dense, populated areas. To select the top-k influential nodes, we used two major steps. First, we predicted the movement patterns of each node by using a Markov chain process, and the resulting movement probabilities of each node were used as input for clustering. Second, after the nodes with similar movement patterns were clustered by using the DBSCAN algorithm, we applied the scoring rule to choose the top-k nodes based on the learned acquaintance among nodes, which worked much better than random selection. As a result, for short communication distances, the proposed method performed well in comparison to the other methods. A shortcoming is that all methods tend to produce similar results with larger communication ranges; however, in the context of the MSN environment, shorter communication ranges are more suitable for our proposed idea. Moreover, we used real location-based NCCU trace data to demonstrate the efficiency of this method. The experiments on the real dataset showed that the proposed method generally outperformed the other methods.

Acknowledgements This research was supported by the Basic Sci-

ence Research Program through the National Research Foundation of

Korea (NRF) funded by the Ministry of Education, Science, and

Technology (2016R1A2B4010142).

References

1. Xu, Q., Su, Z., Zhang, K., Ren, P., & Shen, X. S. (2015). Epi-

demic information dissemination in mobile social networks with

opportunistic links. IEEE Transactions on Emerging Topics in

Computing, 3(3), 399–409.

2. Ma, H., Yang, H., Lyu, M. R., & King, I. (2008). Mining social

networks using heat diffusion processes for marketing candidates

selection. In Proceedings of the 17th ACM conference on infor-

mation and knowledge management (pp. 233–242).

3. Kempe, D., Kleinberg, J., & Tardos, E. (2003). Maximizing the

spread of influence through a social network. In Proceedings of

ACM SIGKDD.

4. Lu, Z., Wen, Y., & Cao, G. (2014). Information diffusion in

mobile social networks: The speed perspective. In Proceedings of

IEEE INFOCOM (pp. 1932–1940).

5. Ning, T., Yang, Z., Wu, H., & Han, Z. (2013). Self-interest-driven incentives for ad dissemination in autonomous mobile social networks. In Proceedings of IEEE INFOCOM.

6. Kempe, D., Kleinberg, J., & Tardos, E. (2003). Maximizing the

spread of influence through a social network. In Proceedings of

the 9th ACM SIGKDD international conference on knowledge

discovery and data mining (pp. 137–146).

7. Richardson, M., & Domingos, P. (2002). Mining knowledge-

sharing sites for viral marketing. In Proceedings of the eighth

ACM SIGKDD international conference on Knowledge discovery

and data mining (pp. 61–70).


8. Myers, S. A., Zhu, C., & Leskovec, J. (2012). Information dif-

fusion and external influence in networks. In Proceedings of the

18th ACM SIGKDD international conference on knowledge dis-

covery and data mining (pp. 33–41).

9. Kim, Y., Kim, J. K., Seok, J., & Du Kim, B. (2016). Information

propagation modeling in a drone network using disease epidemic

models. In 2016 Eighth international conference on ubiquitous

and future networks (ICUFN) (pp. 79–81).

10. Araniti, G., Orsino, A., Militano, L., Wang, L., & Iera, A. (2017).

Context-aware information diffusion for alerting messages in 5G

mobile social networks. IEEE Internet of Things Journal, 4(2),

427–436.

11. Chen, W., Wang, C., & Wang, Y. (2010). Scalable influence

maximization for prevalent viral marketing in large-scale social

networks. In Proceedings of ACM SIGKDD.

12. Wang, Y., Cong, G., Song, G., & Xie, K. (2010). Community-

based greedy algorithm for mining top-k influential nodes in

mobile social networks. In Proceedings of ACM SIGKDD.

13. Lu, Z., Wen, Y., Zhang, W., Zheng, Q., & Cao, G. (2016).

Towards information diffusion in mobile social networks. IEEE

Transactions on Mobile Computing, 15(5), 1292–1304.

14. Lu, Z., Sun, X., & La Porta, T. (2016). Cooperative data offload

in opportunistic networks: From mobile devices to Infrastructure.

arXiv preprint arXiv:1606.03493.

15. Markov chain. (2016). https://en.wikipedia.org/wiki/Markov_chain.

16. Lee, J. K., & Hou, J. C. (2006). Modeling steady-state and

transient behaviors of user mobility: formulation, analysis, and

application. In Proceedings of the 7th ACM international sym-

posium on Mobile ad hoc networking and computing (pp. 85–96).

17. Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-

based algorithm for discovering clusters in large spatial databases

with noise. In KDD (Vol. 96, No. 34, pp. 226–231).

18. Mai, S. T., Assent, I., & Storgaard, M. (2016, August). AnyDBC:

An Efficient Anytime Density-based Clustering Algorithm for

Very Large Complex Datasets. In Proceedings of the 22nd ACM

SIGKDD international conference on knowledge discovery and

data mining (pp. 1025–1034). ACM.

19. Keranen, A. (2008). Opportunistic network environment simu-

lator. Special Assignment report, Helsinki University of Tech-

nology, Department of Communications and Networking.

20. Conti, M., Giordano, S., May, M., & Passarella, A. (2010). From

opportunistic networks to opportunistic computing. IEEE Com-

munications Magazine, 48(9), 126–139.

21. Zhang, Z. (2006). Routing in intermittently connected mobile ad

hoc networks and delay tolerant networks: Overview and chal-

lenges. IEEE Communications Surveys and Tutorials, 8(1),

24–37.

22. Tsai, T. C., & Chan, H. H. (2015). NCCU trace: Social-network-

aware mobility trace. IEEE Communications Magazine, 53(10),

144–149.

23. Fratini, A., & Caleffi, M. (2014). Medical emergency alarm

dissemination in urban environments. Telematics and Informat-

ics, 31(3), 511–517.

24. Pelusi, L., Passarella, A., & Conti, M. (2006). Opportunistic

networking: data forwarding in disconnected mobile ad hoc

networks. IEEE Communications Magazine, 44(11), 15.

25. Srinivasan, V., Motani, M., & Ooi, W. T. (2006). Analysis and implications of student contact patterns derived from campus schedules. In Proceedings of ACM MobiCom, Los Angeles, CA (pp. 86–97).

26. Hui, P. (2008). People are the network: experimental design and

evaluation of social-based forwarding algorithms, Ph.D.

Dissertation, UCAM-CL-TR-713. University of Cambridge,

Computer Laboratory.

27. Eagle, N., & Pentland, A. (2006). Reality mining: sensing com-

plex social systems. Personal and Ubiquitous Computing, 10(4),

255–268.

28. Socievole, A., De Rango, F., & Caputo, A. (2014). Wireless contacts, Facebook friendships and interests: analysis of a multi-layer social network in an academic environment. In 2014 IFIP Wireless Days (WD).

29. Freeman, L. C. (1978). Centrality in social networks conceptual

clarification. Social Networks, 1(3), 215–239.

30. Alsabti, K., Ranka, S., & Singh, V. (1997). An efficient k-means clustering algorithm.

Jiho Park is currently a Ph.D.

candidate in computer science at

Yonsei University in South

Korea. His research interests

include wireless sensor net-

works, machine learning, deep

learning and social network

analysis.

Jegwang Ryu is currently an

M.S. candidate in computer

science at Yonsei University in

South Korea. His research

interests include mobile social

networks, data offloading and

machine learning.

Sung-Bong Yang received his

M.S. and Ph.D. from the Dept.

of Computer Science at the

University of Oklahoma in 1986

and 1992, respectively. He has

been a professor at Yonsei

University since 1994. His

research interests include graph

algorithms, mobile computing,

machine learning and social

network analysis.
