caf: community aware framework for large scale mobile opportunistic networks


Computer Communications 36 (2013) 180–190

Contents lists available at SciVerse ScienceDirect

Computer Communications

journal homepage: www.elsevier.com/locate/comcom

CAF: Community aware framework for large scale mobile opportunistic networks ☆

Abderrahmen Mtibaa ⇑, Khaled A. Harras
Department of Computer Science, Carnegie Mellon University, Qatar

Article info

Article history:
Received 2 October 2011
Received in revised form 27 August 2012
Accepted 28 August 2012
Available online 5 September 2012

Keywords:
DTN
Forwarding
Scalability
Multi-communities
Social forwarding

0140-3664/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.comcom.2012.08.019

☆ This paper is based on “Social Forwarding in Large Scale Networks: Insights Based on Real Trace Analysis,” by A. Mtibaa and K. A. Harras, which appeared in the Proceedings of the 20th IEEE International Conference on Computer Communication Networks (ICCCN), Maui, HI, USA, August 2011. © 2011 IEEE.
⇑ Corresponding author. Tel.: +974 3315 9831.

E-mail addresses: [email protected] (A. Mtibaa), [email protected] (K.A. Harras).

Abstract

The fundamental challenge in opportunistic networking is when and how to forward a message. Rank-based forwarding, one of the most promising methods for addressing this challenge, ranks nodes based on their social profiles or contact history in order to identify the most suitable forwarders. While these forwarding techniques have shown strong performance, we observe that they fail to forward messages efficiently in large scale networks. In this paper, we use real mobility traces to demonstrate the weakness of existing rank-based forwarding algorithms in large scale communities. We propose strategies for partitioning large communities into sub-communities based on geographic locality or social interests. We also propose exploiting particular nodes, named MultiHomed nodes, to disseminate messages across these sub-communities. We introduce CAF, a Community Aware Forwarding framework, which is designed to be integrated with state-of-the-art rank-based forwarding algorithms in order to improve their performance in large scale networks. We use real mobility traces to evaluate our proposed techniques. Our results empirically show a delivery success rate increase of up to 40%, along with 5% to 30% improvements in delivery success rate compared to state-of-the-art rank-based forwarding algorithms; these results are obtained while incurring a marginal cost increase of less than 10%. We finally propose an extension of the original framework called Community Destination Aware Framework (CDAF). Assuming that the source node can determine the destination’s community, CDAF further reduces the cost of CAF by a factor of 2 while maintaining similar success rates.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

The proliferation of a new generation of powerful mobile devices has led to the rise of new infrastructure-less communication paradigms and applications. This new communication environment is characterized by a variety of challenges such as mobility, disconnections, and energy constraints. While researchers in the area of Mobile Ad Hoc Networks (MANETs) [14] have traditionally addressed some of these challenges, their solutions fail in scenarios where end-to-end paths may not exist. Delay Tolerant Networks (DTNs) [13] in general, and opportunistic networks in particular, have attempted to address this failure through a variety of message store-carry-and-forward techniques [3,4,16,28]. The most pressing concern in these opportunistic networking solutions is the fundamental problem of deciding when to forward a message and to whom.

Previous work has considered the availability of various types of information about the network in order to guide and improve forwarding decisions. Such information includes historical contacts with other devices [1,3,4,9,16], information about device mobility patterns [28], or information about the social interaction between participants [6,12,19]. Among these techniques, rank-based forwarding [6–8] represents one of the most promising methods to address the message forwarding challenge. These methods differ in the type of information used to rank nodes (e.g., information acquired during contacts [7,8], or social interaction between users [6,21,22]) as well as in how this information is used to opportunistically forward messages in the network. A node with a lower rank forwards messages to nodes with higher ranks. While these techniques have shown strong performance, we show in this paper that they fail to forward messages efficiently in large scale networks: the popularity of nodes does not necessarily scale with the network size in a way that correlates with the contact opportunities and mobility patterns of these nodes.

This paper contributes to a better understanding of the weaknesses of existing rank-based forwarding algorithms in large scale mobile opportunistic networks. We propose partitioning large scale communities into multiple sub-communities based on various common social characteristics such as locality or social interests. In order to improve forwarding performance in large scale networks, we introduce (in Section 6) CAF, a Community Aware Framework that can easily be integrated with existing rank-based forwarding algorithms. CAF uses particular nodes called MultiHomed nodes to disseminate messages across sub-communities in the network, while the original rank-based forwarding algorithm behaves normally within a local sub-community. Besides being simple, CAF incurs a relatively negligible overhead compared to that incurred by state-of-the-art algorithms, such as BubbleRap, to compute the global node ranking in large scale networks. CAF remains a distributed forwarding algorithm and relies on local social/contact information to estimate future forwarding opportunities.
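The division of labor described above (rank-based forwarding inside a sub-community, MultiHomed nodes bridging sub-communities) can be illustrated with a small decision function. This is a hypothetical sketch, not the CAF rule defined in Section 6: the names `caf_should_copy`, `communities`, and `local_rank`, and the specific bridging condition, are assumptions made for illustration only.

```python
from typing import Dict, Set


def caf_should_copy(
    carrier: str,
    peer: str,
    dest: str,
    communities: Dict[str, Set[str]],
    local_rank: Dict[str, Dict[str, float]],
) -> bool:
    """Hypothetical CAF-style rule: should `carrier` give a copy of a message for `dest` to `peer`?

    communities: node -> sub-communities it belongs to (MultiHomed nodes have more than one).
    local_rank:  sub-community -> {member -> rank computed only inside that sub-community}.
    """
    if peer == dest:
        return True  # always hand the message to its destination
    carrier_comms = communities[carrier]
    peer_comms = communities[peer]
    # Assumed bridging condition: a MultiHomed peer that reaches a sub-community
    # the carrier does not belong to.
    if len(peer_comms) > 1 and peer_comms - carrier_comms:
        return True
    # Otherwise fall back to the underlying rank-based rule, using local ranks
    # of a sub-community that carrier and peer share.
    return any(local_rank[c][peer] > local_rank[c][carrier] for c in carrier_comms & peer_comms)
```

For example, with an `airport` sub-community and a MultiHomed node `m` that belongs to both `airport` and `downtown`, a carrier in `airport` copies the message to `m`, which can then carry it into `downtown`; no global ranking over all nodes is needed.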

A major contribution of our work is that our insights, evaluation, and analysis of rank-based forwarding algorithms, as well as the CAF framework we propose, are all based on real mobility traces. We utilize the largest data set (to the best of our knowledge) that captures human mobility contacts in large scale networks together with the corresponding social information. Our results show a delivery success rate increase of around 40% compared to state-of-the-art rank-based forwarding algorithms, while incurring a marginal increase in cost; CAF outperforms BubbleRap, achieving a 5% to 30% delivery rate improvement.

This paper adds the following contributions beyond those described in our conference version:

• We address the scalability weakness of opportunistic forwarding more generally. While [20] proposes insights for social-based forwarding algorithms in large scale networks, this paper extends that work to study scalability issues for rank-based forwarding, including both social- and contact-based forwarding algorithms. We investigate scalability issues of two contact-based forwarding algorithms: Greedy [8] and FRESH [7].

• We study the overhead generated by our proposed framework CAF to improve the efficiency of most state-of-the-art rank-based forwarding algorithms in large scale networks. Assuming that a destination’s community can be identified using social networking applications such as Facebook, or simply appended to the address (e.g., IPv6 addresses), we introduce and evaluate an extension of the original framework called Community Destination Aware Framework (CDAF). CDAF reduces the cost of CAF by a factor of 2 while keeping the success rate essentially unchanged.

• Our real-trace analysis includes additional results based on an additional dataset, Hope08, and on two contact-based ranking algorithms, Greedy and FRESH [8].

The rest of this paper is organized as follows. Section 2 motivates our work by describing related work in the field of rank-based opportunistic forwarding and highlights scalability issues in large scale opportunistic networks. In Section 3, we present the data driven approach used to generate our analytical results based on real-life measurement traces of human mobility. We then describe, in Section 4, the characteristics of rank-based forwarding algorithms. Section 5 examines the drawbacks of such forwarding algorithms in large scale networks. Sections 6 and 7 examine forwarding within and across sub-communities in large scale networks; in these sections, we present our results and analysis using real-life measurement traces of human mobility. We conclude this paper and discuss our future work in Section 8.

2. Related work

Communication between nodes in mobile opportunistic networks is intermittent in nature, and end-to-end paths between a source and a destination may never exist. Since node contacts are mostly unpredictable, scheduled relay approaches such as Message Ferrying [28] cannot be effective. Replication is the most common technique to maximize the number of successfully delivered messages. Naive forwarding protocols based on flooding are extremely inefficient since they are very costly in terms of resource and energy consumption [23,29]. Most of the research focuses on designing forwarding algorithms that reduce the number of replicas in the network while achieving satisfactory delivery rates.

Rank-based forwarding techniques represent one of the most promising methods for message forwarding in opportunistic networks. They differ in the type of information used, as well as how it is used, in order to rank nodes in the network. We distinguish between two types of rank-based forwarding techniques: contact-based ranking techniques [7,8,16], where information acquired during contacts is used to rank nodes, and social-based ranking techniques [6,11,12,22], where social interactions between users are used to rank the nodes. Contact-based forwarding techniques maintain a set of probabilities for successful delivery to known destinations (e.g., Prophet [16]), or towards the most popular nodes in the network (e.g., Greedy Total and FRESH Total [8]). Messages are then forwarded only if the encountered node appears to have a better chance of delivering the message. In contrast, the intuition behind social-based forwarding techniques such as PeopleRank [22] and Simbet [6] is that socially well-connected nodes are better suited to forward messages towards any given destination. PeopleRank, for instance, uses a ranking algorithm inspired by Google’s PageRank [2] to guide forwarding decisions and forward messages to higher ranked nodes.
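To make the PageRank analogy concrete, the sketch below runs a centralized, PageRank-style power iteration over an undirected social graph, so that nodes with many well-ranked friends receive high scores. This illustrates the idea only: PeopleRank [22] computes its ranks in a distributed way as nodes meet, and the function name, the damping factor `d = 0.85`, and the fixed iteration count are assumptions.

```python
from typing import Dict, Set


def pagerank_scores(
    social_graph: Dict[str, Set[str]], d: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """PageRank-style scores on an undirected social graph (node -> set of friends).

    Uses the unnormalized form
        score(i) = (1 - d) + d * sum_{j in friends(i)} score(j) / |friends(j)|,
    so the scores sum to the number of nodes.
    """
    score = {n: 1.0 for n in social_graph}
    for _ in range(iterations):
        # The whole comprehension reads the previous iteration's scores.
        score = {
            n: (1 - d) + d * sum(score[m] / len(social_graph[m]) for m in social_graph[n])
            for n in social_graph
        }
    return score
```

In a rank-based scheme, a carrier would then forward a message to an encountered node whose score is higher than its own.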

Most of these contributions highlight the superior performance of rank-based forwarding algorithms within specific communities such as conferences and campuses. Our goal in this paper is to (i) demonstrate the weakness of such algorithms in large scale networks and (ii) propose a framework that can easily be integrated with state-of-the-art rank-based forwarding algorithms in order to improve the success rate in large scale networks.

Studies of the scalability of forwarding algorithms in large scale opportunistic networks have mainly focused on mobility properties [12,28,30,31]. However, connecting the social characteristics of individuals and their mobility to classify them into communities remains largely unexplored. In order to interconnect isolated regions (i.e., communities) in a large scale network, pre-scheduled relay approaches such as Message Ferrying [30] have been proposed, in which special mobile nodes called “ferries” aid connectivity between the nodes in the network. Since mobility is generally unpredictable in opportunistic networks, such scheduled approaches cannot be effective.

Most relevant to this work is BubbleRap [12], a forwarding algorithm that uses contact properties of nodes to estimate node popularity and classify nodes into communities. Besides the fact that node centrality and communities are deduced from contact properties and evaluated with the same contact trace, BubbleRap assumes that each node has a global ranking across the whole network. We believe that such an assumption is unrealistic in a large scale environment. In our work, we use explicit and local social interactions between individuals to form communities and disseminate messages across these communities. We show in our evaluation how our Community Aware Forwarding framework (CAF) outperforms BubbleRap in most scenarios.

3. A data driven approach

Throughout this paper, we use a data driven approach to conduct our findings and analysis. We use three experimental data sets to support the motivation, hypothesis, and evaluation of our proposed framework. We begin this section with an overview of the experimental traces. Afterwards, we present the methodology used throughout this paper to evaluate the performance of our forwarding algorithms.

Table 1
Dataset properties.

                      Hope08   Dartmouth01   SanFrancisco11
                                             CoNext07    Infocom06   CoNext08
Duration              3 days   3 days        3 days      3 days      3 days
Mobility detection    WiFi     WiFi          Bluetooth   Bluetooth   Bluetooth
# Nodes               414      100           27          47          22
Median inter-contact  30 min   6 min         10 min      15 min      12 min
Median contact time   90 s     160 s         240 s       150 s       120 s

3.1. Experimental data sets

Our analysis is based on three large scale experimental data sets. Dartmouth01 and Hope08 are two state-of-the-art human mobility traces publicly available on CRAWDAD.¹ However, these data sets were collected in relatively small areas such as conferences and campuses. Since it is generally costly to run an experiment over very large areas (e.g., city-wide), we artificially create the third data set, which uses the San Francisco taxi cab trace [25] coupled with three mobility traces that we use to represent three different sub-communities: Infocom06, CoNext07, and CoNext08.

Table 1 summarizes the characteristics of these experimental data sets.

3.1.1. Dartmouth01

We use the WiFi access network of the Dartmouth campus [10]. This data set spans roughly 1300 m × 1300 m, over 160 buildings, and about 550 802.11b access points. The Dartmouth College campus includes student residences, sports facilities, administrative buildings, and academic buildings. The data set contains logs of client MAC addresses and SSIDs of access points, as well as their positions. We assume that two nodes are able to communicate if they are connected simultaneously to the same access point. We use this trace to generate contacts between 100 nodes in order to simulate message propagation in a pure ad hoc manner. We note that the ping-pong effect in the Dartmouth trace [10] does not affect this assumption.
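The co-association assumption turns the access-point logs into a contact trace through a simple interval intersection per access point. The sketch below assumes the logs have already been parsed into `(node, ap, start, end)` records in seconds; the record layout and the function name are hypothetical, and overlapping contacts between the same pair at different access points are not merged.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, NamedTuple


class Association(NamedTuple):
    node: str     # client identifier (e.g., an anonymized MAC address)
    ap: str       # access point identifier
    start: float  # association start time, in seconds
    end: float    # association end time, in seconds


class Contact(NamedTuple):
    a: str
    b: str
    start: float
    end: float


def contacts_from_associations(assocs: Iterable[Association]) -> List[Contact]:
    """Two nodes are in contact while both are associated with the same access point."""
    by_ap = defaultdict(list)
    for rec in assocs:
        by_ap[rec.ap].append(rec)
    contacts = []
    for recs in by_ap.values():
        for r1, r2 in combinations(recs, 2):
            if r1.node == r2.node:
                continue  # two sessions of the same client are not a contact
            start, end = max(r1.start, r2.start), min(r1.end, r2.end)
            if start < end:  # the two association intervals overlap
                a, b = sorted((r1.node, r2.node))
                contacts.append(Contact(a, b, start, end))
    return sorted(contacts, key=lambda c: (c.start, c.a, c.b))
```

Intervals that merely touch (one client leaves as the other arrives) produce no contact, which keeps instantaneous hand-overs out of the trace.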

3.1.2. Hope08

The Hope08 dataset was collected during the 7th HOPE conference. This experiment involved a large number of participants (around 770), who collected and exchanged contact information (after an explicit connection setup using send/receive pings). The dataset contains the location of participants (at 30 s granularity) as well as their topics of interest in the conference. The dataset is publicly available in the CRAWDAD database.

We assume that a contact opportunity is available when two people are tagged in the same place for a minimum period of time δt. In this paper, we consider δt = 2 min as the minimum contact duration in order to avoid using very short contact opportunities for forwarding. During the conference, participants were asked to select up to five interests they may share with other conference attendees. These interests are used to build a social graph connecting all participants. Note that there are only 414 nodes connected in the social graph, as shown in Table 1. We believe that, nowadays, the world is socially connected: Facebook announced an average of 3.74 degrees of separation between its members in 2011.²
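The minimum-duration rule can be sketched as follows. The sketch assumes location samples of the form `(t, node, location)` on the 30 s grid of the Hope08 trace and treats each sample as covering `[t, t + 30 s)`; both the input layout and that coverage convention are assumptions, and `colocation_contacts` is a hypothetical helper. Runs of co-location shorter than δt = 120 s are discarded.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple

Sample = Tuple[int, str, str]        # (time in s, node, location tag)
Contact = Tuple[str, str, int, int]  # (node a, node b, start, end)


def colocation_contacts(
    samples: Iterable[Sample], step: int = 30, min_duration: int = 120
) -> List[Contact]:
    """Contacts are maximal runs of consecutive samples in which two nodes share a
    location, kept only if the run lasts at least `min_duration` seconds (δt)."""
    present = defaultdict(set)  # (t, location) -> nodes seen there
    for t, node, loc in samples:
        present[(t, loc)].add(node)
    pair_times = defaultdict(set)  # (a, b) -> sample times at which they are co-located
    for (t, _loc), nodes in present.items():
        for a, b in combinations(sorted(nodes), 2):
            pair_times[(a, b)].add(t)
    contacts = []
    for (a, b), times in pair_times.items():
        ts = sorted(times)
        run_start = prev = ts[0]
        for t in ts[1:] + [None]:  # None flushes the last run
            if t is not None and t - prev == step:
                prev = t  # the run continues
                continue
            if prev + step - run_start >= min_duration:
                contacts.append((a, b, run_start, prev + step))
            if t is not None:
                run_start = prev = t  # a gap: start a new run
    return sorted(contacts, key=lambda c: (c[2], c[0], c[1]))
```

Under this convention, four consecutive shared samples (0, 30, 60, 90 s) form a 120 s contact and are kept, while three consecutive samples (90 s) are dropped.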

3.1.3. SanFrancisco11

Owing to the lack of large-scale experimental data sets, we artificially create SanFrancisco11, a data set that uses the San Francisco taxi cab trace [26] coupled with three human mobility traces in order to represent three sub-communities of the San Francisco area, namely the airport, downtown, and the Sunset area. To the best of our knowledge, this is the largest data set that captures human mobility contacts as well as human social properties in large scale networks.

¹ crawdad.cs.dartmouth.edu/
² bbc.com/news/technology-15844230.

The San Francisco taxi trace contains GPS coordinates of approximately 500 taxi cabs in San Francisco, collected over 30 days. Each record has the reported time and location of a taxi. We use 3 days of these traces, interpolate the movement of the cabs, and then generate the contacts between these taxis. We assume that a contact occurs when a taxi comes within 100 meters of another taxi. The contact trace thus contains the starting time stamp of each contact, its ending time stamp, and the IDs of the two cabs that were in contact with each other during that time.
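The interpolation and 100 m proximity test can be sketched as below. Linear interpolation between consecutive GPS fixes, the haversine distance, and the 10 s sampling step are assumptions (the text does not state how the movement was interpolated or at what resolution proximity was checked), and the function names are hypothetical. A contact opens when two taxis are within 100 m at a sampling instant and closes at the first instant they are not.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Fix = Tuple[float, float, float]  # (time in s, latitude, longitude), sorted by time
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def position_at(fixes: List[Fix], t: float) -> Optional[Tuple[float, float]]:
    """Linearly interpolated position at time t, or None outside the recorded span."""
    times = [f[0] for f in fixes]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(times):  # t is exactly the last fix
        return fixes[-1][1], fixes[-1][2]
    (t0, la0, lo0), (t1, la1, lo1) = fixes[i - 1], fixes[i]
    w = (t - t0) / (t1 - t0)
    return la0 + w * (la1 - la0), lo0 + w * (lo1 - lo0)


def taxi_contacts(traces: Dict[str, List[Fix]], t_start: int, t_end: int,
                  step: int = 10, radius_m: float = 100.0) -> List[Tuple[str, str, int, int]]:
    """Contact intervals (a, b, start, end) between taxis closer than radius_m."""
    contacts, open_since = [], {}
    t = t_start
    while t <= t_end:
        pos = {k: position_at(v, t) for k, v in traces.items()}
        for a, b in combinations(sorted(traces), 2):
            pa, pb = pos[a], pos[b]
            close = pa is not None and pb is not None and haversine_m(*pa, *pb) <= radius_m
            if close and (a, b) not in open_since:
                open_since[(a, b)] = t  # a contact starts
            elif not close and (a, b) in open_since:
                contacts.append((a, b, open_since.pop((a, b)), t))  # a contact ends
        t += step
    contacts += [(a, b, s, t_end) for (a, b), s in open_since.items()]  # still open at the end
    return sorted(contacts, key=lambda c: (c[2], c[0], c[1]))
```

The pairwise loop is quadratic in the number of taxis per time step; for roughly 500 cabs a spatial grid or k-d tree would be the natural refinement.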

We artificially incorporate three real human mobility traces in three different areas of San Francisco. Taxis move between these areas. Contacts between taxis and nodes within an area are added following the contact time and inter-contact time distributions [5] of the corresponding area. We choose the following data sets to represent the sub-communities of SanFrancisco11:

3.1.4. CoNext07 [19]

Visitors of a conference were asked to carry a smartphone with the MobiClique application installed during 3 consecutive days. Prior to the start of the experiment, each participant was asked to indicate other CoNext participants he knew or had a social connection with. During the experiment, our social networking application indicated when a contact, or a contact of a contact, was in Bluetooth range. This connection neighborhood was then displayed on the user’s device, and the user could in turn add new connections or delete existing ones based on the physical interaction that followed the application’s notification. In addition, the devices logged any other device that was detected, with a scanning interval of 2 min. The CoNext07 data set was collected on 28 Windows Mobile devices that were given to a preselected set of participants on the first day of the conference. Each device was used for an average of 2.2 days, since people arrived and departed at different times.

3.1.5. Infocom06 [5]

The trace was collected with 78 participants during the IEEE Infocom 2006 conference. People were asked to carry an experimental device (i.e., an iMote) with them at all times. These devices log all contacts between experimental devices (i.e., internal contacts) using periodic scanning every 2 s. In addition, they log contacts with other external Bluetooth devices (e.g., cell phones, PDAs). In this paper, we present results for internal contacts only. People were also asked to fill in questionnaires with their nationalities, languages, countries, cities, academic affiliations, and topics of interest. Based on this information, we consider three different social graphs from this experiment, based on the users’ (i) common topics of interest, where two users are connected when they share k common topics, (ii) Facebook graph, and (iii) social profile (union of nationality, language, and city).
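The first of these graphs can be built directly from the questionnaire topics. We read “share k common topics” as “share at least k topics”, which is an assumption, as are the function names; the same construction could be applied to the Hope08 interests, although the text does not state which threshold was used there. Node degrees in the resulting graph are the social degrees on which Degree-Based Forwarding [19] relies.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def interest_graph(interests: Dict[str, Set[str]], k: int = 1) -> Set[Tuple[str, str]]:
    """Undirected edges (u, v), with u < v, between users sharing at least k topics."""
    return {
        (u, v)
        for u, v in combinations(sorted(interests), 2)
        if len(interests[u] & interests[v]) >= k
    }


def degrees(interests: Dict[str, Set[str]], edges: Set[Tuple[str, str]]) -> Dict[str, int]:
    """Social degree of every user (0 for isolated users)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {user: deg[user] for user in interests}
```

Raising `k` makes the graph sparser: in the test data, `k = 1` links every pair sharing the `dtn` topic, while `k = 2` keeps only the pair that also shares `security`.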

3.1.6. CoNext08 [24]

This experiment was performed at the CoNEXT 2008 conference using smartphones with the MobiClique application installed. The main difference from the CoNext07 experiment lies in the parameterization: we had 22 participants, and neighborhood discovery was randomized to run at intervals of 120 ± 45 s. In addition, the social profile in MobiClique was initialized from the user’s Facebook profile.

The initial list of interests contained user-selected Facebook groups and networks from the user’s profile. As in the CoNext07 experiment, the social network evolved throughout the experiment as users could make new friends, discover or create new groups (i.e., interest topics), and leave others. For the analysis, we consider the collected contact trace and the final social graph of 22 devices (the remaining devices did not collect data on every day of the experiment).

To summarize, the resulting data set contains three sub-communities in three different areas of San Francisco, bridged by taxis that move between these areas; contacts between taxis and nodes within an area follow the contact time and inter-contact time distributions [5] of the corresponding area.
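One way to realize this injection is an alternating renewal process: while a taxi is inside an area, draw an inter-contact gap, then a contact duration, and repeat, resampling each value from the area’s empirical samples. This procedure is an assumption (the text only states that the distributions of [5] are matched), and the function name, the presence interval `[enter, leave)`, and resampling with replacement are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def synthetic_contacts(
    taxi: str,
    node: str,
    enter: float,
    leave: float,
    contact_durations: Sequence[float],
    inter_contact_times: Sequence[float],
    rng: random.Random,
) -> List[Tuple[str, str, float, float]]:
    """Taxi-node contacts while the taxi is in the node's area, alternating
    inter-contact gaps and contacts resampled from the area's empirical samples."""
    contacts = []
    t = enter + rng.choice(inter_contact_times)
    while t < leave:
        # A contact is cut short if the taxi leaves the area first.
        end = min(t + rng.choice(contact_durations), leave)
        contacts.append((taxi, node, t, end))
        t = end + rng.choice(inter_contact_times)
    return contacts
```

Passing an explicitly seeded `random.Random` keeps the generated trace reproducible across runs, which matters when several forwarding algorithms are compared on the same synthetic data set.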

3.2. Evaluation methodology

In mobile opportunistic networks, we are generally interested in delivering data among a set of $N$ mobile wireless nodes. Communication between two nodes is established when they are within radio range of each other. Data is forwarded from source to destination using these opportunistic contacts. We model the evolution of contacts in the network by a time-varying graph $G(t) = (V, E(t))$ with $N = |V|$. We assume that the network starts at time $t_0$ and ends at time $T$ ($T$ can be infinite). We call this temporal network [15] the contact graph. Each $G(t)$ describes the contacts between nodes existing at time $t$. Such a time-varying graph model can be obtained from a mobility/contact trace or from a mobility model along with knowledge of radio properties (e.g., radio range).

We evaluate the performance of forwarding algorithms on the three previously described data sets: Hope08, Dartmouth01, and SanFrancisco11. In our evaluation, we compute the sequence of optimal paths between any source and destination in the data set. From the sequence of delay-optimal paths, we deduce the delay obtained by the optimal path at all times. We uniformly combine all the observations of a trace among all sources, destinations, and starting times (the time in seconds when the message $M$ was generated by the source node $S$). We assume that contacts between any two nodes are long enough to successfully transfer a message. We present this aggregated sample of observations via its empirical CDF. The detailed computation process can be found in [5]; compared with the previous generalized Dijkstra’s algorithm, this algorithm directly computes a representation of paths for all starting times.
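For a single message, the delay of the optimal path can be computed with a time-dependent variant of Dijkstra’s algorithm: a node holding the message at time τ can use a contact `(u, v, start, end)` if τ ≤ end, and the message then reaches v at max(τ, start), which matches the assumption above that any contact is long enough to transfer a message. The sketch below handles one source and one generation time; it is not the all-starting-times method of [5], and the input layout and function name are assumptions.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def earliest_arrival(contacts: Iterable[Tuple[str, str, float, float]],
                     source: str, t0: float) -> Dict[str, float]:
    """Earliest time each reachable node can hold a message created at `source` at time `t0`."""
    adj = defaultdict(list)
    for a, b, start, end in contacts:  # contacts are symmetric
        adj[a].append((b, start, end))
        adj[b].append((a, start, end))
    arrival = {source: t0}
    heap = [(t0, source)]
    settled = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, start, end in adj[u]:
            if t <= end:  # the contact has not ended when the message is at u
                cand = max(t, start)  # wait for the contact, then transfer instantly
                if cand < arrival.get(v, math.inf):
                    arrival[v] = cand
                    heapq.heappush(heap, (cand, v))
    return arrival
```

The delay to a destination `d` is then `arrival.get(d, math.inf) - t0`, with `math.inf` marking a message that cannot be delivered; repeating this for every source and generation time yields the empirical CDF described above.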

We utilize the following metrics to evaluate a given forwarding algorithm $f$: (i) the normalized success rate within time $t$, i.e., the probability that $f$ successfully finds a path within time $t$ when sources, destinations, and message generation times are chosen uniformly at random (if no path exists, we include an infinite value in the distribution), normalized by the CDF given by an epidemic forwarding algorithm; and (ii) the normalized cost, i.e., the fraction of contacts (i.e., number of replica copies) used by $f$, normalized by the fraction of contacts used by the epidemic forwarding algorithm (the most expensive).
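Written as a formula, with $D_f$ the delay of algorithm $f$ for a uniformly random (source, destination, generation time) triple and $D_{\mathrm{epi}}$ the epidemic delay, the normalized success rate is $P(D_f \le t) / P(D_{\mathrm{epi}} \le t)$, which is normally at most 1 because epidemic forwarding follows the optimal path. Since both costs are fractions of the same set of contacts, the normalized cost reduces to a ratio of contact counts. A minimal sketch over sampled delays follows; the function names are hypothetical, and `math.inf` marks undelivered messages, mirroring the infinite values mentioned above.

```python
import math
from typing import Sequence


def success_rate(delays: Sequence[float], t: float) -> float:
    """Empirical CDF of the delay at t; undelivered messages carry math.inf."""
    return sum(1 for d in delays if d <= t) / len(delays)


def normalized_success_rate(delays_f: Sequence[float],
                            delays_epidemic: Sequence[float], t: float) -> float:
    """Success rate of f within t divided by that of epidemic forwarding."""
    base = success_rate(delays_epidemic, t)
    return success_rate(delays_f, t) / base if base > 0 else math.nan


def normalized_cost(contacts_used_f: int, contacts_used_epidemic: int) -> float:
    """Contacts (replica transmissions) used by f relative to epidemic forwarding."""
    return contacts_used_f / contacts_used_epidemic
```

For example, if $f$ delivers 2 of 4 sampled messages within 30 min while epidemic forwarding delivers 3 of the same 4, the normalized success rate at $t$ = 30 min is 2/3.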

In mobile opportunistic networks, all forwarding algorithms perform similarly for very large delays, because as time increases, so does the probability that the source node physically meets the destination. In our evaluation, we are therefore more interested in delivery success rate improvements at short delays (10 min to 1 h). We believe that any improvement in success rate or cost should be reflected at small time scales.

4. Rank-based forwarding overview

In this section, we illustrate the importance of “popular” nodes in the forwarding process for opportunistic networks. We also discuss and highlight the differences between various rank-based forwarding algorithms. These algorithms estimate an objective social metric for each node in order to guide forwarding towards the most popular nodes [6,22] or towards a given destination [8,16]. They currently represent one of the most promising methods for message forwarding in opportunistic networks.

4.1. Importance of popular nodes in opportunistic forwarding

The “popularity” of a device may vary significantly according to different factors such as the popularity of the user carrying the device, user mobility patterns, user location, or device characteristics such as link capacity, CPU utilization, etc. A study based on real mobility trace analysis has shown that nodes have different levels of popularity, and that the nodes most frequently used from a given source are popular almost uniformly among all destinations [18].

We consider the Infocom06 data set as a graph where each edge represents a contact and includes a time value. We computed offline, for all departure times, the set of delay-optimal paths in this network, where a path may use either a single edge, i.e., a direct contact, or a sequence of contacts in a time-respecting manner. We refer to all delay-optimal paths starting from a source $i$ to all possible destinations as the $(i,*)$ paths. More generally, we define the sets of paths $(i,*)$, $(*,j)$, $(*,*)$, and $(i,j)$. The popularity of a node is then measured by its occurrence in all shortest paths defined by one of these sets: to measure the occurrence of node $u$, we compute shortest paths between all source and destination pairs and count the number of those paths that pass through $u$.

Fig. 1 shows that the 12 most popular relays (out of $N = 41$ nodes) account for half of the total occurrences. For example, the top node was used more than 34,200 times to construct optimal paths between different source–destination pairs, and the top 4 nodes are responsible for more than 22% of the total occurrences found in delay-optimal paths for this data set, as shown in Fig. 1. […] represent relevant patterns for efficient paths in this data set. These popular links generally connect two socially connected people (a strong tie relationship).

These results (and more in [18]) indicate that this metric can be used as an estimate when selecting preferential relays or looking for relevant patterns to construct paths online. Nevertheless, it is important to show the impact of these popular nodes on network performance.

4.2. Rank-based forwarding algorithms

Rank-based forwarding algorithms represent one of the most promising methods for message forwarding in opportunistic networks. These techniques are store-carry-forward algorithms that spread a message among nodes (relays) following a non-decreasing ranking rule. They differ in the type of information used to rank nodes in the network, as well as in how that information is used. We distinguish between two types of rank-based forwarding techniques: (1) social-based ranking techniques, where social interactions between users are used to rank the nodes, and (2) contact-based ranking techniques, where information learned during contacts is used to rank nodes.

Fig. 1. Popular relays in the Infocom06 data set. [Plot: fraction of occurrences (cumulative distribution, 0 to 1) and occurrences (×1000, non-cumulative distribution, 0 to 30) vs. relays (0 to 40).]

184 A. Mtibaa, K.A. Harras / Computer Communications 36 (2013) 180–190

4.2.1. Social-based ranking algorithms

In this paper, our analysis is based on three existing social forwarding algorithms: Degree-Based Forwarding [19], Centrality-Based Forwarding (Simbet) [6], and PeopleRank [22]. These social forwarding mechanisms differ in the method they use to predict device mobility and future contact opportunities between devices from simple social properties.

• Degree-Based Forwarding: messages are forwarded to socially well-connected nodes, so paths are constructed according to a non-decreasing social node degree rule [19].
• Centrality-Based Forwarding: the main idea behind this technique is that central nodes in the social graph are more likely to socialize with other people, and are therefore more likely to forward messages [6].
• PeopleRank: a distributed algorithm that ranks nodes in the social graph similarly to how PageRank [2] ranks web pages; i.e., it measures the relative ‘‘importance’’ of a node in the social graph [22].
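To make the contrast between the first and last ranking rules concrete, the sketch below computes a degree rank and a PageRank-style rank on a small undirected social graph. It is a centralized power iteration written for illustration only. PeopleRank itself is a distributed algorithm [22], and the damping factor d = 0.85, the iteration count, the function names, and the example graph are assumptions rather than values from this paper.

```python
def degree_rank(graph):
    """Degree-Based Forwarding rank: number of social neighbours."""
    return {u: len(nbrs) for u, nbrs in graph.items()}


def pagerank_style_rank(graph, d=0.85, iterations=50):
    """PageRank-style importance on an undirected social graph.

    graph: dict mapping each node to the set of its social neighbours.
    Each node's score is (1 - d) plus a damped share of its neighbours'
    scores, each neighbour splitting its score evenly among its neighbours.
    """
    rank = {u: 1.0 for u in graph}
    for _ in range(iterations):
        rank = {
            u: (1 - d) + d * sum(rank[v] / len(graph[v]) for v in graph[u])
            for u in graph
        }
    return rank


# Hypothetical star-shaped social graph: "hub" knows everyone else
social = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_rank(social))        # {'hub': 3, 'a': 1, 'b': 1, 'c': 1}
ranks = pagerank_style_rank(social)
print(ranks["hub"] > ranks["a"])  # True
```

On this toy graph, both rules agree that the hub is the preferred relay. They diverge on larger graphs, where a neighbour's own importance matters for the PageRank-style score but not for the degree.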

Fig. 2. Example showing (a) the weaknesses of existing rank-based forwarding algorithms in a large scale community, and (b) identifying sub-communities within the same large scale one, and using MultiHomed nodes to disseminate messages to these sub-communities.

These approaches, which use social properties to forward messages, implicitly assume that opportunistic contacts correlate with the social property upon which each algorithm is designed. In large scale networks, where physical distances weigh more heavily on contact opportunities, such an assumption is hard to defend.

4.2.2. Contact-based ranking algorithms

In the following, we present two of the most well-known contact-based ranking algorithms in the literature. As the name indicates, they use locally available contact information to rank nodes and to decide whether to forward a message when two nodes meet. These two algorithms are:

• FRESH Total: node i forwards the message to node j if j has contacted any other node more recently than node i.
• Greedy Total: node i forwards the message to node j if j has more total contacts than node i [8].

For simplicity, in the rest of this paper we refer to FRESH Total and Greedy Total as the FRESH-T and Greedy-T forwarding algorithms.

5. Forwarding drawbacks in large scale opportunistic networks

5.1. Motivating scenario

It has been shown that rank-based forwarding techniques are among the most promising methods, providing a high success rate at a reasonably low cost [19,11,21]. However, these studies focus on small data sets with a small number of nodes. We believe that such techniques present serious limitations in large scale opportunistic networks.

Fig. 2(a) illustrates a scenario where rank-based forwarding attempts to disseminate a message M generated by a source S to a destination D. Without any notion of communities, M is forwarded in the wrong direction because it relies on ‘‘globally popular nodes’’ in the network (as in the BubbleRap technique [12]). These nodes, although popular, may not be able to deliver the message to all the nodes in the network. In Education City (EC), a campus that includes 6 US university branches in Doha, Qatar, the founder of EC could be a very popular person across the whole campus (a globally popular node), but is unlikely to be a suitable relay for a message M destined to a student in a particular university on campus. Other nodes, however, may be locally popular (e.g., within one university of the campus, as shown in Fig. 2(b)) and more suitable to deliver this message to its destination in a specific sub-community. Moreover, particular nodes, which we call MultiHomed nodes, such as postmen or campus bus drivers, are better suited to disseminate the message M across sub-communities. This approach contrasts with the BubbleRap algorithm, which uses globally popular nodes to disseminate the message to all communities. The main idea (shown in Fig. 2(b)) is therefore to first break down the original large scale community into multiple sub-communities, and then disseminate the message to these sub-communities. Afterwards, locally popular nodes can deliver the message M within its sub-community.

Fig. 3. Scalability issues of rank-based algorithms relying on experimental data sets.

³ http://www.dartmouth.edu/maps/campus/close-ups/index.html.

5.2. Forwarding weakness in large scale networks

Fig. 3(a) plots the normalized success rate of five rank-based forwarding algorithms (PeopleRank, Degree-Based, Simbet, FRESH-T, and Greedy-T) on the Dartmouth01 data set.

Although node popularity matches the contact properties of nodes, these algorithms suffer 25% to 55% losses compared to Epidemic forwarding within a 10-min timescale. In large scale networks, rank-based forwarding algorithms lose many opportunities to reach destinations with optimal delays. Similar results are observed with the San Francisco data set: Fig. 3(b) shows that the five rank-based forwarding algorithms achieve only 25% to 65% of the success rate of the epidemic forwarding algorithm. These results clearly show how rank-based forwarding suffers in large scale networks. Similar results are shown for the Hope08 data set in Fig. 3(c), where the rank-based algorithms often fail to forward messages to their destinations; e.g., the Greedy algorithm achieves only a 25% success rate within a 10-min timescale (i.e., in 75% of the cases, Greedy fails to reach the destination).
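The comparisons above can be read as dividing an algorithm's success rate by that of epidemic forwarding at the same timescale. The sketch below encodes this reading. The definition is our interpretation of the text, and the function names and delay values are hypothetical.

```python
def success_rate(delays, timescale):
    """Fraction of messages delivered within `timescale` (None = never delivered)."""
    delivered = sum(1 for d in delays if d is not None and d <= timescale)
    return delivered / len(delays)


def normalized_success_rate(algo_delays, epidemic_delays, timescale):
    """Success rate of an algorithm relative to epidemic forwarding."""
    epidemic = success_rate(epidemic_delays, timescale)
    if epidemic == 0:
        return 0.0
    return success_rate(algo_delays, timescale) / epidemic


# Hypothetical per-message delivery delays in seconds, same messages for both
epidemic = [60, 120, 300, 900]
greedy = [120, None, 500, 900]
print(round(normalized_success_rate(greedy, epidemic, 600), 3))  # 0.667
```

Epidemic forwarding delivers each message along a delay-optimal path, so in practice this ratio stays at or below 1, and "25% to 55% of losses" corresponds to values between 0.45 and 0.75.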

These preliminary results match the intuition presented in the EC example of Fig. 2, and strongly motivate the need for agile solutions that take such situations into account.

6. Forwarding within sub-communities

So far, we have shown that using node popularity in large scale areas has serious drawbacks. Rank-based forwarding techniques, while showing promising results in small communities, present serious limitations in large scale networks. Our main hypothesis is that in large scale networks, where multiple sub-communities may exist, social and contact-based prediction has its limitations.

In this section, we introduce and compare the impact of different classification techniques for large scale communities, and verify that state-of-the-art rank-based forwarding algorithms perform well within the resulting sub-communities. In the next section, we introduce CAF, a community aware framework that can be easily integrated with these algorithms to improve their performance in large scale networks.

6.1. Classification and forwarding in sub-communities

A common property of social networks is the presence of cliques or communities: circles of friends or acquaintances in which every member knows every other member. In large-scale networks, people can be grouped into sub-communities. Our experimental data sets can be classified into multiple communities according to different classification techniques. The SanFrancisco11 data set is by default classified into three communities: the airport, downtown, and sunset areas (geographic classification). We propose two community classification techniques for the Dartmouth01 and Hope08 data sets:

6.1.1. Activity-based classification

The Dartmouth College campus has over 160 buildings, and people visiting the campus are usually interested in only a few of them. People can therefore be classified based on their activity or interest. The intuition behind this classification is that people who frequently visit athletic buildings are more likely to meet each other and socialize during their athletic activities. The campus contains more than a dozen athletic facilities and fields. We note that on the Dartmouth campus, buildings devoted to the same activity are not necessarily co-located.

We compute the contact duration between a particular device and any athletic building access point. If this duration exceeds 3 h a day, we consider the student carrying the device to be an athletic student. We define the academic and residential communities similarly.
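The following is a minimal sketch of this activity-based labelling. It assumes that the trace has been reduced to association sessions of the form (device, day, building type, seconds). The 3 h per day threshold comes from the text. Averaging over the days on which a device was observed, and allowing a device to carry several labels, are our assumptions. The function name and the sessions are hypothetical.

```python
from collections import defaultdict


def activity_communities(sessions, threshold_hours=3.0):
    """Label devices with every building type they use > threshold_hours per day.

    sessions: iterable of (device, day, building_type, duration_seconds).
    Returns {device: set of building types}; unlabelled devices are omitted.
    """
    seconds = defaultdict(float)  # (device, building_type) -> total seconds
    days = defaultdict(set)       # device -> days on which it was observed
    for device, day, kind, secs in sessions:
        seconds[(device, kind)] += secs
        days[device].add(day)

    labels = defaultdict(set)
    for (device, kind), total in seconds.items():
        hours_per_day = total / len(days[device]) / 3600
        if hours_per_day > threshold_hours:
            labels[device].add(kind)
    return dict(labels)


sessions = [
    ("d1", 1, "athletic", 4 * 3600), ("d1", 2, "athletic", 3.5 * 3600),
    ("d1", 1, "academic", 1 * 3600), ("d2", 1, "residential", 10 * 3600),
]
print(activity_communities(sessions))  # {'d1': {'athletic'}, 'd2': {'residential'}}
```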

6.1.2. Geographic classification

In mobile opportunistic communication, physical proximity is fundamental for message dissemination. We propose to group people living in close proximity into a single geographic community. Such a classification could be obtained from Facebook or other online social applications where users explicitly specify their current city or neighborhood. Users can also be tagged with a few favorite locations if they use online position-aware services. In the Hope08 and Dartmouth01 data sets, we group users within physical proximity into a single sub-community.

The Dartmouth campus area is roughly 1300 m × 1300 m, and people going to campus every day mostly visit the same places, which are usually selected so as to minimize walking distance. To capture this classification, we split the Dartmouth campus into four regions (Northwest NW, Northeast NE, Southwest SW, and Southeast SE).³ A node i belongs to a region R if it has been connected to more access points belonging to R than to any other region.
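The region rule above can be sketched as follows. Counting distinct access points (rather than connection events), the access-point-to-region mapping, and the access point identifiers are assumptions made for illustration.

```python
from collections import Counter


def assign_region(connected_aps, ap_region):
    """Return the region whose access points the node connected to most.

    connected_aps: access point IDs seen in the node's trace (repeats allowed).
    ap_region: {ap_id: region}, e.g. regions "NW", "NE", "SW", "SE".
    Distinct APs are counted; ties are broken arbitrarily.
    """
    counts = Counter(ap_region[ap] for ap in set(connected_aps) if ap in ap_region)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


ap_region = {"ap1": "NW", "ap2": "NW", "ap3": "SE", "ap4": "SW"}
print(assign_region(["ap1", "ap3", "ap2", "ap1"], ap_region))  # NW
```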

The Hope08 data set can also be grouped according to the geographic classification: during the conference, only people on the mezzanine or the 18th floor were tracked. We therefore consider two geographic communities: mezzanine and 18th floor.

Fig. 5. Normalized success rate distribution of Greedy relying on different community classifications (Dartmouth01).

6.1.3. Combination classification

The combination classification is defined as an activity classification within a geographical area. For example, on the Dartmouth campus, the three building types and four geographical areas yield at most 3 × 4 = 12 different sub-communities. Not all combinations are equally populated; for example, the athletic buildings are mostly concentrated in only two geographical areas of the Dartmouth campus.

6.2. Impact of different community classifications on forwarding performance

After classifying the large-scale community, we now show the performance of rank-based forwarding algorithms within a single sub-community. We plot the normalized success rate of two rank-based forwarding algorithms, PeopleRank and Greedy Total (Greedy-T), according to the two community classifications described above. Similar results have been obtained with other rank-based forwarding algorithms.

From Fig. 4, we generally observe that geographic classification leads to better PeopleRank performance. PeopleRank achieves a 92% to 97% normalized success rate within a 10-min timescale with the geographic classification (Fig. 4(b)), and 90% to 94% within the same timescale with the activity-based classification (Fig. 4(a)). These results confirm that short distances (e.g., people living in the same neighborhood or region) typically lead to strong social ties, and hence to a relevant social classification.

Moreover, Fig. 4(a) shows that, under the activity-based classification, PeopleRank achieves higher success rates among athletic users than among other users. Within the athletic community, PeopleRank's success rate within a 10-min timescale is roughly 3% and 5% higher than in the academic and residential communities, respectively. As described above, most of the athletic buildings are located in the southeast corner of the campus, which effectively combines geographic and activity-based classification. While PeopleRank performs poorly across multiple communities, it performs fairly well within a single sub-community, achieving more than 80% success rate within 10 min. Users within a single sub-community can thus communicate efficiently using a rank-based forwarding algorithm such as PeopleRank.

Fig. 4. Normalized success rate distribution of PeopleRank relying on different community classifications (Dartmouth01).

We also investigate the performance of the Greedy algorithm within the Dartmouth01 sub-communities, according to both the geographic (Fig. 5(a)) and the activity-based (Fig. 5(b)) classifications. The FRESH-T forwarding algorithm achieves roughly 80% success rate within most Dartmouth01 sub-communities. We obtain similar results, not shown in this paper, with the Simbet and Degree-Based Forwarding algorithms.

We note that this paper does not investigate new classification techniques; we assume that people can be classified based on their social profile information [27], location, IP addresses, or phone numbers. Our goal is to confirm that rank-based forwarding algorithms achieve satisfactory performance within sub-communities regardless of the classification method. However, as shown in the previous section, they suffer in large scale networks where multiple sub-communities may exist. We therefore propose a framework that helps existing rank-based forwarding algorithms deal with this issue and successfully forward messages across multiple sub-communities.

7. Forwarding across sub-communities

Motivated by the satisfactory performance of rank-based forwarding within single sub-communities, we propose a community aware framework (CAF). This framework can easily be integrated with most rank-based forwarding algorithms in order to address the weaknesses described above in large scale networks.

7.1. The community aware framework (CAF)

The community aware framework (CAF) relies on the fact that rank-based forwarding algorithms operate normally within the same sub-community SC. Within a sub-community, messages are forwarded using rank-based techniques toward nodes that belong to the same sub-community.

On the other hand, particular nodes operate as an inter-community backbone and carry messages to other sub-communities, where the messages are then forwarded according to a rank-based forwarding rule. We call these backbone-like nodes MultiHomed nodes (MH). MultiHomed nodes are characterized by higher mobility and belong to multiple sub-communities (i.e., they move from one sub-community to another). Depending on the large community, these nodes can be postmen, buses, cabs, etc. We rank these MultiHomed nodes (MH_rank) according to the number of sub-communities they belong to; i.e., SC(i) is the number of sub-communities node i belongs to. For example, with the geographic classification of the Dartmouth01 data set, MH nodes belonging to four sub-communities are ranked higher than MH nodes belonging to only two or three sub-communities. Therefore, MH nodes carrying a message forward it to other MH nodes according to a non-decreasing MH_rank.

Fig. 6. Impact of community classification on CAF-PeopleRank success rate (Dartmouth01 data set).

Algorithm 1. CAF-RBFA (node i)

Require: Node i is running a Rank-Based Forwarding Algorithm RBFA; RBFA(i) denotes the rank of node i according to RBFA
Ensure: MH_rank(i) ← SC(i)

1: while (1) do
2:   while (i is in contact with j) do
3:     update(RBFA(i), RBFA(j))
4:     while (∃ m ∈ buffer(i)) do
5:       if [SC(i) == SC(j) AND RBFA(j) ≥ RBFA(i)] OR [j = destination(m)] OR [MH_rank(j) ≥ MH_rank(i)] then
6:         Forward(m, j)
7:       end if
8:     end while
9:   end while
10: end while
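Algorithm 1 can be sketched in Python as follows. This is a hedged reading, not the authors' code, and it rests on several assumptions:

- "SC(i) == SC(j)" is read as "i and j share a sub-community".
- Ranks are given directly, rather than produced by the `update` step.
- MH_rank is taken as 0 for nodes in a single sub-community. The last clause therefore only lets a message move onto a MultiHomed node, following the text's description of MH nodes as a backbone.
- `Forward(m, j)` replicates the message, so i keeps its copy.

All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    communities: frozenset  # sub-communities the node belongs to
    rank: float             # RBFA(i), e.g. Greedy-T total contacts
    buffer: list = field(default_factory=list)  # (message_id, destination)


def mh_rank(node):
    # Assumption: only nodes in two or more sub-communities are MultiHomed;
    # ordinary nodes get MH rank 0.
    k = len(node.communities)
    return k if k >= 2 else 0


def caf_should_forward(i, j, destination):
    """Forwarding test of Algorithm 1 (line 5), under the assumptions above."""
    same_sc = bool(i.communities & j.communities)
    return ((same_sc and j.rank >= i.rank)                     # rank rule inside an SC
            or j.name == destination                           # direct delivery
            or (mh_rank(j) > 0 and mh_rank(j) >= mh_rank(i)))  # MultiHomed backbone


def on_contact(i, j):
    """Replicate every message in i's buffer that CAF allows i to give to j."""
    for msg in list(i.buffer):
        _, destination = msg
        if msg not in j.buffer and caf_should_forward(i, j, destination):
            j.buffer.append(msg)  # store-carry-forward: i keeps its copy


a = Node("a", frozenset({"NW"}), rank=3, buffer=[("m1", "d")])
bus = Node("bus", frozenset({"NW", "SE"}), rank=1)
c = Node("c", frozenset({"SE"}), rank=9)
on_contact(a, bus)  # bus is MultiHomed: it receives m1
on_contact(bus, c)  # c shares SE with bus and has a higher rank
print(c.buffer)     # [('m1', 'd')]
```

Without the MultiHomed clause, m1 would never leave NW in this example, because the bus has a lower RBFA rank than a.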

Algorithm 1 summarizes the additional operations on top of the state-of-the-art rank-based forwarding algorithm, which we refer to as RBFA. Besides the simplicity of our proposed algorithm, we emphasize that its overhead is negligible compared to the overhead incurred by BubbleRap to compute the global node ranking across the whole system. Our proposed algorithm remains a distributed forwarding algorithm and relies on local social/contact information to estimate future transfer opportunities. Next, we evaluate the CAF-extended versions of state-of-the-art rank-based forwarding algorithms.

Fig. 7. Comparison of CAF-PeopleRank, CAF-Degree-based, and CAF-Simbet with BubbleRap (SanFrancisco11 data set using only 5% cabs).

7.2. CAF Evaluation

In this section, we apply CAF to PeopleRank, Simbet, Greedy-T, FRESH-T, and Degree-Based Forwarding. We conduct our evaluation using real trace analysis, comparing the CAF-extended rank-based forwarding algorithms to BubbleRap [12].

7.2.1. The impact of community classification on CAF-enabled rank-based forwarding algorithms

We investigate the impact of the different community classifications (described in the previous section) on the performance of CAF. We show a representative set of results for the CAF-PeopleRank algorithm. Similar results were observed for CAF-Simbet and CAF-Degree-Based but are not shown to avoid redundancy.

Fig. 6 plots the normalized success rate of the extended PeopleRank algorithm with different classification techniques on the Dartmouth01 data set. The extended PeopleRank outperforms the original PeopleRank for all timescales (a 5% to 30% success rate improvement). Furthermore, the improvement differs from one community classification to another. Geographic classification gives better performance than activity-based classification: the activity-based classification on the Dartmouth campus groups people by specific buildings, but these buildings are not always geographically close to each other, so messages sent from a specific building can take a long time to reach other members of the same sub-community. Finally, combining the two community classification approaches leads to better success rates, with more than a 30% improvement compared to the activity-based classification.

7.2.2. CAF vs. BubbleRap

Initially, we consider the SanFrancisco11 data set, using only 5% of the total cabs as MultiHomed nodes. We evaluate the performance of our proposed framework (CAF) and compare it against (i) the original rank-based forwarding scheme (without CAF), and (ii) the BubbleRap algorithm. Later, we analyze the impact of the fraction of cabs on performance and justify our choice of 5% in the evaluation.

We apply CAF to state-of-the-art social-based forwarding algorithms: PeopleRank, Simbet, and Degree-Based Forwarding. Fig. 7 compares the performance of the extended versions of these three algorithms against the original versions (without CAF) and the BubbleRap algorithm on the SanFrancisco11 data set. In all three plots, the CAF-extended algorithms outperform the corresponding original algorithms for all timescales. For a 10-min timescale, CAF-PeopleRank achieves a delivery success rate roughly 40% higher than PeopleRank, while CAF-Simbet and CAF-Degree-Based achieve 30% and 25% better success rates, respectively, than their original algorithms. For larger timescales, CAF performance remains superior. However, the improvement is less significant, since the proposed framework is designed to disseminate the message more effectively so that it reaches the destination with shorter delays.

Moreover, CAF outperforms BubbleRap, especially for PeopleRank and Simbet: the probability of successfully delivering the message using CAF-PeopleRank or CAF-Simbet is 5% to 30% higher than that achieved by BubbleRap, for all timescales. The reason behind this result is that BubbleRap uses node degree to estimate social centrality, whereas node degree has been shown to be a poor estimator of future contacts [19]. Therefore, BubbleRap performs poorly compared to the centrality-based forwarding algorithms (CAF-Simbet and CAF-PeopleRank), and performs better than CAF-Degree-Based only at very large timescales.

We also apply CAF to state-of-the-art contact-based forwarding algorithms: Greedy-T and FRESH-T. Fig. 8 shows the performance improvement introduced by CAF compared to the original performance of each of these algorithms. CAF applied to FRESH-T and Greedy-T achieves a 30% to 45% success rate improvement compared to the original FRESH-T (34% success rate) and Greedy-T (43% success rate) algorithms.

Fig. 8. Comparison of CAF-FRESH-T and CAF-Greedy-T with BubbleRap (SanFrancisco11 data set using only 5% cabs).

We finally compare CAF-FRESH-T and CAF-Greedy-T to BubbleRap. CAF achieves slightly better performance than BubbleRap; for instance, CAF-Greedy achieves an 8% better success rate than BubbleRap. BubbleRap uses contact properties to compute social communities, whereas CAF uses explicit social data to infer sub-communities and forward accordingly. It has been shown in [11,21] that when social data is not strongly correlated with mobility data, social forwarding performance degrades. This may explain why BubbleRap performs better than CAF-Degree-Based at large timescales. However, at shorter timescales, CAF-Degree-Based achieves a 7% higher delivery success rate than BubbleRap. This can be explained by the use of explicit MultiHomed nodes instead of globally popular nodes, as shown in the motivating example (Fig. 2).

7.2.3. The impact of multihomed nodes

The number of MultiHomed (MH) nodes needed to make CAF efficient may depend on different factors, such as the number of communities, the distance between communities, MH mobility, etc. Naturally, the more MH nodes available in the system, the more successful it will be. Our goal in this section is to understand the impact of the number of MH nodes on the system's performance. We use the SanFrancisco11 data set and vary the fraction of cabs used in the trace (we randomly pick x% of the total number of cabs). Cabs in this data set are by definition MH nodes (as described in Algorithm 1), since they connect the three disconnected areas of San Francisco. Cabs are therefore the only nodes in our data set that belong to multiple sub-communities.
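The random selection of x% of the cabs as MH nodes can be sketched as follows. The fixed seed (for reproducible runs), rounding to the nearest integer, the minimum of one cab, and the 200-cab example fleet are all assumptions.

```python
import random


def sample_multihomed(cab_ids, fraction, seed=None):
    """Randomly pick `fraction` of the cabs (at least one) to act as MH nodes."""
    k = max(1, round(fraction * len(cab_ids)))
    # sorted() gives a deterministic population order, so a fixed seed
    # yields the same subset on every run
    return set(random.Random(seed).sample(sorted(cab_ids), k))


cabs = [f"cab{n:03d}" for n in range(200)]  # hypothetical fleet size
mh = sample_multihomed(cabs, 0.05, seed=1)
print(len(mh))  # 10
```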

In Fig. 9, we plot the normalized success rate of the CAF-PeopleRank algorithm with different fractions of MultiHomed nodes in the SanFrancisco11 data set. As expected, the more MH nodes used, the better the performance CAF achieves. The improvement is largest for the first MH nodes added: going from 1% to 2% MH nodes improves the success rate by roughly 10%, whereas going from 5% to 10% improves it by only 0.7%. With only 5% MultiHomed nodes (only 10 cabs in the data set), CAF-PeopleRank achieves near-optimal performance. It reaches more than 90% of the success rate of epidemic forwarding (within a 10-min timescale), only 0.7% less than the optimum obtained with 10% MH nodes; 20% MH nodes gives no significant improvement over 10%. These results are very promising, since CAF does not require a large number of participants to be successful, which minimizes the deployment barrier for such solutions.

Fig. 9. Normalized success rate distribution of CAF-PeopleRank across multiple communities (SanFrancisco11).

An important observation from Fig. 9 is that adding MultiHomed nodes, such as taxis in the SanFrancisco11 data set, is also beneficial for shorter delays. This might appear counterintuitive, since one might expect taxis to need non-negligible time to drive from one community to another, so that only large delays would be affected. However, this effect is explained by the fact that taxis also improve performance within a single sub-community; e.g., within downtown, taxis can act as better relays to efficiently disseminate the message within that sub-community.

Fig. 10 compares the success rates of CAF-Simbet, CAF-BubbleRap, and CAF-PeopleRank using 2% and 5% MultiHomed nodes. CAF-PeopleRank outperforms all other schemes even when it uses only 2% MultiHomed nodes and the other schemes use 5%. This holds in particular at short timescales, where CAF-PeopleRank was able to identify the most suitable relays and disseminate the messages to them, thus reaching more destinations.

7.3. The Cost of CAF

The cost of a forwarding algorithm, defined as the fraction of contacts involved in the forwarding process, is very important in opportunistic networks. CAF necessarily uses more contacts than the original version, since it first disseminates the message to all other sub-communities and then proceeds normally within each sub-community. We therefore quantitatively compare the cost of each rank-based forwarding algorithm, and study further options to reduce the cost of the extended versions of rank-based forwarding algorithms.

In addition to the basic CAF presented above, we propose and evaluate the cost of a community destination aware framework (CDAF). This framework assumes a priori knowledge of the destination's sub-community, a simplifying assumption widely used in the literature [17]. We assume that the source node is able to search for, retrieve, and append the destination's community to the message; e.g., it can use an IPv6 address or simply append the city where the destination lives (e.g., fetched from Facebook) to the original message. Knowing the destination's community C_m, CDAF can then forward the message m to the MultiHomed nodes that belong to C_m. As a result, CDAF narrows down the set of nodes able to receive the message m by focusing on the destination's area, and thereby reduces the overall overhead generated by the forwarding process.

Fig. 10. CAF-Simbet vs. CAF-BubbleRap vs. CAF-PeopleRank (SanFrancisco11).

Algorithm 2 summarizes the additional operations on top of the CAF-RBFA forwarding algorithm running at a given node i. We assume that node i knows the list of sub-communities it belongs to; its MH_rank(i) is then simply the number of communities it belongs to. Algorithm 2 operates similarly to Algorithm 1, with one additional condition at line 5, where CDAF(i) checks whether the destination sub-community C_m of the message is among the sub-communities of the encountered node j.

We use this framework as a benchmark and compare the overhead of CDAF and CAF to that of the original rank-based forwarding algorithms.

Algorithm 2. CDAF (node i)

Require: Node i is running CAF-RBFA(i)
Ensure: Clist(i) ← {C_j | i ∈ C_j}
        MH_rank(i) ← |Clist(i)|

1: while (1) do
2:   while (i is in contact with j) do
3:     update(CAF-RBFA(i), CAF-RBFA(j))
4:     while (∃ m (D ∈ C_m) ∈ buffer(i)) do
5:       if (C_m ∈ Clist(j)) then
6:         if [(Clist(i) ∩ Clist(j)) ≠ ∅ AND (CAF(j) ≥ CAF(i))] OR [j = destination(m)] OR [MH_rank(j) ≥ MH_rank(i)] then
7:           Forward(m, j)
8:         end if
9:       end if
10:    end while
11:  end while
12: end while
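The forwarding test of Algorithm 2 can be sketched in Python as follows. Line 5 is applied as a gate before the CAF conditions of line 6. As in a CAF reading of Algorithm 1, ordinary nodes are assumed to have MH rank 0, and CAF(i) is represented by a single numeric rank. These choices, the names, and the San Francisco-style example nodes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    communities: frozenset  # Clist(i): sub-communities node i belongs to
    rank: float             # CAF-RBFA(i), the underlying rank-based score


def mh_rank(node):
    # Assumption: MH_rank(i) = |Clist(i)| for MultiHomed nodes (two or more
    # sub-communities) and 0 for ordinary nodes.
    k = len(node.communities)
    return k if k >= 2 else 0


def cdaf_should_forward(i, j, destination, dest_community):
    """Forwarding test of Algorithm 2 (lines 5-6), under the assumptions above."""
    if dest_community not in j.communities:  # line 5: C_m must be in Clist(j)
        return False
    shares_sc = bool(i.communities & j.communities)
    return ((shares_sc and j.rank >= i.rank)
            or j.name == destination
            or (mh_rank(j) > 0 and mh_rank(j) >= mh_rank(i)))


src = Node("s", frozenset({"downtown"}), rank=5)
peer = Node("p", frozenset({"downtown"}), rank=9)
cab = Node("cab", frozenset({"downtown", "airport"}), rank=1)
dst = Node("d", frozenset({"airport"}), rank=0)
print(cdaf_should_forward(src, peer, "d", "airport"))  # False: p is not in C_m
print(cdaf_should_forward(src, cab, "d", "airport"))   # True: MultiHomed, covers C_m
print(cdaf_should_forward(cab, dst, "d", "airport"))   # True: j is the destination
```

The first call shows where the cost saving comes from. Under plain CAF, the higher-ranked downtown peer would receive a replica; under CDAF it does not, because it cannot reach the airport sub-community.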

We measure the normalized cost of each algorithm x, defined as the fraction of contacts used by x normalized by the fraction of contacts used by the epidemic forwarding algorithm [29]. Fig. 11 compares the normalized cost of the different schemes using (a) the SanFrancisco11 data set and (b) the Hope08 data set. The forwarding process of our CAF framework disseminates additional replicas of the same message to reach the destination with shorter delays; CAF uses 2% to 10% more replicas than the original forwarding. However, while incurring such a marginal increase in cost, CAF achieves a delivery success rate increase of around 10% to 45% compared to state-of-the-art rank-based forwarding algorithms.

Fig. 11. Normalized cost of different social forwarding schemes vs. BubbleRap.

Fig. 12. Normalized cost of contact-based forwarding schemes (Greedy-T and FRESH-T) vs. BubbleRap.
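Under the definition above, the total number of contacts cancels out, and the normalized cost reduces to the ratio of contacts used by x to contacts used by epidemic forwarding. The sketch below assumes that a contact is counted once if it carried at least one message copy. That counting rule, the function name, and the contact IDs are assumptions.

```python
def normalized_cost(contacts_used, epidemic_contacts_used, total_contacts):
    """Fraction of contacts used by an algorithm, normalized by epidemic's fraction.

    contacts_used / epidemic_contacts_used: sets of contact IDs that carried
    at least one message copy; total_contacts: contacts in the trace.
    """
    fraction = len(contacts_used) / total_contacts
    epidemic_fraction = len(epidemic_contacts_used) / total_contacts
    return fraction / epidemic_fraction


# Hypothetical contact IDs from a 10-contact trace
epidemic = {0, 1, 2, 3, 4, 5, 6, 7}
caf = {0, 2, 5, 7}
print(normalized_cost(caf, epidemic, 10))  # 0.5
```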

Moreover, CDAF has the lowest cost of all schemes, reducing the cost considerably, by roughly 10% to 25%. On the SanFrancisco11 data set (Fig. 11(a)), CDAF-PeopleRank reduces the cost of CAF by roughly 34% and uses only 32% of the total contacts in its forwarding process.

We also show that CAF is almost as costly as BubbleRap, and uses slightly fewer contacts than BubbleRap in the PeopleRank and Simbet cases (cost decreases by 2% to 5%). BubbleRap uses 4% fewer contacts than CAF-Degree-Based. However, we emphasize that the cost metric does not include the overhead generated by the K-CLIQUE community detection algorithm used by BubbleRap [12]. This omission partly explains BubbleRap's slightly lower measured cost compared to the CAF-Degree-Based algorithm.

We also investigate the cost of our proposed frameworks, CAF and CDAF, when applied to contact-based forwarding algorithms. Fig. 12 compares the normalized cost of CAF, CDAF, BubbleRap, and the original FRESH-T and Greedy-T algorithms on two data sets, SanFrancisco11 and Hope08. As expected, contact-based algorithms are more costly than social-based algorithms: they incur a larger delivery delay, which increases the number of message replicas in the network. CDAF again decreases the cost considerably, by 10% to 20%.

8. Conclusion

The proliferation of online social network platforms and applications such as Facebook, Orkut, or MySpace has made information about users' social interactions easily accessible. In recent years, researchers have shown that such information can be used to predict future encounters between participating devices. In this paper, we have studied the weakness of state-of-the-art social forwarding algorithms in large scale networks. In such networks, where multiple sub-communities may exist, social prediction has its limitations. We have proposed CAF, a community aware framework. To summarize our findings, social information can guide and improve forwarding decisions within a sub-community. Across multiple sub-communities, CAF helps existing social forwarding algorithms improve their performance by roughly 40%, and achieves a 5% to 30% better delivery success rate than BubbleRap, while incurring a marginal cost increase of no more than 10%.

There are several avenues for future work. First, in this paper, sub-communities were chosen offline. An important research direction is to determine whether these sub-communities can be efficiently identified using a distributed algorithm running with local information at the nodes. Second, we have shown that the cost is considerably reduced if we assume a priori knowledge of the destination's sub-community. Better distributed approaches to predicting this information would be highly beneficial.

References

[1] A. Balasubramanian, B. Levine, A. Venkataramani, DTN routing as a resource allocation problem, SIGCOMM Comput. Commun. Rev. 37 (4) (2007) 373–384.

[2] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst. 30 (1–7) (1998) 107–117.

[3] J. Burgess, B. Gallagher, D. Jensen, B. Levine, MaxProp: routing for vehicle-based disruption-tolerant networking, in: Proceedings of IEEE INFOCOM, 2006.

[4] B. Burns, O. Brock, B.N. Levine, MV routing and capacity building in disruption tolerant networks, in: Proceedings of IEEE INFOCOM, 2005, pp. 398–408.

[5] A. Chaintreau, A. Mtibaa, L. Massoulié, C. Diot, The diameter of opportunistic mobile networks, in: Proceedings of ACM CoNext, 2007.

[6] E.M. Daly, M. Haahr, Social network analysis for routing in disconnected delay-tolerant MANETs, in: MobiHoc ’07: Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM, New York, NY, USA, 2007.

[7] H. Dubois-Ferriere, M. Grossglauser, M. Vetterli, Age matters: efficient route discovery in mobile ad hoc networks using encounter ages, in: MobiHoc ’03: Proceedings of the 4th ACM International Symposium on Mobile Ad Hoc Networking & Computing, 2003, pp. 257–266.

[8] V. Erramilli, A. Chaintreau, M. Crovella, C. Diot, Diversity of forwarding paths in pocket switched networks, in: IMC ’07: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, ACM, New York, NY, USA, 2007.

[9] V. Erramilli, M. Crovella, A. Chaintreau, C. Diot, Delegation forwarding, in: MobiHoc ’08: Proceedings of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM, New York, NY, USA, 2008, pp. 251–260.

[10] T. Henderson, D. Kotz, I. Abyzov, The changing usage of a mature campus-wide wireless network, in: Proceedings of ACM MobiCom, 2004.

[11] T. Hossmann, T. Spyropoulos, F. Legendre, Know thy neighbor: towards optimal mapping of contacts to social graphs for DTN routing, in: Proceedings of IEEE INFOCOM, IEEE, 2010.

[12] P. Hui, J. Crowcroft, E. Yoneki, BUBBLE Rap: social-based forwarding in delay tolerant networks, IEEE Trans. Mob. Comput., 99 (PrePrints), 2010.

[13] S. Jain, K. Fall, R. Patra, Routing in a delay tolerant network, in: Proceedings of ACM SIGCOMM, 2004.

[14] D. Johnson, A. Lysko, Comparison of MANET routing protocols using a scaled indoor wireless grid, Mob. Netw. Appl. 13 (1–2) (2008) 82–96.

[15] D. Kempe, J. Kleinberg, A. Kumar, Connectivity and inference problems for temporal networks, in: Proceedings of ACM STOC, 2000.

[16] A. Lindgren, A. Doria, O. Schelén, Probabilistic routing in intermittently connected networks, SIGMOBILE Mob. Comput. Commun. Rev. 7 (3) (2003) 19–20.

[17] M. Mauve, A. Widmer, H. Hartenstein, A survey on position-based routing in mobile ad hoc networks, IEEE Netw. 15 (6) (2001) 30–39.

[18] A. Mtibaa, A. Chaintreau, C. Diot, Popularity of nodes in pocket-switched networks, in: Proceedings of ACM SIGCOMM (Extended abstract), 2007.

[19] A. Mtibaa, A. Chaintreau, J. LeBrun, E. Oliver, A.-K. Pietiläinen, C. Diot, Are you moved by your social network application?, in: WOSN’08: Proceedings of the First Workshop on Online Social Networks, ACM, New York, NY, USA, 2008, pp. 67–72.

[20] A. Mtibaa, K. Harras, Social forwarding in large scale networks: insights based on real trace analysis, in: Proceedings of IEEE ICCCN, 2011.

[21] A. Mtibaa, M. May, M. Ammar, On the relevance of social information to opportunistic forwarding, in: International Symposium on Modeling, Analysis, and Simulation of Computer Systems, 2010, pp. 141–150.

[22] A. Mtibaa, M. May, M. Ammar, C. Diot, PeopleRank: social opportunistic forwarding, in: Proceedings of IEEE INFOCOM (mini conference), IEEE, 2010.

[23] D. Nain, N. Petigara, H. Balakrishnan, Integrated routing and storage for messaging applications in mobile ad hoc networks, in: WiOpt ’03: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, March 2003.

[24] A.-K. Pietiläinen, E. Oliver, J. LeBrun, G. Varghese, C. Diot, MobiClique: middleware for mobile social networking, in: WOSN’09: Proceedings of ACM SIGCOMM Workshop on Online Social Networks, August 2009.

[25] M. Piorkowski, N. Sarafijanovic-Djukic, M. Grossglauser, CRAWDAD data set epfl/mobility (v. 2009-02-24). Available from <http://crawdad.cs.dartmouth.edu/epfl/mobility>, Feb. 2009.

[26] M. Piorkowski, N. Sarafijanovic-Djukic, M. Grossglauser, A parsimonious model of mobile partitioned networks with clustering, in: The First International Conference on COMmunication Systems and NETworkS (COMSNETS), January 2009.

[27] C. Tantipathananandh, T. Berger-Wolf, D. Kempe, A framework for community identification in dynamic social networks, in: KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2007, pp. 717–726.

[28] M.M.B. Tariq, M. Ammar, E. Zegura, Message ferry route design for sparse ad hoc networks with mobile nodes, in: MobiHoc ’06: Proceedings of the Seventh ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM Press, New York, NY, USA, 2006, pp. 37–48.

[29] A. Vahdat, D. Becker, Epidemic routing for partially-connected ad hoc networks, Technical Report CS-2000-06, UCSD, 2000.

[30] W. Zhao, M. Ammar, E. Zegura, A message ferrying approach for data delivery in sparse mobile ad hoc networks, in: MobiHoc ’04: Proceedings of the 5th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM, New York, NY, USA, 2004, pp. 187–198.

[31] W. Zhao, M. Ammar, E. Zegura, Controlling the mobility of multiple data transport ferries in a delay-tolerant network, in: Proceedings of IEEE INFOCOM, 2005.