
WIRELESS COMMUNICATIONS AND MOBILE COMPUTING
Wirel. Commun. Mob. Comput. (2011)

Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/wcm.1143

RESEARCH ARTICLE

HUBCODE: hub-based forwarding using network coding in delay tolerant networks†

Shabbir Ahmed* and Salil S. Kanhere

School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia

ABSTRACT

Most people-centric delay tolerant networks have been shown to exhibit power-law behavior. Analysis of the temporal connectivity graph of such networks reveals the existence of hubs, a fraction of the nodes, which are collectively connected to the rest of the nodes. In this paper, we propose a novel forwarding strategy called HubCode, which seeks to use the hubs as message relays. The hubs employ random linear network coding to encode multiple messages addressed to the same destination, thus reducing the forwarding overheads. Further, the use of the hubs as relays ensures that most messages are delivered to the destinations. Two versions of HubCode are presented, with each scheme exhibiting contrasting behavior in terms of the computational costs and routing overheads. We formulate a mathematical model for message delivery delay and present a closed-form expression for the same. We validate our model and demonstrate the efficacy of our solutions in comparison with other forwarding schemes by simulating a large-scale vehicular delay tolerant network using empirically collected movement traces of a city-wide public transport network. Under pragmatic assumptions, which account for short contact durations between nodes, our schemes outperform comparable strategies by more than 20%. Copyright © 2011 John Wiley & Sons, Ltd.

KEYWORDS

DTN; Routing; Network Coding; Power-law; Hubs; Forwarding

*Correspondence

Shabbir Ahmed, School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.
E-mail: [email protected]

1. INTRODUCTION

Delay tolerant networks (DTN) are a type of challenged network, wherein the contacts between the communicating devices are intermittent. Consequently, a contemporaneous end-to-end path between the source and destination rarely exists. Of particular interest are the networks that are formed by people in urban environments. These include (i) pocket switched networks [1], wherein personal communication devices carried by humans self-organize to form an intermittently connected network, and (ii) vehicle-based DTN [2], in which WiFi routers mounted on vehicles can communicate with each other.

†Parts of this work have been published in “HUBCODE: message forwarding using hub-based network coding in delay tolerant networks”. Proceedings of MSWiM’09, October 2009; 288–296.

Message forwarding is one of the most challenging aspects of DTN because of the inherent intermittent connectivity [3]. However, knowledge of fundamental properties of the underlying network can be helpful in making better forwarding decisions. In particular, the aforementioned people-centric networks have been shown to follow power-law behavior [1,4,5]. In such networks, a small percentage of the nodes, often referred to as hubs [6], are known to have significantly higher connectivity (i.e., high node degree) as compared with the rest of the nodes. Consequently, most nodes can be reached from every other node by a small number of hops, via the hubs.

A few forwarding schemes have been proposed, which exploit these power-law properties [7,8]. The general idea is to rank nodes based on popularity metrics such as node centrality. A node then forwards a message to another node, if the latter has a higher rank than the former. These schemes have been shown to perform effectively, under the assumption that the nodes can exchange unlimited data in an encounter. However, in reality, contact durations between nodes in people-centric DTN are often only a few seconds long [1]. Consequently, the most popular nodes, which concentrate all of the forwarding traffic, can often only exchange a limited number of messages and in effect act as bottlenecks in the forwarding process.

In this paper, we seek to address this particular problem by employing the theory of network coding [9], which has been shown to attain maximum information flow in a network. We propose a novel forwarding strategy, HubCode, which exploits the power-law properties of the network by directing all forwarding traffic to the hubs. In other words, the hubs form a data conduit. Messages are then forwarded within the data conduit (i.e., only among hubs) using random linear network coding, wherein multiple messages addressed to the same destination are combined to form a single encoded message. Because randomly selected coefficients are used in the coding process, each encoded message is useful to the destination, thus reducing the propagation of redundant messages.

In the basic version of HubCode, the hubs exchange the coefficient matrices of the encoded messages prior to data exchange to select the messages for forwarding. The resulting overhead, which is $O(n^2)$ for $n$ messages, can be fairly significant. As a result, during short contact durations, the hubs may not get a chance to forward coded messages, because most of the contact opportunity is used for exchanging coefficients. To reduce this overhead, we propose an alternate approach, wherein the hubs do not exchange the entire coefficient matrices but rather only exchange a list of native messages. The resulting overhead is just $O(n)$. However, the hubs now need to decode the messages (i.e., solve linear equations), which is computationally expensive. On the contrary, in the basic version, only the destination decodes the messages, thus simplifying the processing at the hubs. These two versions address the important trade-off between routing overhead and computational complexity.

We evaluate the performance of our proposed schemes and compare them with other forwarding protocols using traces collected from a large-scale (>1000 nodes) real-world bus-based DTN. Under realistic assumptions, which account for the limited data exchange possible during short encounters, our schemes achieve 20% higher delivery ratio than comparable strategies. In addition, our schemes achieve about 50% savings in delivery cost. Empirical evaluations also suggest that our schemes are more resilient against random node failure compared with other protocols. Besides, comprehensive analyses have been carried out to examine the effect of varying traffic loads, message lengths, and hub sizes on delivery performance of HubCode.

We have also formulated a mathematical model to estimate the message delivery delay. Closed-form expressions, which serve as an upper bound in estimating message delivery delay, are presented for our proposed schemes. Simulation results corroborate our analytical formulation especially when there are sufficient numbers of hubs that act as message relays.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 presents the details of the HubCode schemes. The mathematical model to estimate message delivery delay is described in Section 4. In Section 5, we present the results from our simulations, and finally, Section 6 concludes the paper. A preliminary version of this work has been presented in [10].

2. RELATED WORK

Ahlswede et al. [9] first introduced the theory of network coding and showed that it can achieve maximum information flow in a network, in the context of multicasting. In recent years, researchers have demonstrated that network coding can improve not only the performance of wired networks [11] under link disruption but also the throughput of wireless networks for unicast [12–14] as well as broadcast transmissions [15]. Li and Li [14] presented theoretical results on the application of network coding for unicast transmissions. The works presented in [12] and [13] focus on practical issues. They demonstrate that network coding can benefit from leveraging the broadcast advantage in wireless networks. In [12], the authors also presented empirical results from testbed deployments and showed that their proposed method can increase the throughput severalfold. However, their methods are suited for densely connected networks such as mesh networks, where the nodes can overhear their neighbors’ transmissions. Consequently, these schemes are not effective for intermittently connected networks such as DTN.

A few papers [16,17] have studied the use of network coding in DTN. Zhang et al. [17] and Widmer and Boudec [16] have studied the benefits of using random linear coding (RLC) for unicast transmissions in DTN. RLC uses simple flooding to distribute the messages in the network. However, rather than transmitting the native messages, a node combines these messages to form an encoded message and forwards this encoded message to its neighbors. The coefficients used in the encoding process are also transmitted along with the message. The messages are only decoded at the destination, when it receives a sufficient number of encoded messages ($n$ linearly independent encoded messages are required to decode $n$ messages). Our proposed scheme also employs network coding for forwarding messages. However, there are two key differences. First, instead of flooding the encoded messages in the network, we leverage the power-law properties of the network and only choose a small fraction of the nodes that have high connectivity (i.e., hubs), as the relay nodes. Second, only the hubs are responsible for coding messages.

In recent years, several researchers [1,5,18] have analyzed the properties of people-centric DTN using empirically collected traces. They have found that in all these networks, a small percentage of popular nodes are connected to most of the other nodes. In other words, the degree distribution follows a power law. Freeman [19] defined several centrality metrics to measure the importance of a node in a network. Researchers in [7,8] have proposed forwarding strategies that exploit the existence of the scale-free structure in the underlying network. In BubbleRap [7], nodes are formed into communities and also ranked according to their centrality. Both global and community rankings are used to find suitable forwarders by using a gradient forwarding approach. Similar ideas are proposed in [8], where each node is assigned a quality metric based on its popularity. Gradient forwarding is then employed. In our work, we also make use of the popular nodes (called hubs) as relay nodes. However, unlike these schemes, which employ gradient forwarding, in HubCode, messages are disseminated amongst the hubs using network coding.

Mathematical modeling of DTN forwarding strategies is a mature field of research. In particular, forwarding schemes based on epidemic forwarding principles have been extensively analyzed in [20–23]. It is commonplace to use differential equations for modeling system dynamics in DTN [23] because of their simplicity as compared with Markov chains. For example, the system dynamics of forwarding a single packet have been modeled using ordinary differential equations [23]. In [24], the authors have extended this to account for a batch of packets for both replication-based and network-coding-based forwarding. There also exist some efforts that analyze the performance of two-hop relay protocols using redundant copies [25,26]. As mentioned earlier, these works are based on the epidemic principle, where a node distributes its messages to any other node it meets, treating all nodes equally. Consequently, it is difficult to adopt these techniques for modeling our proposed schemes, which differentiate between nodes based on their encounter patterns (hubs versus normal nodes). Instead, in this paper, we use the network model proposed in [22] for studying routing in mobile ad hoc networks. In this model, the characteristics of an ad hoc network are captured through a single parameter: the inter-contact rate between nodes. We have amended the model to incorporate the effects of network coding and the forwarding policies of our hub-based forwarding scheme in a simplified manner.

3. HUBCODE

As highlighted in Section 1, empirical analysis of the mobility patterns of several people-centric DTN [1,4,5] has revealed that the degree distribution of the network graph follows a power law. This implies the existence of a small percentage of hubs, which are individually connected to a large number of nodes as compared with other nodes. Further, collectively, the hubs are connected with most of the other nodes in the network (i.e., they achieve nearly 100% coverage). Motivated by these properties, we propose a novel forwarding strategy called HubCode, which uses the hubs as message relays. The hubs are identified by analyzing historical movement patterns of the nodes (e.g., in this paper, we have identified the hubs based on their node degrees). Because most people-centric networks exhibit significant repeatability (e.g., most people have the same daily routine; buses follow the same schedule), this classification of nodes is reasonably time invariant. Also, if the network characteristics change, the new set of hubs can be readily identified by repeating the analysis. An alternative to this static hub-labeling approach is to enable each node to count the number of unique nodes it meets over time. As time elapses, each node can quite effectively use its counter as its rank (i.e., degree) in the network. This hub-labeling approach is decentralized and does not depend on the history of the nodes’ degree distribution. A more sophisticated approach that can be employed in our hub-labeling problem can be found in [2].
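The decentralized labeling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ContactCounter`, the method names, and the `min_degree` threshold used to decide hub status are hypothetical, since the text does not specify how a node turns its counter into a hub label.

```python
class ContactCounter:
    """Tracks the distinct peers a node has met; the count acts as its rank."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.seen = set()  # IDs of unique peers encountered so far

    def record_contact(self, peer_id):
        # Repeated meetings with the same peer do not raise the rank.
        if peer_id != self.node_id:
            self.seen.add(peer_id)

    @property
    def rank(self):
        # Observed degree of this node in the contact graph.
        return len(self.seen)

    def is_hub(self, min_degree):
        # Hypothetical rule: a node labels itself a hub once its observed
        # degree reaches a configured threshold (an assumption of this sketch).
        return self.rank >= min_degree


bus = ContactCounter("bus-1")
for peer in ["bus-2", "bus-3", "bus-2"]:
    bus.record_contact(peer)
```

In this example `bus.rank` is 2, because the second meeting with `bus-2` is not counted again.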

All traffic in the network is forwarded to the hubs. Because each hub concentrates significant traffic, we propose the use of network coding at the hubs to encode multiple messages (addressed to the same destination) into a single encoded message. A hub forwards an encoded message to a neighboring hub if this message is linearly independent of the encoded messages carried by the neighbor. The use of network coding results in significant savings in bandwidth, because a single encoded message is forwarded in place of multiple native messages. Further, because the hubs collectively have contact opportunities with all other nodes, most of the messages can be delivered to the destination.

We first present the basic version of our scheme, HubCodeV1, which makes use of the traditional approach to network coding [17]. We argue that this scheme requires the hubs to exchange significant auxiliary information. Next, we present an alternate approach, HubCodeV2, which requires the intermediate hubs to decode the coded messages (in addition to the normal encoding operations). As a result, the hubs only need to exchange message IDs, which reduces the auxiliary data overhead. However, because the hubs decode messages, the computational complexity increases.

3.1. HubCodeV1

In our schemes, message forwarding is a simple three-step process: (i) Source nodes forward messages to a hub; (ii) a hub encodes multiple messages headed to the same destination and disseminates the encoded messages among other hubs; and (iii) a hub delivers the encoded message to the destination. To simplify the explanation, we classify nodes into three groups: (i) source, (ii) destination, and (iii) hubs. We provide a detailed description of the tasks undertaken by each category of node. Note that a source or destination can also be a hub, but for simplicity, we assume the groups are mutually exclusive.

3.1.1. Source

When a source encounters a hub, it creates a copy of the message and forwards the copy to the hub. Recall that the hub nodes are appropriately labeled by analyzing past behavior of the network. Each node broadcasts its label along with any auxiliary data in the periodic beacons. As a result, the neighbors of a node can easily identify whether the peer (i.e., the beacon sender) is a hub or not. If the source carries a single native message, it is forwarded as-is. However, if more than one message is destined to the same address, then the source combines them into a single encoded message using linear network coding (Equation (1)) and forwards the encoded message to the hub. The coding technique is described below.

3.1.2. Hubs

When two hubs encounter each other, they first exchange certain auxiliary information that is used to decide if the hubs should forward messages to each other (these details are explained later). If a hub needs to forward messages to another hub, it encodes all messages with a common destination using RLC and forwards the single encoded message. This results in significant savings in bandwidth. Assume that a hub currently has $k$ messages, $X_1, X_2, \ldots, X_k$, with a common destination. Then the hub creates a linear combination [17,27] of these $k$ messages to form a single encoded message $F_l$, using Equation (1),

$$F_l = \sum_{i=1}^{k} a_i X_i, \quad a_i \in \mathbb{F}_q \tag{1}$$

where $a_1, a_2, \ldots, a_k$ represent the coefficients, which are randomly selected from a finite field [28], $\mathbb{F}_q$, where $q = 2^{16}$. All the additions and multiplications are performed over the finite field $\mathbb{F}_q$, so that the encoded message has the same size as the native message. The coefficients $a_i$ and the message IDs ($\mathit{idx}_i$) of all the native messages are appended to the encoded message prior to transmission. This is because the receiving hub may perform further encoding. Because the coefficient vectors are chosen from a large random space, there is a high probability that they are linearly independent. As a result, two coded messages that are created from the same native messages are still useful to the destination (decoding is explained later).
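The encoding step of Equation (1) can be illustrated with a short Python sketch. It is not the authors' implementation. To keep the field arithmetic brief, it works over the smaller field GF(2^8) with the reduction polynomial used by AES (0x11B) rather than the paper's field of size 2^16; the function names, the equal-length message assumption, and the choice of nonzero coefficients are all assumptions of this sketch.

```python
import random


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B       # reduce when the degree reaches 8
        b >>= 1
    return result


def random_coeffs(k, rng=random):
    """Draw k coefficients from GF(2^8); zero is excluded (sketch assumption)."""
    return [rng.randrange(1, 256) for _ in range(k)]


def encode(messages, coeffs):
    """Return sum_i coeffs[i] * messages[i], computed one byte symbol at a time.

    All messages are assumed to have the same length (pad beforehand if not).
    """
    length = len(messages[0])
    out = bytearray(length)
    for a, msg in zip(coeffs, messages):
        for j in range(length):
            out[j] ^= gf_mul(a, msg[j])
    return bytes(out)


natives = [b"\x01\x02", b"\x03\x04"]
coded = encode(natives, [1, 1])   # with unit coefficients this is a plain XOR
```

Because every operation stays inside the field, the encoded payload has the same length as each native message, which mirrors the size property stated above.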

Note that hubs do not decode the messages. The encoding and forwarding process described previously continues at all intermediate hubs. If a hub holds multiple encoded messages, then these can be further combined into a single message. For example, assume that a hub has received two encoded messages, $F_1$ and $F_2$, that have been created as follows:

$$F_1 = a_{11}X_1 + a_{12}X_2 + a_{13}X_3 \tag{2}$$

$$F_2 = a_{21}X_1 + a_{22}X_2 + a_{23}X_4 \tag{3}$$

Then the hub can combine these two messages to create a single encoded message, $F_3$, such that $F_3 = a_1F_1 + a_2F_2$, where $a_1$ and $a_2$ are two randomly selected coefficients.
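To make the effect of re-encoding explicit, Equations (2) and (3) can be substituted into the expression for $F_3$. The expansion below is a worked step added for illustration; it uses only the symbols already defined, with all arithmetic over $\mathbb{F}_q$.

```latex
\begin{align*}
F_3 &= a_1 F_1 + a_2 F_2 \\
    &= a_1\,(a_{11}X_1 + a_{12}X_2 + a_{13}X_3) + a_2\,(a_{21}X_1 + a_{22}X_2 + a_{23}X_4) \\
    &= (a_1 a_{11} + a_2 a_{21})\,X_1 + (a_1 a_{12} + a_2 a_{22})\,X_2 + a_1 a_{13}\,X_3 + a_2 a_{23}\,X_4
\end{align*}
```

The re-encoded message is therefore again a linear combination of native messages, here spanning the four IDs $X_1$ to $X_4$, and the coefficient vector attached to $F_3$ consists of the four bracketed terms.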

The discussion has focused on the coding process. We now explain the decision making process involved before a hub encodes messages. Each hub maintains a coefficient matrix for all the encoded messages that it currently holds. There is one such matrix for each destination. The columns of the matrix correspond to the message IDs, and there is one row for each encoded message. When two hubs encounter each other, they first exchange the coefficient matrices. These are generally included in the beacons, which are periodically exchanged by nodes. We explain the decision process for a single destination. These steps are repeated for each destination. When a hub receives its neighbor’s matrix, it has to decide if transmitting a linear combination of all its messages will be useful to the neighbor. The hub can determine this by checking if this encoded message is linearly independent of the encoded messages carried by the neighbor. Consider the following example. Let $F_1$ be the encoded message created by this hub, which is composed of two native messages $X_1$ and $X_3$ with the respective coefficient set $A_1 = \langle a_1, a_2 \rangle$ (i.e., $F_1 = a_1X_1 + a_2X_3$). Also, assume that the hub receives coefficient matrix $A_2$ from its neighbor. $A_1$ and $A_2$ are shown below:

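$$A_1 = \begin{vmatrix} \mathit{idx}_1 & \mathit{idx}_3 \\ a_1 & a_2 \end{vmatrix}, \quad A_2 = \begin{vmatrix} \mathit{idx}_1 & \mathit{idx}_2 \\ a_3 & a_4 \\ a_5 & a_6 \end{vmatrix}$$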

Because the coefficient matrix is accompanied by the message IDs (i.e., $\mathit{idx}_i$) of the corresponding columns of the matrix, the receiving hub can determine which column is associated with which message. The receiving hub then inserts the coefficient set $A_1$ in the corresponding columns of $A_2$. In this particular case, $A_2$ does not contain any coefficient for the native message $X_3$ (i.e., there is no column in $A_2$ for the message ID of $X_3$). So, a new column for message $X_3$ is created. The coefficients for the message $X_3$ in $A_2$ will be zero. Similarly, the coefficient of the message $X_2$ in $A_1$ will also be zero. The modified $A_2$ is shown below:

$$A_2 = \begin{vmatrix} \mathit{idx}_1 & \mathit{idx}_2 & \mathit{idx}_3 \\ a_3 & a_4 & 0 \\ a_5 & a_6 & 0 \\ a_1 & 0 & a_2 \end{vmatrix}$$

If the coefficient sets (i.e., rows of the matrix $A_2$) are linearly independent, then it is assumed that the message newly encoded by the hub ($F_1$) is useful to its neighbor.
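The usefulness check can be sketched as a rank computation. The Python below is a hedged illustration, not the paper's code: it aligns coefficient vectors by message ID, fills missing columns with zeros as described above, and asks whether appending the candidate row raises the rank. It uses GF(2^8) instead of the paper's field of size 2^16 for brevity, and the function names and the dictionary representation (message ID mapped to coefficient) are assumptions. Testing for a rank increase coincides with the row-independence test in the text whenever the neighbor's own rows are already independent.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rank(rows):
    """Rank of a matrix over GF(2^8), found by Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    found = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(found, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        inv = gf_inv(m[found][col])
        m[found] = [gf_mul(inv, v) for v in m[found]]
        for r in range(len(m)):
            if r != found and m[r][col]:
                factor = m[r][col]
                m[r] = [v ^ gf_mul(factor, p) for v, p in zip(m[r], m[found])]
        found += 1
    return found


def is_useful(candidate, neighbor_rows):
    """True if the candidate vector adds a new dimension to the neighbor's rows."""
    ids = sorted(set(candidate).union(*neighbor_rows))

    def to_row(vec):
        return [vec.get(i, 0) for i in ids]   # missing columns become zero

    base = [to_row(r) for r in neighbor_rows]
    return rank(base + [to_row(candidate)]) > rank(base)


# Mirrors the example: the neighbor holds two rows over idx1 and idx2,
# and the candidate covers idx1 and idx3 (coefficient values are arbitrary).
neighbor = [{"idx1": 3, "idx2": 4}, {"idx1": 5, "idx2": 6}]
useful = is_useful({"idx1": 1, "idx3": 2}, neighbor)
```

Here `useful` is `True`, because the candidate introduces a nonzero entry in a column (`idx3`) that the neighbor's rows do not span.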

Although this requires the hubs to exchange significant information, they can make more informed decisions about forwarding encoded messages and hence avoid the transmissions of redundant messages. Eventually, when a hub meets the destination, it forwards an encoded message composed of all messages addressed to that destination.

3.1.3. Destination

When the hub encounters a destination, it forwards an encoded message to it. Similar to the hubs, the destination also maintains a coefficient matrix. The columns represent the native message IDs, and each row corresponds to an encoded message. Recall that each encoded message is a linear combination of the native messages. Consequently, to decode $n$ messages, the destination should receive $m$ linearly independent combinations of these messages, such that $m \geq n$. Note that because the coefficients are randomly chosen from a large finite space, there is a high probability that all encoded messages are linearly independent. Hence, $n$ encoded messages are sufficient for decoding (i.e., $m = n$). The $n$ linear equations can be solved using matrix inversion.

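For example, if the destination receives the following linearly independent encoded messages, $F_1$, $F_2$, and $F_3$ ($F_1 = a_{11}X_1 + a_{12}X_2 + a_{13}X_3$; $F_2 = a_{21}X_1 + a_{22}X_2 + a_{23}X_3$; $F_3 = a_{31}X_1 + a_{32}X_2 + a_{33}X_3$), then the set of linear equations can be written in matrix form $\mathbf{f} = \mathbf{A}\mathbf{x}$.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}, \quad \mathbf{f} = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$$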

The native messages can be retrieved by matrix inversion:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{f} \tag{4}$$
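Decoding as in Equation (4) can be sketched as follows. The Python below solves $\mathbf{f} = \mathbf{A}\mathbf{x}$ by Gauss-Jordan elimination on the coefficient matrix and the payloads together, which is one standard way to apply the inverse without forming it explicitly. As before, this is an illustration under assumptions: it uses GF(2^8) rather than the paper's field of size 2^16, and the function names and the sample matrix are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, because a^255 = 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def encode(messages, coeffs):
    """Linear combination of equal-length messages over GF(2^8)."""
    out = bytearray(len(messages[0]))
    for a, msg in zip(coeffs, messages):
        for j, symbol in enumerate(msg):
            out[j] ^= gf_mul(a, symbol)
    return bytes(out)


def decode(coeff_rows, payloads):
    """Solve A x = f for n natives from n coded payloads; None if A is singular."""
    n = len(coeff_rows)
    a = [list(r) for r in coeff_rows]
    f = [list(p) for p in payloads]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return None                       # rows are not linearly independent
        a[col], a[pivot] = a[pivot], a[col]
        f[col], f[pivot] = f[pivot], f[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(inv, v) for v in a[col]]
        f[col] = [gf_mul(inv, v) for v in f[col]]
        for r in range(n):
            if r != col and a[r][col]:
                factor = a[r][col]
                a[r] = [v ^ gf_mul(factor, p) for v, p in zip(a[r], a[col])]
                f[r] = [v ^ gf_mul(factor, p) for v, p in zip(f[r], f[col])]
    return [bytes(row) for row in f]


# Three native messages and three independent combinations (values arbitrary).
natives = [b"hub", b"dtn", b"rlc"]
A = [[0, 1, 4], [1, 2, 3], [0, 0, 1]]
coded = [encode(natives, row) for row in A]
recovered = decode(A, coded)
```

With this invertible sample matrix, `recovered` equals `natives`; a singular matrix makes `decode` return `None`, which corresponds to the destination still waiting for more independent combinations.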

Figure 1 presents an illustrative example of HubCodeV1. There are three hubs: $A$, $B$, and $C$. $Q$, $R$, $S$, and $D$ are regular nodes. Let us assume that $R$, $S$, and $Q$ are source nodes that wish to transmit messages $X_2$, $X_1$, and $X_3$, respectively, to a common destination $D$. The arrows in the figure indicate that the two nodes can communicate with each other. For example, at $t_1$, both $R$ and $S$ are in the communication range of $A$. The direction of the arrow indicates the flow of data messages. $C_v$ is the coefficient vector that is appended to the encoded message. It takes the form $\langle \mathit{idx}_i : a_i \rangle$, where $\mathit{idx}_i$ represents the message ID and $a_i$ denotes the coefficient. The figure is self-explanatory and shows the sequence of operations that are involved in delivering the messages to the destination, $D$.

3.2. HubCodeV2

The main drawback of HubCodeV1 is that the hubs need to exchange their coefficient matrices to make the forwarding decision. The overhead of this exchange is $O(n^2)$ for $n$ messages. This overhead is particularly of concern when the contact durations with other hubs are short lived. This is because in such instances, the exchange of auxiliary information may dominate the entire contact opportunity. Empirical measurements have shown that in real-world DTN [1], contact durations can often be quite short. To solve this problem, we present an alternate approach that seeks to reduce this overhead without penalizing message delivery.

Figure 1. Example illustrating the operation of HubCodeV1 at (a) t1, (b) t2, (c) t3, (d) t4, (e) t5, (f) t6, and (g) t7.

In V1, a hub uses the coefficient matrices received from a neighbor to determine if forwarding an encoded message is beneficial to this neighbor. However, if a hub can decode the coded messages to recover the native messages, then it can simply send a list of native message IDs to its neighbors instead of the coefficient matrix. As a result, the neighboring hub can make the same decision. Sending a list of message IDs reduces the auxiliary data overhead to $O(n)$ for $n$ messages, as compared to $O(n^2)$ with V1. However, this gain comes at the expense of extra computation. Because the hubs now decode messages, the computational complexity increases to $O(n^2)$ (solving $n$ linear equations has a complexity of $O(n^2)$). On the other hand, in V1, the hubs only encode messages, which incurs a complexity of $O(n)$. Most personal communication devices (such as smart phones, personal digital assistants) and in-vehicle routers have sufficient processing capabilities and battery power to perform the decoding operations. Hence, this scheme can be readily deployed in most people-centric DTN. However, V2 is not suitable for resource-constrained devices such as sensor nodes. The two versions address an important trade-off between computational complexity and routing overhead.
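A rough back-of-the-envelope comparison shows how quickly the two auxiliary-data overheads diverge. The figures below are illustrative assumptions rather than measurements: the hub is taken to hold a full $n \times n$ coefficient matrix for one destination (the worst case), each coefficient occupies 2 bytes because $q = 2^{16}$, and each message ID is assumed to take $b$ bytes, where $b = 4$ is a hypothetical value.

```latex
\begin{align*}
\text{V1 (coefficient matrix plus column IDs):}\quad & 2n^2 + bn \ \text{bytes} \\
\text{V2 (native message IDs only):}\quad & bn \ \text{bytes} \\
n = 50,\ b = 4:\quad & 2(50)^2 + 4(50) = 5200 \ \text{bytes versus}\ 4(50) = 200 \ \text{bytes}
\end{align*}
```

Under these assumptions, V1 sends 26 times more auxiliary data than V2 for a single destination, and the gap widens linearly with $n$.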

As in V1, we classify nodes into three different groups: (i) source, (ii) hubs, and (iii) destination. We explain the operations performed by each type of node.

3.2.1. Source

As in V1, the source creates a copy of the native message (without encoding) and forwards it to a hub. However, unlike V1, in this scheme, the hubs may possess native messages (because they decode messages). As a result, a source forwards the native message to a hub only if the latter does not have this message. The source can determine this by examining the auxiliary information (i.e., message IDs) transmitted by the hub in the beacons.

3.2.2. Hubs

When a hub receives an encoded message for a destination, it examines the other encoded messages in its queue heading to the same destination. If sufficient encoded messages are present, then the hub decodes these messages (decoding was explained in V1) and stores the native messages. In the event that sufficient messages have not been received, the encoded messages are stored as-is.

When two hubs encounter each other, they exchange the message IDs of the native messages that they carry. If a hub contains an encoded message, which has not been decoded yet, then the coefficients of this message are not included in the auxiliary information. In other words, only the information of the native messages is exchanged. When a hub receives the native message list of its neighboring hub, it compares this list to the native messages waiting in its queue and also to the messages that are used to compose the encoded messages (if any). If the hub finds at least one message (either native or a part of an encoded message) that is not in its neighbor’s list, then the hub encodes the missing messages along with any other messages (native or encoded) for that destination and forwards the combinations to that neighbor. As in HubCodeV1, the hubs do not forward any message (either native or encoded) to a non-hub node. Recall that the rationale behind confining message dissemination among hubs is to reduce redundant messages without penalizing delivery ratio, because hubs collectively cover most of the network.
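The hub-to-hub decision in V2 reduces to a set comparison on message IDs, which the following Python sketch illustrates for a single destination. The function name `plan_forwarding`, its arguments, and the returned pair are hypothetical choices made for this example; the coding of the selected messages itself is not shown here.

```python
def plan_forwarding(native_ids, encoded_id_sets, neighbor_native_ids):
    """Decide what a hub should combine for a neighboring hub (one destination).

    native_ids          -- IDs of native messages held by this hub
    encoded_id_sets     -- one set of IDs per undecoded encoded message held
    neighbor_native_ids -- the native message list advertised by the neighbor
    Returns (ids_to_combine, missing_ids); both are empty if nothing is new.
    """
    held = set(native_ids)
    for ids in encoded_id_sets:
        held |= set(ids)                 # IDs that appear inside coded messages
    missing = held - set(neighbor_native_ids)
    if not missing:
        return set(), set()              # the neighbor already has everything
    # At least one message is new to the neighbor: combine the missing
    # messages together with the other messages for this destination.
    return held, missing


combine, missing = plan_forwarding({"m1", "m2"}, [{"m2", "m3"}], {"m1"})
```

In this example the neighbor lacks `m2` and `m3`, so the hub would send combinations built from `m1`, `m2`, and `m3`.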

When the hub meets the destination and if it only has encoded messages for that destination (i.e., no native messages), then it sends an encoded combination of these messages. If the hub has one or more native messages in its queue, then it simply forwards them to the destination without coding.

3.2.3. Destination

The decoding operation at the destination is identical to that in HubCodeV1. Hence, we do not provide details here. The only difference is that unlike V1, the destination may receive native messages in addition to coded messages. Figure 2 highlights the basic operation of HubCodeV2. We have used the same scenario as in the example for HubCodeV1.

4. MATHEMATICAL ANALYSIS

In this section, we present a mathematical model to estimate the message delivery delay in our hub-based forwarding schemes. In particular, we derive closed-form expressions for message delivery delay in hub-based forwarding schemes when (i) hubs utilize network coding to disseminate a message among themselves (e.g., HubCode V1, V2) and (ii) hubs do not perform network coding and merely send copies of the message to other hubs (we refer to this as the Hub-only scheme). We include the latter scheme in our analysis to quantify the improvements achieved by using network coding.

We use the widely adopted network model introduced in [22] as the starting point for our analysis. In this model, the characteristics of ad hoc networks are captured through two parameters of the network: (i) the number of nodes in the network and (ii) the intensity of identical and independent Poisson processes that model the meeting instances between any pair of nodes. Although, originally, the network model was proposed for ad hoc networks, it fits well in the context of DTN. This is because the meeting instances of the nodes (or inter-meeting durations) determine the message forwarding rate in DTN.

To determine the message delivery delay, we need to estimate the average message forwarding rate. In addition to the inter-meeting durations, the message forwarding rate also depends on various forwarding properties, such as nature of the forwarding protocol and importance of the messages. In the following sections, we start our discussion by introducing the generic routing model. Following this, we begin our mathematical analysis by introducing the factors that affect the message forwarding rate. Once we model the message forwarding rate, we can formulate the expected message delivery delay in a straightforward manner.

4.1. Stochastic forwarding model

Figure 2. Example illustrating the operation of HubCodeV2 at (a) t1, (b)–(d) t2, (e)–(g) t3, (h) t4, (i) t5, (j) t6, and (k) t7.

We assume a network that consists of one source, one destination, and $N-1$ relay nodes (i.e., hubs). We also assume that non-hub nodes do not affect the forwarding mechanism because message forwarding is performed only by the hubs. Two nodes may only communicate when they are within communication range. The duration when two nodes are connected is referred to as the meeting time, and the time that elapses between two consecutive meeting times of a given pair of nodes is called the inter-meeting time. For simplicity, the transmission time is assumed to be instantaneous. This is a valid assumption if the size of the message is very small. Obviously, if we had a realistic message traffic model available, then we could have taken a more pragmatic approach in modeling the transmission time. However, modeling message traffic is non-trivial because of its dependence on the network characteristics and network protocol behavior. We have also ignored the effect of processing delays.

Figure 3. State diagram of generic routing model.

Figure 3 shows the state diagram of the generic routing model. The system is in state $i \in \{1, 2, \ldots, N\}$ when there are $i$ copies of the message in the network. It is in state $A$ when the message has been delivered to the destination. Let $R^F_i$ ($i \in \{1, 2, \ldots, N-1\}$) be the average message forwarding rate to another hub and $R^A_i$ ($i \in \{1, 2, \ldots, N\}$) be the average message forwarding rate to the destination when there are $i$ copies of the message in the network. As can be seen from Figure 3, the mean time for a message to go from state 1 to state $A$ (i.e., from source to destination) directly (i.e., without forwarding via a hub) is $1/R^A_1$. If the message takes one hop to reach the destination, then the mean delay becomes

$$\frac{1}{R^F_1} + \frac{1}{R^A_2}$$

Similarly, if the message takes two hops to reach the destination, then the mean delay becomes

$$\left[\frac{1}{R^F_1} + \frac{1}{R^F_2}\right] + \frac{1}{R^A_3}$$

Generalizing, the mean time for a message to go from state 1 to state $A$ via $k$ hops ($k \in \{1, 2, 3, \ldots, N-1\}$) is given by

$$\sum_{i=1}^{k}\left[\frac{1}{R^F_i}\right] + \frac{1}{R^A_{k+1}} \tag{5}$$

In other words, the above expression provides the mean message delivery delay when the message takes $k$ relays to reach the destination. Using this simple reasoning, we develop our message delay model for both HubCode and Hub-only forwarding schemes in the next sections.
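The generic expression in Equation (5) can be evaluated directly once the per-state rates are known. The following is a minimal sketch only: the function name and the convention of passing the rates as plain lists are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence


def mean_delay_k_hops(rf: Sequence[float], ra: Sequence[float], k: int) -> float:
    """Evaluate Equation (5): sum_{i=1..k} 1/R^F_i + 1/R^A_{k+1}.

    rf[i-1] holds R^F_i and ra[i-1] holds R^A_i (the paper uses 1-based indices).
    k = 0 corresponds to direct delivery from the source, i.e. 1/R^A_1.
    """
    if k < 0 or k > len(rf) or k + 1 > len(ra):
        raise ValueError("k is out of range for the supplied rate lists")
    relay_delay = sum(1.0 / rf[i] for i in range(k))  # hub-to-hub forwarding hops
    final_delay = 1.0 / ra[k]                         # last hop to the destination
    return relay_delay + final_delay
```

With hypothetical rates `rf = [0.5, 0.25]` and `ra = [1.0, 0.5, 0.2]`, the call `mean_delay_k_hops(rf, ra, 1)` adds one relay term and one delivery term, mirroring the one-hop expression above.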

As mentioned earlier, the message forwarding rate depends not only on the inter-meeting durations between nodes but also on other factors, such as the forwarding principle and the number of copies in the network. Next, we discuss the major factors that affect the message forwarding rates ($R^F_i$ and $R^A_i$) and quantify the impact of these factors on the forwarding rate.

4.2. Factors that influence the message forwarding rate

4.2.1. Contact rate between nodes

Because nodes can only forward messages during contact periods, the contact rate between nodes directly influences the message forwarding rate. Inter-meeting durations determine the contact rate. In a study on the association patterns of wireless access points [18], the authors found that the inter-meeting durations between nodes of a large network can be modeled using generalized Pareto distributions. Our study of the VANET trace shows similar findings. For example, in Figure 8, we have plotted the probability mass function of inter-contact durations of the Seattle trace during the 3PM–8PM period, together with a close-fit Pareto distribution to show the goodness of fit.

If the inter-meeting durations between a pair of nodes are random variables ($X$) with a power-law distribution (e.g., Pareto distribution), then the expected inter-meeting duration $E(X)$ is given by the following relation. Readers are referred to Appendix A in [29] for further elaboration.

$E(X) = (\alpha x_{min})/(\alpha - 1)$, where $x_{min}$ is the minimum value of $X$ and $\alpha$ is a positive parameter often termed the tail index of the power-law distribution. The values of $\alpha$ and $x_{min}$ can be obtained empirically by using the method of maximum likelihood. In fact, the value of $\alpha$ (i.e., the tail index) can be approximated graphically by plotting the data sample on a log–log plot and taking the slope of the close-fit straight line through the data points.

The average contact rate $R$ is the reciprocal of the average inter-contact duration $E(X)$. Therefore, $R$ is

$$R = \frac{\alpha - 1}{\alpha x_{min}} \tag{6}$$
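To make Equation (6) concrete, the sketch below computes the expected inter-meeting duration and the contact rate from $\alpha$ and $x_{min}$. It also includes a tail-index estimator. The paper states only that maximum likelihood is used, so the estimator shown is the standard one for a plain (type I) Pareto distribution with known $x_{min}$ and is an assumption here, as are all function names.

```python
import math
from typing import Iterable


def pareto_alpha_mle(samples: Iterable[float], x_min: float) -> float:
    """Standard type I Pareto MLE with known x_min: n / sum(ln(x_i / x_min)).

    Assumption: the paper does not spell out its estimator.
    """
    xs = [x for x in samples if x >= x_min]
    if not xs:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in xs)
    if log_sum <= 0.0:
        raise ValueError("all samples equal x_min; tail index is undefined")
    return len(xs) / log_sum


def expected_inter_meeting(alpha: float, x_min: float) -> float:
    """E(X) = alpha * x_min / (alpha - 1); the mean is finite only for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the Pareto mean is infinite for alpha <= 1")
    return alpha * x_min / (alpha - 1.0)


def contact_rate(alpha: float, x_min: float) -> float:
    """Equation (6): R = (alpha - 1) / (alpha * x_min), the reciprocal of E(X)."""
    return 1.0 / expected_inter_meeting(alpha, x_min)
```

Because the mean grows without bound as $\alpha$ approaches 1, a tail index only slightly above 1 yields a very large $E(X)$ and hence a very small $R$.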

4.2.2. Number of copies of messages

At the beginning, when the number of copies of a message is low, the message forwarding rate increases exponentially. This exponential growth can be explained intuitively as follows: at the beginning, the source copies the message to another node; these two nodes can then copy the message to another two nodes, bringing the total number of copies to four. These four nodes can in turn copy the message to another four nodes, bringing the total to eight, and so on. However, this rate declines as the copies saturate the network. We elaborate on the effect of the number of copies (of a message) on the forwarding rate in the next section.

4.2.3. Forwarding mechanism

The mechanism of any particular routing protocol may change the message forwarding rate. For example, in our routing protocol (e.g., non-coding Hub-only version), a node forwards a message to only a fraction of the total nodes (i.e., highly connected nodes) and thereby reduces the message forwarding rate by a factor $f$. Intuitively, $f$ is the probability that the next encountered neighbor is a hub.

4.2.4. Relevance of the messages

The message forwarding rate also depends on the relevance of the message to other nodes (i.e., a redundant message is not relevant to a node). Let $p$ be the probability that a message is important to the node it is forwarded to. For network coding based schemes, $p \approx 1$, because almost all messages are important (i.e., linearly independent) as a result of the random linear encoding scheme. In particular, it has been shown [27] that for network coding based schemes, $p = 1 - (1/d)$, where $d$ is the size of the coefficient space. Intuitively, the linear independence among the messages depends on the size of the random coefficients. The larger the coefficients, the greater the probability that the messages are linearly independent. Because the finite field size $d$ is often chosen as $2^{16}$, $p$ approaches 1. On the other hand, when network coding is not employed, analogous to the coupon collector's problem [30], $p$ becomes $1 - ((j-1)/M)$, where $M$ is the total number of unique messages destined to the same address, and the $j$-th ($j \le M$) message has been received successfully.

$$p = \begin{cases} 1 & \text{for HubCode} \\ 1 - \frac{j-1}{M} & \text{for Hub-only (without coding)} \end{cases} \tag{7}$$
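The two cases of Equation (7) can be sketched as small helper functions. The names are hypothetical. The last helper, which sums $1/p$ over all $M$ messages, is an illustrative assumption (the expected number of useful contacts in the coupon collector analogy) and is not a quantity defined in the paper.

```python
def relevance_hubcode(field_size: int = 2 ** 16) -> float:
    """p = 1 - 1/d for network coding, with d the size of the coefficient space."""
    if field_size < 2:
        raise ValueError("field size must be at least 2")
    return 1.0 - 1.0 / field_size


def relevance_hub_only(j: int, m: int) -> float:
    """p = 1 - (j - 1)/M for the j-th of M unique messages (no coding)."""
    if not 1 <= j <= m:
        raise ValueError("j must satisfy 1 <= j <= M")
    return 1.0 - (j - 1) / m


def expected_contacts_hub_only(m: int) -> float:
    """Illustrative only: sum of 1/p over j = 1..M, which equals M * H_M."""
    return sum(1.0 / relevance_hub_only(j, m) for j in range(1, m + 1))
```

For the Hub-only case the relevance probability falls from 1 for the first message to $1/M$ for the last, whereas with coding it stays close to 1 throughout.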

In the following two sections, we accommodate the previously mentioned factors in the average message forwarding rates ($R^F_i$ and $R^A_i$) and estimate the message delivery delay for both Hub-only and HubCode schemes.

4.3. Message delivery delay in Hub-only scheme

In this subsection, we develop the message delay model of the Hub-only forwarding scheme. A brief definition of the notations used here can be found in Table I. Figure 4 shows the state diagram of the Hub-only (without coding) forwarding scheme. Recall that in the Hub-only forwarding scheme, the source forwards the message to hubs, and hubs disseminate it among themselves in an epidemic manner. Eventually, one of the copies reaches the destination.

The message forwarding rate $R^F_i$ at each state $i$ is a function of the number of copies $i$ of the message in the network. When there are $i$ ($1 < i \le N$) copies of the same message in the network, each of those nodes either sends a new copy to the $N - i$ hubs that do not have a copy yet, at a rate of $R^F_i = i(N-i)Rf$, or meets the destination at a rate of $R^A_i = iRp$. The rationale behind the introduction of the factors $p$ and $f$ is discussed in Section 4.2.

Therefore, the mean time for the $j$-th ($j \in \{1, 2, 3, \ldots, M\}$) message to go from state 1 to state $A$ (i.e., from source to destination) in the $k$-th ($k \in \{1, 2, 3, \ldots, N-1\}$) step (Figure 4) is

$$\sum_{i=1}^{k}\left[\frac{1}{iR(N-i)f}\right] + \frac{1}{(k+1)Rp}$$

Because in real-world people-centric DTNs most contact durations are very short [5], it can be assumed that only one message can be transferred at each contact opportunity. Therefore, the mean time for all $M$ messages to go from state 1 to state $A$ in the $k$-th step is given by

$$M\sum_{i=1}^{k}\frac{\alpha x_{min}}{i(\alpha-1)(N-i)f} + \sum_{j=1}^{M}\frac{\alpha x_{min}}{(k+1)(\alpha-1)\left(1-\frac{j-1}{M}\right)}$$

after replacing the values of $R$ and $p$ (from Equations (6) and (7)).

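Therefore, the expected message delivery delay $E[T_{HO}]$ for all $M$ messages to reach state $A$ from state 1 is

$$E[T_{HO}] = \frac{1}{N-1}\sum_{k=1}^{N-1}\left[M\sum_{i=1}^{k}\frac{\alpha x_{min}}{i(\alpha-1)(N-i)f}\right] + \frac{1}{N-1}\sum_{k=1}^{N-1}\sum_{j=1}^{M}\frac{\alpha x_{min}}{(k+1)(\alpha-1)\left(1-\frac{j-1}{M}\right)} \tag{8}$$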

Table I. Notations.

Term          Definition
$E[T_{HO}]$   Expected message delivery delay of Hub-only (without coding) scheme
$E[T_{HC}]$   Expected message delivery delay of HubCode scheme
$R$           Average rate of contact opportunities
$R^F_i$       Average message forwarding rate to another hub when $i$ copies of the message are in the network
$R^A_i$       Average message forwarding rate to the destination from a hub or source when $i$ copies of the message are in the network
$\alpha$      Power-law tail index
$x_{min}$     Minimum inter-contact duration
$p$           Probability of message relevance to the peer
$f$           Probability that a hub meets another hub
$N$           Number of hubs
$M$           Number of messages destined to a common address

Figure 4. State diagram of Hub-only forwarding scheme.



A closed-form expression (Equation (9)) of the expected message delivery delay is obtained from Equation (8) by employing calculus approximation methods. The details have not been included because of space limitations. Interested readers are referred to Appendix C in [29] for further elaboration.

$$E[T_{HO}] = \frac{M\alpha x_{min}}{Nf(N-1)(\alpha-1)}\left[(N-2)\ln(N-1) + fN\ln(M)\ln\left(\frac{N}{2}\right)\right] \tag{9}$$
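For readers who wish to experiment with the Hub-only model, the sketch below evaluates Equation (8) by direct summation and Equation (9) in closed form. The function names are hypothetical, and no claim is made here about how closely the two values agree for a given parameter set, since the approximation steps are deferred to [29].

```python
import math


def delay_hub_only_sum(n: int, m: int, alpha: float, x_min: float, f: float) -> float:
    """Equation (8), evaluated by direct summation over k, i and j."""
    if n < 2 or m < 1:
        raise ValueError("need N >= 2 and M >= 1")
    inv_r = alpha * x_min / (alpha - 1.0)  # 1/R from Equation (6)
    total = 0.0
    for k in range(1, n):  # k = 1 .. N-1
        relay = m * sum(inv_r / (i * (n - i) * f) for i in range(1, k + 1))
        deliver = sum(
            inv_r / ((k + 1) * (1.0 - (j - 1) / m)) for j in range(1, m + 1)
        )
        total += relay + deliver
    return total / (n - 1)


def delay_hub_only_closed(n: int, m: int, alpha: float, x_min: float, f: float) -> float:
    """Equation (9), the approximate closed form."""
    if n < 3:
        raise ValueError("the closed form is used here only for N >= 3")
    prefactor = m * alpha * x_min / (n * f * (n - 1) * (alpha - 1.0))
    bracket = (n - 2) * math.log(n - 1) + f * n * math.log(m) * math.log(n / 2)
    return prefactor * bracket
```

A call such as `delay_hub_only_closed(50, 100, 1.01, 30, 0.8)` uses the parameter values $\langle \alpha, x_{min}, f \rangle = \langle 1.01, 30, 0.8 \rangle$ reported in Section 4.5 with 100 messages and 50 hubs.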

4.4. Message delivery delay in HubCode

In this subsection, we sketch the message delay model of our HubCode forwarding schemes. Recall that HubCodeV1 and HubCodeV2 only differ in how they broadcast auxiliary data in beacons. For example, HubCodeV1 exchanges the coefficient matrix, whereas HubCodeV2 exchanges native message IDs in beacons. In HubCodeV2, to meet the purpose of exchanging the auxiliary data (i.e., message IDs), the hubs try to decode the messages. However, the basic principle of forwarding messages to hubs, which in turn disseminate messages amongst other hubs using network coding, is the same for both schemes. Because our model does not incorporate the beaconing mechanism, the same forwarding model is applicable to both HubCodeV1 and HubCodeV2. We use the generic term HubCode to refer to both HubCodeV1 and HubCodeV2.

Figure 5 shows the state diagram of the HubCode forwarding scheme. Note that in HubCode, the source forwards the message to hubs, and hubs encode and disseminate the encoded message among themselves. The destination receives multiple encoded messages from different hubs. Unlike the Hub-only (non-coding) scheme, practically all the encoded messages are important to the destination because of the application of the linear encoding method. The destination decodes the original messages by solving the set of linear equations (i.e., encoded messages together with the coefficient vector).

When there are $i$ ($1 < i \le N$) copies of the same message in the network, then unlike the Hub-only (non-coding) forwarding scheme, each of those nodes either sends a newly encoded message to the $N$ hubs (because all messages are important to all hubs) at a rate of $R^F_i = iNRf$ or meets the destination at a rate of $R^A_i = iRp$, or $iR$ (because $p \approx 1$ in the case of HubCode (Equation (7))).

Now, the mean time for a message to go from state 1 to state $A$ in the $k$-th ($k \in \{1, 2, 3, \ldots, N-1\}$) step (Figure 5) is

$$\sum_{i=1}^{k}\left[\frac{1}{iRNf}\right] + \frac{1}{(k+1)R}$$

Therefore, the mean time for all $M$ messages to go from state 1 to state $A$ in the $k$-th step is (assuming that only one message can be transferred at each contact opportunity)

$$M\sum_{i=1}^{k}\left[\frac{\alpha x_{min}}{i(\alpha-1)Nf}\right] + \frac{M\alpha x_{min}}{(k+1)(\alpha-1)}$$

after replacing the value of $R$ (from Equation (6)). Therefore, the expected message delivery delay $E[T_{HC}]$ for all $M$ messages to reach state $A$ from state 1 is

$$E[T_{HC}] = \frac{1}{N-1}\sum_{k=1}^{N-1}\left[M\sum_{i=1}^{k}\frac{\alpha x_{min}}{i(\alpha-1)Nf}\right] + \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{M\alpha x_{min}}{(k+1)(\alpha-1)} \tag{10}$$

A closed-form expression (Equation (11)) of the expected message delivery delay is obtained from Equation (10) by using calculus approximation methods. The details have been omitted because of space limitations. Interested readers are referred to Appendix D in [29] for details.

$$E[T_{HC}] = \frac{M\alpha x_{min}}{Nf(N-1)(\alpha-1)}\left[(N-1)\ln(N-1) + fN\ln\left(\frac{N}{2}\right) - N + 2\right] \tag{11}$$
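The HubCode model can be explored in the same way. The sketch below evaluates Equation (10) by direct summation and Equation (11) in closed form; the function names are hypothetical, and the degree of agreement between the two is left for the reader to inspect rather than asserted here.

```python
import math


def delay_hubcode_sum(n: int, m: int, alpha: float, x_min: float, f: float) -> float:
    """Equation (10), evaluated by direct summation over k and i."""
    if n < 2 or m < 1:
        raise ValueError("need N >= 2 and M >= 1")
    inv_r = alpha * x_min / (alpha - 1.0)  # 1/R from Equation (6)
    total = 0.0
    for k in range(1, n):  # k = 1 .. N-1
        relay = m * sum(inv_r / (i * n * f) for i in range(1, k + 1))
        deliver = m * inv_r / (k + 1)      # p is taken as 1 for HubCode
        total += relay + deliver
    return total / (n - 1)


def delay_hubcode_closed(n: int, m: int, alpha: float, x_min: float, f: float) -> float:
    """Equation (11), the approximate closed form."""
    if n < 3:
        raise ValueError("the closed form is used here only for N >= 3")
    prefactor = m * alpha * x_min / (n * f * (n - 1) * (alpha - 1.0))
    bracket = (n - 1) * math.log(n - 1) + f * n * math.log(n / 2) - n + 2
    return prefactor * bracket
```

Both functions scale linearly with $M$, which reflects the model assumption that one message is transferred per contact opportunity.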

4.5. Model validation

To validate our theoretical model, we compare it with the simulation results. The details of the mobility trace and other simulation parameters are discussed thoroughly in Section 5.

Figure 5. State diagram of HubCode forwarding scheme.



In the first set of our simulations, 10 randomly selected sources (from a pool of 1065 nodes) sent 10 messages each to a single, randomly selected common destination. That is, a total of 100 messages are heading towards a single common destination. The message payload is 1000 bytes. The source nodes generate messages during the period from 3000 to 4000 s after the start of the simulation. The average delivery delay is measured only for the messages that reach the destination. Messages that are not received during the 5 h lifetime of the simulation are ignored. We have found that approximately 63% of the messages reach the destination on average. The number of hubs is varied from 5 to 150 in the following steps: $\langle 5, 25, 50, 75, 100, 125, 150 \rangle$. The simulation is repeated 20 times for each instance. We measure the message delivery delay and plot the average value in Figure 6. We also plot the 95% confidence intervals. We compare the simulation results to the message delay derived in Equations (9) and (11). The parameters $\langle \alpha, x_{min}, f \rangle$ in Equations (9) and (11) are derived empirically from the mobility traces as $\langle 1.01, 30, 0.8 \rangle$, respectively. To observe the impact of increasing the traffic, we increase the number of messages sent by each random source from 10 to 20 and repeat the above simulations. The corresponding results are plotted in Figure 7. The theoretical curve of the Hub-only scheme (without coding) is also plotted to compare its delivery delay with that of HubCode. Recall that in the Hub-only scheme, unlike HubCode, the hubs do not encode multiple messages into a single message.

One can observe from Figures 6 and 7 that there is a significant disparity between the simulation and analytical results when the number of hub nodes is small (< 50). However, as the number of hub nodes increases (> 50), the two results converge. Further, both results exhibit the same general trend, in that the delivery delay decreases as the number of hubs increases.

In the following, we discuss the possible causes of the disparity between the theoretical model and the simulation results when the number of hubs is small (< 50).

Figure 6. Model validation: 10 message sources. [Plot: delay (s) versus number of hub nodes; curves: Analytical Hub-only (w/o coding), Analytical HubCode, Simulation HubCode.]

Figure 7. Model validation: 20 message sources. [Plot: delay (s) versus number of hub nodes; curves: Analytical Hub-only (w/o coding), Analytical HubCode, Simulation HubCode.]

• Theoretical limitation. First of all, the approximate closed-form mathematical models impose some theoretical limits on the possible number of hubs. For example, a cursory look at the closed-form Equations (9) and (11) reveals that the equations are undefined when $N < 3$ (i.e., hubs < 3). This explains the peculiar shape of the theoretical model at low values of $N$.

• Biased expectation of inter-contact rates. The accuracy of the expected inter-contact rate directly affects the message delivery delay in Equations (9) and (11). In an empirical study, we found that the inter-contact durations follow a power-law distribution and modeled them with a simple Pareto distribution. However, because of the nature of the power-law distribution (i.e., most inter-contact durations are short lived, but there exist a few very long inter-contact durations), the expected inter-contact duration is biased toward the few very long inter-contact durations. For example, Figure 8 plots a theoretical Pareto curve that fits the empirical inter-contact distribution of the mobility trace (from the 3PM–8PM slot) that we have used in our simulation. The expected value of the theoretical Pareto distribution is found to be $\approx 11$ min. However, from Figure 8, it can be seen that the probability of occurrence of this expected value is only $\approx 0.008$. Hence, the expected value of inter-contact durations does not accurately represent the rate of contact opportunities. In other words, it contributes toward a more pessimistic message delay.

Figure 8. Biased expected value of power-law distribution. [Plot: P(X=x) versus inter-contact duration (min); curves: Pareto distribution, PMF of inter-contact duration (3PM–8PM); annotations: average of Pareto distribution (~11), P(X=x) ~ 0.008.]

• Multiple transfers per contact opportunity. Recall that in our model, we assumed that only a single message can be transferred at a contact opportunity. Although most of the contact durations are very short (e.g., < 10 s), a few contact durations are indeed long enough to transfer multiple messages (this is the typical behavior of power-law based distributions). As a direct consequence, the theoretical model tends to predict a larger message delivery delay than the results obtained from the simulation settings, especially when the number of messages destined to a common address is increased (Figure 7).

• Problem of averaging with fewer terms. Our model exhibits instability when the number of hubs (acting as message relays) is very small (between 1 and 6). If we delve into the message delay model in Equations (5), (8), and (10), we find that averaging with fewer terms (i.e., when the number of hubs is very small) is responsible for the instability. We explain this by considering the generalized delay model in Equation (5), which is restated below for convenience:

$$\text{Mean delay for arriving at state } A \text{ in } k \text{ steps} = \underbrace{\left(\frac{1}{R^F_1} + \frac{1}{R^F_2} + \ldots + \frac{1}{R^F_k}\right)}_{\text{1st part}} + \underbrace{\left(\frac{1}{R^A_{k+1}}\right)}_{\text{2nd part}}$$

In the above equation, the first part represents the cumulative forwarding delay among hubs, and the second part represents the delay when a relay delivers the message to the destination. The values of the trailing terms in the first part rapidly decrease as the value of $k$ (i.e., hubs) increases because of the rapid growth rate of $R^F_i$. As a consequence, the average value calculated (in Equations (8) and (10)) with fewer terms becomes significantly larger than that with more terms because the first few terms are much larger than the rest of the terms. For example, the average of $[1/R^F_1 + 1/R^F_2]$ is greater than that of $[1/R^F_1 + 1/R^F_2 + 1/R^F_3]$. This explains the initial large discrepancy between the theoretical and simulation results when the number of hubs is small.
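The averaging effect described in the last bullet can be illustrated numerically. The sketch below is an illustration under stated assumptions: it uses the HubCode rates of Section 4.4 with $M = 1$ and $p = 1$, normalizes the contact rate to $R = 1$ by default, and uses hypothetical function names. It averages the per-step delays over $k = 1, \ldots, N-1$ as in Equation (10).

```python
def per_k_delay(k: int, n: int, r: float, f: float) -> float:
    """Mean delay when a single message is delivered in the k-th step (HubCode)."""
    relay = sum(1.0 / (i * r * n * f) for i in range(1, k + 1))  # hub-to-hub part
    deliver = 1.0 / ((k + 1) * r)                                # final delivery
    return relay + deliver


def averaged_delay(n: int, r: float = 1.0, f: float = 0.8) -> float:
    """Average of per_k_delay over k = 1..N-1 (Equation (10) with M = 1)."""
    if n < 2:
        raise ValueError("need at least N = 2")
    return sum(per_k_delay(k, n, r, f) for k in range(1, n)) / (n - 1)
```

Evaluating `averaged_delay` for increasing `n` (for instance 2, 3, 6, and 50) shows the average shrinking as more, smaller terms enter the mean, which is the behavior the bullet attributes to the model at low hub counts.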

As mentioned earlier, the theoretical curve of the Hub-only scheme (without coding) is included to compare its delivery delay with that of HubCode. As expected, the theoretical message delivery delay (Figures 6 and 7) in the Hub-only scheme is greater than that in HubCode. The difference becomes even more prominent when the number of messages to a common destination is increased (Figure 7). Because more messages are headed towards a common destination (Figure 7), the coding opportunity of HubCode increases, and HubCode therefore performs better than the Hub-only scheme. In both Hub-only and HubCode schemes, the theoretical models provide us with upper bounds on the delivery delay.

5. SIMULATION-BASED EVALUATIONS

In this section, we present simulation-based evaluations that compare the performance of the proposed HubCode schemes with other DTN forwarding schemes. We use mobility traces of a large-scale vehicular DTN. In all our simulations, we took a pragmatic approach, wherein the data exchanged by two adjacent nodes are proportional to the contact duration. Further, we also accounted for the auxiliary information exchanged by the nodes. In the first set of experiments, we compared the performance of our forwarding schemes with other schemes in terms of message delivery ratio and delivery cost. We also evaluated the robustness of our schemes against random node failure. In the second set of simulations, we investigated the impact of various parameters on the forwarding performance. In particular, we studied the impact of varying the percentage of nodes classified as hubs, the message size, and the traffic load.

5.1. Mobility trace details

In recent years, several researchers have conducted empirical measurements to study the behavior of people-centric DTNs. In these experiments, communicating devices (Bluetooth, ZigBee, etc.) are either handed to a volunteer group [1] or are mounted on moving vehicles [2]. The devices record all opportunistic contacts with other devices in the participant set and also with external devices. The external contacts are often excluded from the analysis because complete information about their encounters is not available. Because of the practical limitations of conducting empirical experiments, the node population in all the traces is quite small (see Table II). Further, it has been observed that a few hubs are connected to all other nodes (i.e., have full degree distribution). This is an artifact of the small population in the traces and is not representative of real-world networks. As a result, we have found that schemes that exploit the power-law properties, such as BubbleRap, achieve close to optimal performance (results are excluded for brevity). Hence, employing network coding at the hubs offers little advantage.

Table II. Properties of mobility traces.

Data set   Intel [31]             Cambridge Computer Laboratory [31]   INFOCOM’05 [1]         Seattle Buses [32]
Period     05-01-06 to 05-01-11   05-01-25 to 05-01-31                 05-03-07 to 05-03-10   01-10-30 to 01-11-26
Device     iMote                  iMote                                iMote                  None
Network    Bluetooth              Bluetooth                            Bluetooth              WiFi (simulated)
Nodes      9                      12                                   41                     1163
Coverage   Top 2 covered 9        Top 2 covered 12                     Top 4 covered 41       Top 100 covered 865

Therefore, in our evaluations, we use mobility traces from a significantly larger network that capture the movement of public transport buses from the King County Metro bus system in Seattle [32]. This transport network consisted of 1163 buses plying over 236 distinct routes covering an area of 5100 km². The traces were collected over a 3 week period in October–November 2001. The traces are based on location update messages sent by each bus. Each bus logs its current location using an automated vehicle location system, together with its bus and route ID and a timestamp. The typical update interval is 30 s. The traces can be readily used to simulate a bus-based DTN, similar to DieselNet [5]. As in [5], we assumed that each bus is equipped with an 802.11b radio. Buses exchange messages when they are within the communication range of each other, which was assumed to be 250 m.

The traces were post-processed to generate fine-grained location information. The details have been omitted for reasons of brevity. The trace also shows power-law behavior. However, unlike the other traces, no single hub connects to most of the other nodes (the maximum degree of a hub is 0.15). This is representative of a large real-world people-centric DTN.

5.2. Simulation settings

We used a custom discrete event simulator. We assumed that each node broadcasts a beacon every 5 s for neighbor discovery. The beacons also contained additional information as required by the routing scheme (e.g., HubCode). Because we wished to study the performance of the routing schemes in isolation, the 802.11 MAC was not implemented. In each simulation, we injected 1000 messages to 100 destinations (both source and destination are randomly selected from the pool of 1065 nodes). We assumed that the message inter-arrival duration at the source is exponentially distributed, with the average inter-arrival duration set to 30 s. The message payload was 1000 bytes. The source nodes generated messages during the period from 3000 to 4000 s after the start of the simulation. This is because several nodes are inactive in the initial period. Each simulation lasted for 5 h (18 000 s). We chose the 3PM–8PM period from two successive weekdays, 31 October 2001 (Wednesday) and 1 November 2001 (Thursday). We simulated each trace 20 times for statistical significance. The results presented were averaged over the 20 runs. The 95% confidence intervals were all within 10% of the average.

To evaluate the performance, we used the following metrics: (i) delivery ratio, which is the ratio of the messages delivered to the messages created, (ii) delivery cost, which is measured as the total number of messages transmitted, normalized by the total number of unique messages created, and (iii) delivery delay, which is the average end-to-end delay of the delivered messages. Note that the delivery cost does not include the auxiliary messages exchanged by the nodes in the beacons.
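The three metrics defined above can be computed from a simple per-message log. The sketch below is illustrative only: the `MessageRecord` structure and the function names are assumptions and do not describe the authors' simulator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MessageRecord:
    created_at: float               # creation time at the source (s)
    delivered_at: Optional[float]   # delivery time (s), or None if never delivered


def delivery_ratio(records: Sequence[MessageRecord]) -> float:
    """Messages delivered divided by messages created."""
    delivered = sum(1 for r in records if r.delivered_at is not None)
    return delivered / len(records)


def delivery_cost(total_transmissions: int, unique_created: int) -> float:
    """Total message transmissions normalized by unique messages created.

    Beacon (auxiliary) traffic is excluded, as stated in the text.
    """
    return total_transmissions / unique_created


def delivery_delay(records: Sequence[MessageRecord]) -> float:
    """Average end-to-end delay over delivered messages only."""
    delays = [r.delivered_at - r.created_at
              for r in records if r.delivered_at is not None]
    if not delays:
        raise ValueError("no delivered messages")
    return sum(delays) / len(delays)
```

Note that undelivered messages lower the delivery ratio but do not enter the delay average, which matches the definition in item (iii).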

We compared the performance of the HubCode schemes (we use the term HubCode to refer to both HubCodeV1 and HubCodeV2) with several other DTN forwarding schemes. Epidemic (i.e., flooding) [33] was included because it achieves the highest delivery ratio when the network is not congested. We chose spray and wait [34] as a representative restrictive multiple-copy forwarding scheme. In this scheme, the source initially makes $n$ copies of a message and forwards half of those to a neighbor it meets. This node in turn repeats this strategy; that is, it forwards half of the copies to the next encountered node and retains the other half. This process repeats until a node is left with one copy, which is only forwarded to the destination. In our simulation, we set the initial number of copies $n$ to 8, which is a fair compromise between the delivery ratio and the cost [34]. RLC [17] and BubbleRap [7] were included because they are closely related to our work (see Section 2). We used the same methods as described in [7] to rank the nodes.
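The halving rule of spray and wait described above can be sketched as follows. This is a minimal illustration, not the implementation used in the simulations; the function names are hypothetical, and rounding down the handed-over share for odd copy counts is an assumption.

```python
from typing import Tuple


def split_copies(copies_held: int) -> Tuple[int, int]:
    """Return (kept, handed_over) when a carrier meets a new relay.

    A carrier with a single copy is in the wait phase and hands nothing to
    relays; that copy is only forwarded to the destination.
    """
    if copies_held < 1:
        raise ValueError("a carrier must hold at least one copy")
    if copies_held == 1:
        return 1, 0
    handed_over = copies_held // 2  # assumption: round down for odd counts
    return copies_held - handed_over, handed_over


def handovers_until_wait(initial_copies: int) -> int:
    """Number of handovers the source performs before it enters the wait phase."""
    copies, handovers = initial_copies, 0
    while copies > 1:
        copies, _ = split_copies(copies)
        handovers += 1
    return handovers
```

With the setting $n = 8$ used in the simulations, `split_copies(8)` leaves the source and the first relay with four copies each, and each of them continues to halve its share on later encounters.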

To implement HubCode, it is necessary to identify the hub nodes in the network. For this, we analyzed the traces from an entire weekday and ranked the nodes according to the number of unique nodes they encountered. We found that the ranking is similar for all the weekdays because the buses follow repeatable patterns. Most people-centric networks are known to exhibit such repeatable behavior. The top 10% of nodes (i.e., 116 buses) were classified as the hubs. We used a centralized approach to rank hubs for simplicity. However, distributed approaches similar to distributed community detection methods [7,35] might be applicable to decentralize the ranking problem. Because of lack of space, we omit the discussion of the distributed ranking approaches.
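The centralized ranking step can be sketched as below, assuming the trace has already been reduced to a list of pairwise contacts. The function name, the input format, and the tie-breaking rule are assumptions made for illustration. Nodes that never appear in a contact are not counted in this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def rank_hubs(contacts: Iterable[Tuple[Hashable, Hashable]],
              hub_fraction: float = 0.10) -> List[Hashable]:
    """Rank nodes by the number of unique nodes encountered; return the top share."""
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in contacts:
        if a == b:
            continue  # ignore self-contacts
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Highest number of unique contacts first; ties broken by node name (assumption).
    ranked = sorted(neighbours, key=lambda node: (-len(neighbours[node]), str(node)))
    if not ranked:
        return []
    n_hubs = max(1, int(len(ranked) * hub_fraction))
    return ranked[:n_hubs]
```

With 1163 nodes and `hub_fraction=0.10`, the truncation yields 116 hubs, which is consistent with the figure quoted in the text.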

Contact durations are finite and short lived in real-world scenarios. For example, analyzing the bus traces, we found that several contact durations are shorter than 30 s. Similar behavior is observed in other real-world networks [1]. Hence, we assumed that the amount of data exchanged is proportional to the length of the contact duration. For simplicity, we assumed the following linear relationship. The amount of data, $D$, exchanged during a contact duration of $T_c$ seconds is given by $D = (T_c - T_a) \times 4$ Mbps. Empirical experiments have shown that in 802.11b, the typical goodput (accounting for overheads) at the highest data rate of 11 Mbps is around 4 Mbps [36]. $T_a$ refers to the association time, which includes the typical time to associate with an access point. For simplicity, we assumed a fixed value of 10 s for $T_a$. We also accounted for the time required to exchange the beacon messages (note that the size of the beacons varies depending on the forwarding scheme employed).
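The linear contact-capacity relationship can be written as a short helper. This is a sketch under stated assumptions: contacts shorter than $T_a$ are clamped to zero capacity (the text does not say how such contacts are handled), beacon exchange time is ignored, and the names are hypothetical.

```python
GOODPUT_BPS = 4_000_000      # 4 Mbps typical 802.11b goodput, in bits per second
ASSOCIATION_TIME_S = 10.0    # fixed association time T_a, in seconds


def data_exchanged_bits(contact_duration_s: float,
                        association_time_s: float = ASSOCIATION_TIME_S,
                        goodput_bps: float = GOODPUT_BPS) -> float:
    """D = (T_c - T_a) * goodput, clamped at zero for very short contacts."""
    usable_s = max(0.0, contact_duration_s - association_time_s)
    return usable_s * goodput_bps


def messages_per_contact(contact_duration_s: float, payload_bytes: int = 1000) -> int:
    """Whole messages of the given payload size that fit into one contact."""
    return int(data_exchanged_bits(contact_duration_s) // (payload_bytes * 8))
```

Under these assumptions, any contact of 10 s or less carries no payload at all, which illustrates why short contacts translate into lost forwarding opportunities.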

5.3. Delivery ratio, cost, and delay

Figures 9, 10, and 11 plot the delivery ratio, delivery cost, and delivery delay, respectively, for all the forwarding schemes. The delivery ratio for HubCodeV2 is approximately 72%, which outperforms all other schemes by about 15–20%. In contrast, the delivery ratio for HubCodeV1 is about 60%. Recall that, in HubCodeV2, the hubs do not exchange the complete coefficient matrices as in HubCodeV1. Rather, they only exchange native message IDs. Hence, in HubCodeV1, particularly when the contact opportunities are short, significant time is spent exchanging the coefficient matrices, thus leading to several wasted opportunities for transferring data. Figure 10 shows that V1 still outperforms V2 slightly (by 20%) in terms of the delivery cost, as a consequence of the extra information exchanged in the beacons. Also, observe that the HubCode schemes are far superior in comparison with all other schemes (e.g., Epidemic incurs 300% excess costs as compared with V1). This is because our schemes restrict the message forwarding within the hubs and encode multiple messages together.

Figure 9. Delivery ratio. [Plot: delivery ratio versus time; curves: Epidemic, BubbleRap, HubCode_V1, HubCode_V2, Spray & Wait, RLC.]

Figure 10. Delivery cost. [Plot: cost (#Tx msgs./#orig. msgs.) versus time; curves: Epidemic, Bubble, HubCode_V1, HubCode_V2, Spray & Wait, RLC.]

Figure 11. End-to-end delay.

The delivery ratio of epidemic is about 60%, which is less than that of HubCodeV1. This is because a node may not be able to transfer all the messages it carries to other nodes during an encounter. When contact durations are bounded, epidemic essentially resembles a restricted flooding scheme such as spray and wait, as is evident from their similar delivery ratios. The delivery cost of epidemic is still significantly higher than that of the other schemes. Spray and wait reduces the cost by about 15% as a result of the cap on the number of copies exchanged between nodes. Despite employing network coding, the performance of RLC is poor in comparison with HubCode. This is because, in RLC, encoded messages are flooded throughout the entire network. Hence, the overhead of unnecessary message replications and the dominating auxiliary data exchange exhaust scarce bandwidth.

Although a large message delay is common in DTNs, it is always desirable to receive a message as early as possible. To get an idea of the message delivery delay, we plot (Figure 11) the average end-to-end delay for all delivered messages for the forwarding protocols under consideration. The x-axis represents the delivery ratios of the forwarding protocols as time elapses, whereas the y-axis represents the average delays of the delivered messages for the corresponding delivery ratios. Consider the instance when the delivery ratio is 60% (as indicated by the vertical line in Figure 11). In this case, the average delay for HubCodeV2 is approximately 3000 s lower than that of its nearest contender (i.e., RLC). This is because the unnecessary copies of the messages created by the other protocols accumulate over time and exhaust scarce bandwidth, which in turn has a negative impact on the message delivery ratio and delay. Recall that in HubCodeV1, nodes exchange their complete coefficient matrices, which grow as time elapses. As a consequence, the behavior of HubCodeV1 degenerates into that of a closely related protocol, RLC. This explains the similar delivery delays ($\approx 11\,000$ s) experienced by HubCodeV1 and RLC.

5.4. Effect of node failure

In people-centric networks, node failures are a reality. A node failure may occur as a result of software/hardware failures or energy depletion. Further, a node may also cease to participate in message forwarding activity. In this section, we investigate the impact of node failures on the forwarding schemes under consideration. A randomly chosen fraction of nodes is made inactive over the entire duration of the simulation. We observe the impact of varying the percentage of inactive nodes on the delivery ratio. This allows us to investigate the robustness of our scheme to node failures. As before, the simulation is repeated 20 times, and the average delivery ratio is plotted. Note that we consider two different cases. In the first instance (Figure 13), we assumed that the inactive nodes are chosen from the entire set of nodes. In the second case, we only picked hubs as the inactive nodes (Figure 12).

As can be seen from the graph (Figure 12), even when 20% of the hub nodes fail, the effect on the delivery ratio is minor (a 7% decline). Recall that the hubs are usually well connected (i.e., have high out degrees) and that HubCode disseminates messages among hubs that collectively form a data conduit. The high inter-connectivity among the hubs provides alternate paths to disseminate messages among themselves. As a result, our schemes are robust to the failure of a few hubs. Figure 12 also shows that other schemes such as epidemic and RLC are not significantly affected by node failures either. This is because in both of these schemes, messages are copied to all encountered nodes. Therefore, failure of some random nodes does not impact the message delivery ratio.

Figure 12. The effect of random node failure on message delivery. [Plot: delivery ratio versus % of failed nodes; curves: HubCode V2, HubCode V1, Epidemic, Spray & Wait, Bubble, RLC.]

If the failed nodes are drawn from the entire node population (i.e., not merely from the group of hubs), the probability that the failed nodes contain hubs decreases. Because HubCode only uses hubs (a fraction of the total nodes) as delivery messengers, failure of non-hub nodes does not affect its delivery performance. Figure 13 shows the effect of random hub failure on the delivery ratio of HubCodeV2. The same trend is also observed in HubCodeV1 (not shown in the figure).

5.5. Effect of the number of hubs

Recall that HubCode uses the hubs as the message relays. In the previous experiments, we assumed that the top 10% of the nodes, ranked according to the degree distribution, are classified as the hubs. A natural question arises: what is the impact of increasing the number of hubs on the performance? In this set of experiments, we sought to answer this question. We considered the same parameters as in Section 5.3. Figure 14 illustrates the impact of increasing the percentage of nodes that constitute the hubs on the delivery ratio. The delivery ratio for both schemes initially increased as we increased the number of hubs. This was expected because the number of nodes that the hubs can collectively contact increases. As a result, the number of messages that can reach the destinations increases. However, after a certain knee point (around 20–30%), the delivery ratio began to decrease. This is because an increase in the number of hubs reduces the number of messages carried by each hub. As a result, the opportunities for coding multiple messages decrease. The increased overhead caused by the increasing number of hubs is also a key factor that reduces the delivery ratio. Because the overhead of HubCodeV2 is less than that of HubCodeV1, the knee point in HubCodeV2 is slightly higher than that of HubCodeV1 (30% as compared with 20% in HubCodeV1). Note that when 100% of the nodes are regarded as hubs, HubCode degenerates into RLC. The result concerning delivery costs is not shown in this paper because the findings are obvious: the costs continue to increase with an increasing number of hubs. These results suggest that only a small percentage of the nodes (20–30%), which are highly connected, should be designated as hubs.

Figure 13. The effect of random hub failure on message delivery (HubCode V2). [Plot: delivery ratio (0 to 1) versus % of failed nodes (0 to 50); curves: Hubs, All Nodes.]

Figure 14. Effect of varying hub size. [Plot: delivery ratio (0.5 to 1) versus % of top-ranked nodes used as hubs (10 to 100); curves: HubCode_V1, HubCode_V2.]
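The hub designation rule used in this experiment (the top-ranked percentage of nodes by degree) can be sketched as below. The function name, the dictionary input, and the tie-breaking by node identifier are assumptions made for illustration; the source only states that nodes are ranked according to the degree distribution.

```python
def select_hubs(degree, top_pct):
    """Return the top `top_pct` percent of nodes, ranked by degree.

    degree maps a node identifier to its degree in the temporal
    connectivity graph. Ties are broken by node identifier (an assumption).
    """
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    count = max(1, len(ranked) * top_pct // 100)  # always keep at least one hub
    return set(ranked[:count])

# Sweep the hub percentage as in the experiment (10% to 100%).
example_degree = {node: node % 17 for node in range(1065)}  # synthetic degrees
hub_sets = {pct: select_hubs(example_degree, pct) for pct in range(10, 101, 10)}
```

With `top_pct` set to 100, every node is a hub, which corresponds to the case where HubCode degenerates into RLC.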

5.6. Effect of message size

In this section, we investigate the impact of the size of the message blocks on the delivery ratio. When the message block size is small, the auxiliary headers make up a significant percentage of the packet size, which reduces the forwarding efficiency. On the other hand, if the message block is too large, a node may fail to forward it if the contact duration between the nodes is short. In this experiment, we seek to find the optimum block size when transferring a long file.

We assumed that each file is 100 KB long and that there are 1000 files to send. The 100 destinations were selected randomly from our pool of 1065 nodes. We varied the message block size from 1 to 3000 bytes (in steps of 500 bytes) and observed the delivery ratio. Other parameters remained the same as in the previous simulations. The experiment was repeated 20 times, and the average value of the delivery ratio was plotted (Figure 15).

In both HubCode versions, the coding opportunities increase when the block size is small because more blocks are likely to be headed toward common destinations. However, this effect is offset by the negative impact of the increased overhead caused by larger coefficient vectors and headers. As a result, the HubCode versions showed very poor performance when the block size is less than 50 bytes.
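The trade-off between header overhead and contact duration can be made concrete with a small calculation. The sketch below is illustrative only: the 20-byte fixed header, the one-byte-per-block coefficient (as with a field of 256 elements), the 100 000-byte file, and the link rate are assumed values, not parameters reported in this paper.

```python
import math

FILE_BYTES = 100_000   # assumed reading of "100 KB"
HEADER_BYTES = 20      # hypothetical fixed header size

def blocks_per_file(file_bytes, block_bytes):
    """Number of blocks needed to carry one file."""
    return math.ceil(file_bytes / block_bytes)

def payload_fraction(block_bytes, header_bytes, coeff_bytes):
    """Share of a transmitted packet that is useful payload."""
    return block_bytes / (block_bytes + header_bytes + coeff_bytes)

def fits_in_contact(packet_bytes, contact_seconds, link_bytes_per_second):
    """True if the whole packet can be sent within one contact."""
    return packet_bytes <= contact_seconds * link_bytes_per_second

for block in (50, 500, 3000):
    # Assumption: one coefficient byte per block of the file being coded.
    coeff = blocks_per_file(FILE_BYTES, block)
    packet = block + HEADER_BYTES + coeff
    print(block, round(payload_fraction(block, HEADER_BYTES, coeff), 3),
          fits_in_contact(packet, 0.002, 1_000_000))
```

Under these assumptions, small blocks spend most of each packet on coefficients and headers, while large blocks are efficient per packet but need a longer contact to be transferred at all.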

The delivery ratio begins to increase when the message block size exceeds 50 bytes. However, after reaching a cutoff point (approximately 500 bytes in this case), the delivery ratio declines in response to further increases in the message block size. This is because as the message size becomes larger, the contact opportunities required for successful message transfer become scarce (i.e., short contact opportunities may fail to transfer large blocks), which negatively affects the delivery ratio.

Figure 15. Effect of varying message size. [Plot: delivery ratio (0 to 1) versus message size in bytes (0 to 3000).]

5.7. Effect of traffic load

In this experiment, we gradually increased the traffic load to observe its effect on the delivery performance of the routing protocols. As before, 100 destinations were randomly selected from the pool of 1065 nodes. We varied the number of injected messages from 500 to 3000 (in increments of 500). Messages were injected into the network starting 3000 s after the simulation began, at random intervals (drawn from an exponential random variable with parameter 30). The whole experiment was repeated 20 times, and the average values were plotted (Figure 16).
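The traffic generation described above can be sketched as follows. The source does not say whether the parameter 30 is a mean or a rate; this sketch assumes a mean inter-injection gap of 30 s. The function names and seeds are also illustrative.

```python
import random

def pick_destinations(nodes, count=100, seed=0):
    """Randomly select the destination nodes from the node pool."""
    rng = random.Random(seed)
    return rng.sample(list(nodes), count)

def injection_times(num_messages, start=3000.0, mean_gap=30.0, seed=0):
    """Return message injection times in seconds.

    Gaps between injections are exponentially distributed.
    Assumption: mean_gap (30 s) is the mean of that distribution.
    """
    rng = random.Random(seed)
    t = start
    times = []
    for _ in range(num_messages):
        t += rng.expovariate(1.0 / mean_gap)  # expovariate takes the rate
        times.append(t)
    return times

# One schedule per traffic load, from 500 to 3000 messages in steps of 500.
schedules = {load: injection_times(load) for load in range(500, 3001, 500)}
```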

Because the number of destinations remains the same (i.e., 100) in all cases, an increased traffic load implies that more messages are directed to common destinations. The delivery ratio initially increases steadily as the traffic load is increased up to a knee point (1500 messages in this scenario) and then declines slowly when the traffic load is increased further. This is because as the traffic load is increased, the probability that more messages have a common destination increases, and so do the coding opportunities among the hubs. An increase in coding opportunities implies more efficient utilization of bandwidth and hence increases the delivery ratio. However, if the traffic load is increased beyond the knee point, the auxiliary data overhead (because of the increased size of the coefficients) dominates and exhausts scarce bandwidth resources (recall that most contact durations are very short). As a result, the delivery ratio begins to decrease when the traffic load is increased beyond the threshold.

Figure 16. Effect of varying traffic load. [Plot: delivery ratio (0 to 1) versus traffic load in number of messages (0 to 3000).]
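To illustrate why the coefficient overhead grows with the number of messages coded together, the following sketch encodes messages for a common destination with random linear coding and decodes them by Gaussian elimination. It works over GF(2), where combining is a plain XOR, purely for brevity; the field size, function names, and sample messages are assumptions and do not reproduce the HubCode implementation.

```python
import random

def encode(messages, rng):
    """Combine equal-length byte strings with random GF(2) coefficients."""
    coeffs = [rng.randint(0, 1) for _ in messages]
    if not any(coeffs):                       # avoid the useless all-zero packet
        coeffs[rng.randrange(len(coeffs))] = 1
    payload = bytes(len(messages[0]))
    for c, m in zip(coeffs, messages):
        if c:
            payload = bytes(a ^ b for a, b in zip(payload, m))
    return coeffs, payload                    # one coefficient per message

def decode(packets, k):
    """Gauss-Jordan elimination over GF(2); returns k messages or None."""
    rows = [(list(c), bytearray(p)) for c, p in packets]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None                       # not enough independent packets yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = (
                    [a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                    bytearray(a ^ b for a, b in zip(rows[r][1], rows[col][1])),
                )
    return [bytes(rows[i][1]) for i in range(k)]

rng = random.Random(7)
msgs = [b"alpha", b"bravo", b"delta"]         # messages for one destination
packets, decoded = [], None
for _ in range(100):                          # the destination collects packets
    packets.append(encode(msgs, rng))
    decoded = decode(packets, len(msgs))
    if decoded is not None:
        break
```

Each coded packet carries a coefficient vector with one entry per combined message, so coding more messages for the same destination lengthens every packet's auxiliary data, which is the overhead effect discussed above.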

6. CONCLUSION

In this paper, we proposed a novel forwarding strategy called HubCode for people-centric DTNs that exhibit power-law behavior. HubCode uses the highly connected nodes as message relays. Further, messages are forwarded amongst the hubs using linear network coding. We presented two alternate implementations of HubCode to address the important trade-off between routing overhead and computational complexity. Our simulation-based evaluations of a large-scale vehicular DTN demonstrated the efficacy of our schemes. In particular, under pragmatic assumptions, our schemes were shown to achieve a 20% higher delivery ratio and less than half of the delivery costs of comparable strategies.

We have derived closed-form expressions for the message delivery delay of our proposed scheme and validated our model by comparing it with simulation results. We show that our mathematical model serves as a lower bound for the message delivery delay incurred under normal operational conditions.

As future work, we would like to examine the effect of the bundle-protocol layer and the convergence layer on our forwarding protocols. The Licklider transmission protocol [37] is a good convergence layer to start with, because recent research [38] shows its effectiveness compared with other convergence-layer protocols (e.g., the user datagram protocol-based and transmission control protocol-based convergence-layer protocols).

REFERENCES

1. Hui P, Chaintreau A, Scott J, Gass R, Diot C, Crowcroft J. Pocket switched networks and human mobility in conference environments. In Proceedings of the 2005 ACM SIGCOMM Workshop on Delay-tolerant Networking. ACM: New York, NY, 2005.

2. Burgess J, Gallagher B, Jensen D, Levine BN. Maxprop: routing for vehicle-based disruption tolerant networks. IEEE INFOCOM, Barcelona, Spain, April 2006.

3. Zhang Z. Routing in intermittently connected mobile ad hoc networks and delay tolerant networks: overview and challenges. IEEE Communications Surveys and Tutorials 2006; 8(1): 24–37.

4. Chen LJ, Chen YC, Sun T, Sreedevi P, Chen KT, Yu CH, Chu HH. Finding self-similarities in opportunistic people networks. 26th IEEE International Conference on Computer Communications, Anchorage, AK, 6–12 May 2007; 2286–2290.

5. Zhang X, Kurose J, Levine B, Towsley D, Zhang H. Study of a bus-based disruption tolerant network: mobility modeling and impact on routing. In Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking. ACM: New York, NY, 2007.

6. Yoneki E, Hui P, Crowcroft J. Distinct types of hubs in human dynamic networks. In Proceedings of the 1st Workshop on Social Network Systems. ACM: New York, NY, 2008.

7. Hui P, Crowcroft J, Yoneki E. Bubble rap: social-based forwarding in delay tolerant networks. In Proceedings of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM: New York, NY, 2008.

8. Lebrun J, Chuah CN. Delegation forwarding. Proceedings of MobiHoc, May 2008; 26–30.

9. Ahlswede R, Cai N, Li SYR, Yeung RW. Network information flow. IEEE Transactions on Information Theory 2000; 46(4): 1204–1216.

10. Ahmed S, Kanhere SS. Hubcode: message forwarding using hub-based network coding in delay tolerant networks. In Proceedings of the 12th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, MSWiM 2009. ACM: New York, NY, 2009; 288–296.

11. Kamal AE, Ramamoorthy A, Long SLL. Overlay protection against link failures using network coding. IEEE/ACM Transactions on Networking 2011; 18(6).

12. Katti S, Rahul H, Hu W, Katabi D, Medard M, Crowcroft J. Xors in the air: practical wireless network coding. In Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. ACM: New York, NY, 2006.

13. Dong Q, Wu J, Hu W, Crowcroft J. Practical network coding in wireless networks. In Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking. ACM: New York, NY, 2007.



14. Li Z, Li B. Network coding: the case of multiple unicast sessions. Proceedings of the 42nd Allerton Annual Conference on Communication, Control, and Computing, September 2004.

15. Xi Y, Yeh EM. Distributed algorithms for minimum cost multicast with network coding in wireless networks. 4th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, 3–6 April 2006; 1–9.

16. Widmer J, Boudec J-YL. Network coding for efficient communication in extreme networks. In Proceedings of the 2005 ACM SIGCOMM Workshop on Delay-tolerant Networking. ACM: New York, NY, 2005.

17. Zhang X, Neglia G, Kurose J, Towsley D. On the benefits of random linear coding for unicast applications in disruption tolerant networks. 2nd Workshop on Network Coding, Theory and Applications (NetCod), April 2006.

18. Hsu WJ, Helmy A. On nodal encounter patterns in wireless LAN traces. The 2nd IEEE International Workshop on Wireless Network Measurement (WiNMee), Boston, MA, April 2006.

19. Freeman L. A set of measures of centrality based on betweenness. Sociometry 1977; 40(1): 35–41.

20. Haas ZJ, Small T. A new networking model for biological applications of ad hoc sensor networks. IEEE/ACM Transactions on Networking 2006; 14(1): 27–40.

21. Small T, Haas ZJ. Quality of service and capacity in constrained intermittent-connectivity networks. IEEE Transactions on Mobile Computing 2007; 6(7).

22. Groenevelt R, Nain P, Koole G. Message delay in mobile ad hoc networks. Performance Evaluation 2005; 62(1–4): 210–228.

23. Zhang X, Neglia G, Kurose J, Towsley D. Performance modeling of epidemic routing. Computer Networks 2007; 51(10): 2867–2891.

24. Lin Y, Li B, Liang B. Stochastic analysis of network coding in epidemic routing. ACM MobiOpp, 2007.

25. Small T, Haas Z. Resource and performance tradeoffs in delay tolerant wireless networks. In Proceedings of the 2005 ACM SIGCOMM Workshop on Delay-tolerant Networking. ACM: New York, NY, 2005.

26. Sharma G, Mazumdar R. On achievable delay/capacity trade-offs in mobile ad hoc networks. IEEE WiOpt, 2004.

27. Deb S, Medard M, Choute C. Algebraic gossip: a network coding approach to optimal multiple rumor mongering. IEEE Transactions on Information Theory 2006; 52(6): 2486–2607.

28. Rudolf L, Harald N. Finite Fields (2nd edn). Cambridge University Press: Cambridge, 1997.

29. Ahmed S, Kanhere SS. HUBCODE: hub-based forwarding using network coding in delay tolerant networks. Technical Report UNSW-CSE-TR-1009, University of New South Wales. ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/1009.pdf [Accessed on 22 April 2011].

30. Flajolet P, Gardy D, Thimonier L. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics 1992; 39: 207–229.

31. Scott J, Gass R, Crowcroft J, Hui P, Diot C, Chaintreau A. CRAWDAD data set cambridge/haggle (v.2006-09-15) [WWW document]. URL http://crawdad.cs.dartmouth.edu/cambridge/haggle [accessed September 2006].

32. Jetcheva JG, Hu YC, Chaudhuri SP, Saha AK, Johnson DB. Design and evaluation of a metropolitan area multitier wireless ad hoc network architecture. Proceedings of the Fifth IEEE Workshop on Mobile Computing Systems and Applications, 9–10 October 2003; 32–43.

33. Vahdat A, Becker D. Epidemic routing for partially connected ad hoc networks. Technical Report CS-200006, Duke University, 2000.

34. Spyropoulos T, Psounis K, Raghavendra CS. Spray and wait: an efficient routing scheme for intermittently connected mobile networks. In Proceedings of the 2005 ACM SIGCOMM Workshop on Delay-tolerant Networking. ACM: New York, NY, 2005.

35. Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005; 435: 814–818.

36. Jun J, Peddabachagari P, Sichitiu ML. Theoretical maximum throughput of IEEE 802.11 and its application. 2nd IEEE International Symposium on Network Computing and Applications, 16–18 April 2003; 249–256.

37. Ramadas SBM, Farrell S. Licklider transmission protocol specification. RFC 5326, Internet Engineering Task Force (IETF), September 2008.

38. Wang R, Burleigh S, Parik P, Lin C-J, Sun B. Licklider transmission protocol (LTP)-based DTN for cislunar communications. IEEE/ACM Transactions on Networking 2011; 19(2): 359–368.



AUTHORS’ BIOGRAPHIES

Shabbir Ahmed received his B.Sc. degree in Applied Physics & Electronics from the University of Dhaka, Dhaka, Bangladesh in 1992; M.Sc. degree in Computer Science from the same university in 1993; and Ph.D. degree in Computer Science from the University of New South Wales, Sydney, Australia in 2010. He is an Associate Professor of Computer Science and Engineering at the University of Dhaka. His current research interests include vehicular networks and data mining.

Salil S. Kanhere received the B.E. degree in electrical engineering from the University of Bombay, Bombay, India in 1998 and the M.S. and Ph.D. degrees in electrical engineering from Drexel University, Philadelphia, U.S.A. in 2001 and 2003, respectively. He is currently a senior lecturer with the School of Computer Science and Engineering at the University of New South Wales, Sydney, Australia. His current research interests include wireless sensor networks, vehicular communication, mobile computing, and network security. He is a member of the IEEE and the ACM.

