
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE/ACM TRANSACTIONS ON NETWORKING

Cooperative Data Offload in Opportunistic Networks: From Mobile Devices to Infrastructure

Zongqing Lu, Member, IEEE, Xiao Sun, and Thomas La Porta, Fellow, IEEE

Abstract— Opportunistic mobile networks consisting of intermittently connected mobile devices have been exploited for various applications, such as computational offloading and mitigating cellular traffic load. In contrast to existing work, in this paper, we focus on cooperatively offloading data among mobile devices to maximally improve the probability of data delivery from a mobile device to intermittently connected infrastructure within a given time constraint, which is referred to as the cooperative offloading problem. Unfortunately, the estimation of data delivery probability over an opportunistic path is difficult and cooperative offloading is NP-hard. To this end, we first propose a probabilistic framework that provides the estimation of such probability. Based on the proposed probabilistic framework, we design a heuristic algorithm to solve cooperative offloading at a low computation cost. Due to the lack of global information, a distributed algorithm is further proposed. The performance of the proposed approaches is evaluated based on both synthetic networks and real traces. Experimental results show that the probabilistic framework can accurately estimate the data delivery probability, cooperative offloading greatly improves the delivery probability, the heuristic algorithm approximates the optimum, and both the heuristic algorithm and the distributed algorithm outperform other approaches.

Index Terms— Opportunistic mobile networks, cooperative data offload.

I. INTRODUCTION

OPPORTUNISTIC mobile networks consist of mobile devices which are equipped with short-range radios (e.g., Bluetooth, WiFi). Mobile devices intermittently contact each other without the support of infrastructure when they are within range of each other. Most research work on such networks focuses on data forwarding [9], [30] and data caching [11], [40]. Other research work focuses on exploring opportunistic mobile networks for various applications. Shi et al. [29] proposed to enable remote computing among mobile devices so as to speed up computing and conserve energy. Liu et al. [20] explored the practical potential of opportunistic networks among smartphones. Lu et al. [21] built a system to network smartphones opportunistically using WiFi for providing communications in disaster recovery. Han et al. [12] proposed to migrate the traffic from cellular networks to opportunistic communications between smartphones, coping with the explosive traffic demands and limited capacity provided by cellular networks.

Manuscript received February 10, 2016; revised October 19, 2016 and May 4, 2017; accepted August 3, 2017; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor J. Kuri. This work was supported in part by the Network Science CTA under Grant W911NF-09-2-0053 and in part by CERDEC under Contract W911NF-09-D-0006. An early version of this work appeared in the Proceedings of IEEE INFOCOM 2016 [22]. (Corresponding author: Zongqing Lu.)

Z. Lu is with the Department of Computer Science, Peking University, Beijing 100871, China (e-mail: [email protected]).

X. Sun and T. La Porta are with the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TNET.2017.2747621

In this paper, we focus on exploiting opportunistic communications for offloading data which is to be transmitted from a mobile device to intermittently connected infrastructure (e.g., a remote server). Specifically, besides directly transmitting data to infrastructure, the mobile device can also transfer data to other devices and then let these devices transmit the data to infrastructure so as to improve the probability of successful data delivery. Unlike mobile cloud computing, which assumes mobile devices are always connected to the cloud via cellular networks and the decision of offloading is mainly based on energy consumption or application execution time [4], [5], [34], [37], we consider the data offloading scenario where mobile devices only have intermittent connections with infrastructure, and mobile devices collaborate to transmit data to infrastructure.

There are several application scenarios. For example, in vehicular ad hoc networks, instead of deploying more roadside units to increase the coverage, a vehicle can seek help from other vehicles for transmitting data to roadside units when it will not meet roadside units on its current route or it cannot completely transfer the data to roadside units due to their transient contact. Therefore, such a cooperative data offloading scheme that takes advantage of the communication capability of other vehicles can potentially reduce the cost of building more roadside units, while maintaining the desired coverage. In disaster recovery, due to the very limited coverage and bandwidth of deployed mobile cellular towers, data can be fragmented and sent to multiple users for a better reachability to the mobile cellular towers.

In opportunistic mobile networks, data replication [30] and data redundancy [14] are normally employed to improve the probability of successful data delivery by increasing the amount of transmitted data. However, both techniques lead to high bandwidth overhead, which can be severe for large data transfer and costly for cellular networks. Unlike existing work, we focus on offloading data segments to other mobile devices so as to improve the probability of successful data delivery. We rely on an important observation that the probability of successful data delivery over the opportunistic path drops dramatically when the size of transmitted data is large. In such scenarios, offloading data segments can yield a better delivery probability than data replication or data redundancy, in addition to not incurring additional bandwidth overhead. However, maximally improving the delivery probability by data offloading, which is referred to as the cooperative offloading problem, turns out to be non-trivial and NP-hard. Although cooperative offloading admits PTAS (Polynomial Time Approximation Scheme), PTAS is still not affordable for mobile devices due to its incurred computation overhead of data delivery probability.

1063-6692 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

To deal with this, we design a centralized heuristic algorithm based on the proposed probabilistic framework to solve cooperative offloading. To cope with the lack of global information, we further propose a distributed algorithm. Through extensive simulations on synthetic networks and experiments on real traces, we demonstrate that cooperative offloading can greatly improve the data delivery probability, the heuristic algorithm approximates the optimum, and both the centralized heuristic algorithm and the distributed algorithm outperform other approaches. The main contributions of this paper can be summarized as follows.

• We propose a probabilistic framework that provides the estimation of data delivery probability over the opportunistic path, considering both data size and contact duration. To the best of our knowledge, this is the first work that gives such estimation without any restrictions.

• We design a heuristic algorithm that performs path allocation and data assignment by carefully considering opportunistic contact probability and path capacity. The algorithm approximates the optimum at a much lower computational cost than PTAS.

• We design a distributed algorithm which employs criterion assignment, real-time adjustment and assignment update to make data offloading decisions at runtime. Although it only exploits paths of no more than two hops for data forwarding, its performance is comparable to that of the heuristic algorithm.

The rest of this paper is organized as follows. Section II reviews related work and Section III gives the overview. The probabilistic framework is presented in Section IV, followed by the heuristic algorithm and the distributed algorithm in Section V and Section VI, respectively. Section VII evaluates the performance of the proposed approaches and Section VIII concludes the paper.

II. RELATED WORK

Mobile opportunistic networks have been studied mainly for data forwarding, where mobile nodes carry and forward messages upon intermittent node contacts. The key problem is how to select appropriate relays such that messages can be forwarded to destinations quickly considering forwarding costs. Many forwarding schemes have been proposed from different perspectives, such as representative strategies including [8], [9], [31], contact capability based schemes including [35], and social concepts based schemes including [6], [13], [23], [24], [26], [38]. In addition, forwarding is addressed as a resource allocation problem in [2] and forwarding capability of nodes under energy constraints is investigated in [3]. However, most research work on forwarding in mobile opportunistic networks is based on an assumption that nodes can completely transfer a data item during a node contact. Nevertheless, this assumption is not valid in some cases, for example, when node contacts are transient (e.g., in vehicular ad hoc networks) or when data items are large.

Unlike the existing work, in this paper, we relax this assumption and take contact duration into consideration. The selection of forwarding path(s) for a data item is determined based on the size of the data item; i.e., we choose the path(s) with the maximum probability of successful delivery of the data item by incorporating data fragmentation (i.e., the data item can be fragmented and the segments can be forwarded over different paths). Note that social relations (e.g., community) that are exploited to characterize node contacts cannot provide the estimation of data delivery probability.

The explosive growth of mobile devices (such as smartphones and tablets) has stimulated research on exploring mobile opportunistic networks for various applications. These applications can be generally classified into three categories: mitigating cellular traffic load, offloading computational tasks and information dissemination. The potential of mobile opportunistic networks in terms of reducing cellular traffic load was investigated in [20]. The incentive mechanism was studied in [39] and some solutions were proposed in [12], [27], [28], and [33]. Computational offloading among intermittently connected mobile devices was investigated in [17] and [29]. Information dissemination in mobile opportunistic networks was considered based on mobility and community in [32] and [25], respectively. Unlike the existing work, in this paper, we investigate application scenarios in which mobile devices cooperatively transmit data to intermittently connected infrastructure so as to improve the probability of data delivery.

III. OVERVIEW

A. Network Model

Consider a network, where mobile nodes opportunistically contact each other and each node may intermittently connect with infrastructure. Then, mobile nodes and infrastructure together form an opportunistic mobile network. To simplify the notation and the presentation of the paper, infrastructure is seen as a special node, and intermittent connections between nodes and infrastructure are seen as node contacts. In the network, for two nodes, an edge exists between them only if they opportunistically contact each other. With regard to data offloading, each node can exploit other nodes for data transmission. Specifically, in addition to transmitting a data item to the infrastructure along one particular path, the node can also select multiple paths to the infrastructure, and each path can be used to carry part of the data item.

B. Basic Idea

In an opportunistic mobile network, given the pattern of contact frequency and contact duration between nodes, we first investigate the relation between data size and successful delivery probability for a given path and time constraint. Intuitively, since the size of data that can be transmitted during a node contact is restricted by the contact duration, data with small size can be easily transmitted over the opportunistic path; data with large size may require multiple node contacts and thus not be completely transmitted. As shown in Figure 1, which is obtained for a given pair of path and time constraint (taken from the experiments in Section VII), the probability of successful delivery exponentially decreases when the data size increases. In other words, the delivery probability can exponentially increase if less data is transmitted over the opportunistic path. Based on this important observation, it is concluded that a node may obtain a better delivery probability of a data item if it selects multiple paths to the destination and each path carries a part of the data item.

Fig. 1. Relation between data size and successful delivery probability for a given pair of path and time constraint.

Fig. 2. A simple scenario where nodes u and v are connected by two paths. The table shows the delivery probabilities of different data sizes over the paths, taken from Fig. 1.

To better elaborate on this motivation, let us consider a simple example as shown in Figure 2, where node u needs to send a data item S (we abuse the notation a little and let S also denote the size of the data item) to v. Let us use the real delivery probabilities over an opportunistic path as labeled in Figure 1, and S = 20. Assume paths i and j have the same delivery probability for S and S/2. If u transmits S to v over path i or j, the delivery probability is 0.23. If each path carries a replication of S, the probability is $\sum_{n=1}^{2} \binom{2}{n} 0.23^{n}\, 0.77^{2-n} = 0.407$, considering all failures are Bernoulli. Instead, if u sends S/2 along each path, the delivery probability is 0.71 × 0.71 = 0.504, which is more than twice the probability when S is sent over an individual path.

It may be surprising that sending data segments along multiple paths can achieve a better delivery probability, even better than sending replications along multiple paths. Although increasingly sending replications over multiple paths may eventually yield better performance, this approach is restricted by available paths and incurs a cost of multiplied network resources. In this example, sending S over four paths between u and v will result in the delivery probability of $\sum_{n=1}^{4} \binom{4}{n} 0.23^{n}\, 0.77^{4-n} > 0.504$, but at the cost of four times the network traffic and thus bandwidth overhead, not to mention requiring four paths between the source and destination. Moreover, applying data redundancy (e.g., erasure coding) also incurs r (replication factor) times the network resources and may or may not be beneficial [14].
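The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the probabilities 0.23 and 0.71 are the values quoted above from Figure 1, failures on different paths are assumed independent, and the helper name `replication_success` is hypothetical.

```python
from math import comb

def replication_success(p, paths):
    """Probability that at least one of `paths` independent copies arrives,
    written as the binomial sum used in the text."""
    return sum(comb(paths, n) * p**n * (1 - p)**(paths - n)
               for n in range(1, paths + 1))

p_full, p_half = 0.23, 0.71   # delivery probability for S and for S/2

two_copies = replication_success(p_full, 2)    # S replicated on two paths
two_halves = p_half * p_half                   # S/2 on each of two paths
four_copies = replication_success(p_full, 4)   # S replicated on four paths

print(round(two_copies, 3), round(two_halves, 3), round(four_copies, 3))
```

Under these assumptions, splitting over two paths beats replicating over two paths, and replication only overtakes it once four paths carry four full copies.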

Therefore, in this paper, instead of data replication or data redundancy, we focus on maximally improving the probability of data delivery by sending data segments through multiple paths (i.e., offloading data segments to other nodes), which leads to no additional network overhead and better performance; i.e., we choose the path(s) that has or have the maximum probability of data delivery, considering sending data fragments via multiple paths.

C. Problem Statement

Suppose node u needs to transmit a data item S to node v within time constraint T. The cooperative offloading problem is defined as maximizing the probability of successful delivery of S within T by offloading data among opportunistically connected nodes, where S can be fragmented into different segments and each segment is transmitted over a different path (i.e., each path carries at most one segment). Then, cooperative offloading can be mathematically formulated as

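$$
\begin{aligned}
\max \quad & \prod_{i=1}^{m} \prod_{j=1}^{n} P(S_i, j)^{x_{i,j}} \\
\text{s.t.} \quad & \sum_{i=1}^{m} S_i = S, \quad m \in \{1, 2, \cdots, n\}, \\
& \sum_{j=1}^{n} x_{i,j} = 1, \quad \sum_{i=1}^{m} x_{i,j} \in \{0, 1\}, \quad x_{i,j} \in \{0, 1\},
\end{aligned}
\tag{1}
$$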

where $n$ is the number of paths between $u$ and $v$, $m$ is the number of data segments, $P(S_i, j)$ is the probability that node $u$ can transmit segment $i$ with size $S_i$ to node $v$ within $T$ along path $j$, and $x_{i,j} = 1$ if segment $i$ is assigned to path $j$, otherwise $x_{i,j} = 0$. Moreover, $\sum_{j=1}^{n} x_{i,j} = 1$ ensures that each segment is assigned once, and $\sum_{i=1}^{m} x_{i,j} \in \{0, 1\}$ ensures that each path carries at most one segment. Thus, the main problem of cooperative offloading is determining how to fragment the data item (including the number of segments and the size of each segment) and on which path to transmit each segment so as to maximize the successful delivery probability of the entire data item before its deadline. Thus, it is a joint optimization of the number of data segments, the size of each segment, and the selection of path for each segment. Although the formulation of cooperative offloading is straightforward, it turns out to be NP-hard.
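To make the formulation in (1) concrete, the following sketch solves a tiny instance by exhaustive search. It is an illustration only, not one of the algorithms proposed in this paper: segment sizes are restricted to multiples of a hypothetical `unit`, `prob(size, j)` stands in for $P(S_i, j)$ and must be supplied by the caller, and the search is exponential in the number of paths.

```python
def best_split(total, n_paths, prob, unit=1):
    """Maximise prod_j prob(s_j, j) subject to sum_j s_j == total,
    where each path j carries at most one segment of size s_j >= 0."""
    steps = total // unit  # assumes `total` is a multiple of `unit`

    def search(j, remaining):
        # All paths considered: feasible only if nothing is left over.
        if j == n_paths:
            return (1.0, ()) if remaining == 0 else (0.0, ())
        best = (0.0, ())
        for k in range(remaining + 1):
            size = k * unit
            # An unused path contributes a factor of 1 (x_{i,j} = 0).
            p_here = 1.0 if k == 0 else prob(size, j)
            p_rest, tail = search(j + 1, remaining - k)
            if p_here * p_rest > best[0]:
                best = (p_here * p_rest, (size,) + tail)
        return best

    return search(0, steps)


# Two identical paths with the probabilities quoted from Figure 1.
table = {10: 0.71, 20: 0.23}
value, sizes = best_split(20, 2, lambda size, j: table[size], unit=10)
print(value, sizes)
```

For this toy instance the search returns the even split of the example in Section III-B; on realistic inputs the number of candidate splits grows too quickly for enumeration, which is what motivates the heuristic.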

D. NP-Hardness

The NP-hardness of cooperative offloading can be proven by reduction to the minimization knapsack problem. In cooperative offloading, maximizing $\prod_{i=1}^{m} \prod_{j=1}^{n} P(S_i, j)^{x_{i,j}}$ is equivalent to minimizing

$$
\sum_{i=1}^{m} \sum_{j=1}^{n} -x_{i,j} \log P(S_i, j),
$$

where $-\log P(S_i, j) > 0$ since $0 < P(S_i, j) < 1$. A pair of segment $i$ and path $j$ has a positive value of $-\log P(S_i, j)$ and a size $S_i$. Therefore, cooperative offloading can be rewritten as:

$$
\begin{aligned}
\min \quad & \sum_{k=1}^{mn} v_k x_k \\
\text{s.t.} \quad & \sum_{k=1}^{mn} s_k x_k \ge S, \quad x_k \in \{0, 1\},
\end{aligned}
$$



where $v_k$ and $s_k$ respectively correspond to the positive value and the size of a pair of segment and path. This is the minimization knapsack problem, which is NP-hard, and thus cooperative offloading is also NP-hard. Although cooperative offloading also admits PTAS (Polynomial Time Approximation Scheme), as does the minimization knapsack problem, PTAS for cooperative offloading requires the knowledge of the value (the probability) for any pair of arbitrary data segment and path. That is, assuming the size of a data item is S, it needs to iteratively calculate the probability of any data segment smaller than or equal to S for every path between source and destination. However, in opportunistic mobile networks, the estimation of such probability requires high computation as we will discuss in Section IV. Since all the computation needs to be performed on the source node (i.e., a mobile device) that has a limited computation capability, PTAS is not affordable for mobile devices and thus we design better algorithms.
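The equivalence between maximizing the product and minimizing the sum of negative logarithms, used in the rewriting above, can be illustrated numerically. The probabilities below are made-up placeholders for three candidate (segment, path) pairs, and the restriction to selections of exactly two pairs is an assumption added only so that the candidates are comparable.

```python
import math
from itertools import product

# Hypothetical delivery probabilities, one per candidate (segment, path) pair.
probs = [0.71, 0.23, 0.55]

def objective_product(choice):
    """Product of the probabilities of the selected pairs."""
    return math.prod(p for p, x in zip(probs, choice) if x)

def objective_neg_log(choice):
    """Sum of the positive values v_k = -log P of the selected pairs."""
    return sum(-math.log(p) for p, x in zip(probs, choice) if x)

candidates = [c for c in product((0, 1), repeat=len(probs)) if sum(c) == 2]
best_by_product = max(candidates, key=objective_product)
best_by_neg_log = min(candidates, key=objective_neg_log)
print(best_by_product, best_by_neg_log)
```

Both criteria select the same pairs because the logarithm is strictly increasing.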

IV. PROBABILISTIC FRAMEWORK

In opportunistic mobile networks, even the estimation of the delivery probability of a data item along a particular path is hard. Thus, in this section, we propose a probabilistic framework to estimate such probability based on node contact pattern. The existing work [14], [40] simplifies the estimation by considering only one node contact along an opportunistic path before the deadline and thus severely underestimates the data delivery probability. To cope with this, we consider an arbitrary number of node contacts, and thereby our framework can accurately estimate the probability.

In opportunistic mobile networks, contact patterns between nodes have been well analyzed. Similar to [2] and [10], we model the contact process between each pair of nodes as an independent Poisson process. This modeling has been experimentally validated in [10]. Moreover, unlike [18], [19], which assume only a fixed amount of data can be transferred during a node contact, or [10], [35], which assume an arbitrary size of data can be sent during a node contact, we do not make such assumptions; i.e., the data amount that can be transmitted during a node contact depends on the contact duration as in [14] and [40]. In [14] and [40], the data amount that can be transmitted between nodes is estimated based on a single node contact. However, given a time constraint, nodes may contact each other multiple times and thus such an estimation is not technically sound. Therefore, to accurately estimate the delivery probability, we consider the data amount that is brought by any arbitrary number of contacts between nodes within a time constraint. Moreover, similar to [40], we model contact duration between each pair of nodes as the Pareto distribution (based on per node pair statistics). Note that the distribution of contact frequency and contact duration is heterogeneous across different pairs of nodes; i.e., each pair of nodes has specific parameters for its distributions.

In the following, we first introduce opportunistic contact probability and data transfer probability, and then based on these, we show how to calculate data delivery probability.

A. Opportunistic Contact Probability

Since node contacts are independent Poisson processes, let random variable $T_1$ represent the inter-contact duration between nodes $u$ and $v$, which follows an exponential distribution with rate parameter $\lambda_1$, i.e., $T_1 \sim \mathrm{Exp}(\lambda_1)$, where $\lambda_1$ is the rate parameter of the Poisson process between $u$ and $v$. Similarly, we have $T_2 \sim \mathrm{Exp}(\lambda_2)$ for nodes $v$ and $w$. Then, the available probability (denoted as $Q$) of the opportunistic path from $u$ to $w$ via $v$ (i.e., $u$ contacts $v$ first and then $v$ contacts $w$) within $T$ can be represented as $Q = P(\mathcal{T} \le T)$, where $\mathcal{T} = T_1 + T_2$. Note that the sequence of node contacts is already captured by $P(\mathcal{T} \le T)$. Assuming the probability density functions (PDFs) of $T_1$ and $T_2$ are $f_1(t)$ and $f_2(t)$, respectively, $P(\mathcal{T} \le T)$ can be calculated through the convolution $f_1(t) \otimes f_2(t)$. However, a data item may not be completely transmitted during one contact between neighboring nodes, but may require multiple contacts. Thus, multiple contacts at each hop should be considered.

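Assume a data item can be transferred from $u$ to $v$ by $a$ contacts between $u$ and $v$, and from $v$ to $w$ by $b$ contacts between $v$ and $w$. For a collection of independent and identically distributed (i.i.d.) random variables $\{T_i, i = 1, \ldots, a\}$ with $T_i \sim \mathrm{Exp}(\lambda_1)$, we have $\mathcal{T}_1 \sim \mathrm{Gamma}(a, \lambda_1)$, where $\mathcal{T}_1 = \sum_{i=1}^{a} T_i$. Similarly, for $\{T_j, j = 1, \ldots, b\}$ with $T_j \sim \mathrm{Exp}(\lambda_2)$, we have $\mathcal{T}_2 \sim \mathrm{Gamma}(b, \lambda_2)$, where $\mathcal{T}_2 = \sum_{j=1}^{b} T_j$. Then the PDF of $\mathcal{T} = \mathcal{T}_1 + \mathcal{T}_2$ can be represented as

$$
f(t) = f(t; a, \lambda_1) \otimes f(t; b, \lambda_2)
= \frac{t^{a-1} e^{-t\lambda_1}}{\lambda_1^{-a}\, \Gamma(a)} \otimes \frac{t^{b-1} e^{-t\lambda_2}}{\lambda_2^{-b}\, \Gamma(b)}.
$$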

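For a $k$-hop opportunistic path with the rate parameter $\lambda_i$ and the contact number $n_i$ at each hop $i$ (we also call it a $k$-hop opportunistic contact path), the PDF of $\mathcal{T} = \mathcal{T}_1 + \cdots + \mathcal{T}_k$ can be easily generalized as

$$
\begin{aligned}
f(t) &= f(t; n_1, \lambda_1) \otimes f(t; n_2, \lambda_2) \otimes \cdots \otimes f(t; n_k, \lambda_k) \\
&= \frac{t^{n_1-1} e^{-t\lambda_1}}{\lambda_1^{-n_1}\, \Gamma(n_1)} \otimes \frac{t^{n_2-1} e^{-t\lambda_2}}{\lambda_2^{-n_2}\, \Gamma(n_2)} \otimes \cdots \otimes \frac{t^{n_k-1} e^{-t\lambda_k}}{\lambda_k^{-n_k}\, \Gamma(n_k)}.
\end{aligned}
\tag{2}
$$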

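However, it is hard to calculate (2) due to the higher order convolution. To simplify the calculation, we have the following by extending the Welch-Satterthwaite approximation to the sum of gamma random variables

$$
f(t) \approx \frac{t^{\gamma-1} e^{-t\delta}}{\delta^{-\gamma}\, \Gamma(\gamma)},
\tag{3}
$$

where

$$
\gamma = \frac{\left(\sum_{i=1}^{k} n_i \lambda_i\right)^{2}}{\sum_{i=1}^{k} n_i \lambda_i^{2}}
\quad \text{and} \quad
\delta = \frac{\sum_{i=1}^{k} n_i \lambda_i^{2}}{\sum_{i=1}^{k} n_i \lambda_i}.
$$

Thus, $\mathcal{T}$ can be approximated by a single gamma random variable $\tilde{\mathcal{T}} \sim \mathrm{Gamma}(\gamma, \delta)$, where $\tilde{\mathcal{T}}$ and $\mathcal{T}$ are proved to have the same mean and variance. Then, the opportunistic contact probability with time constraint $T$, denoted as $P(\mathcal{T} \le T)$, can be easily calculated.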
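A minimal sketch of how the opportunistic contact probability might be evaluated once the sum is replaced by a single gamma variable. It assumes the rate parameterization of the gamma densities in (2), under which hop $i$ contributes mean $n_i/\lambda_i$ and variance $n_i/\lambda_i^2$, and it picks the shape and rate of the single gamma by matching those two moments numerically rather than transcribing closed forms. The function names are hypothetical, and the gamma CDF uses a plain power series that is adequate only for moderate values of rate × time.

```python
import math

def gamma_cdf(shape, rate, t, terms=400):
    """P(X <= t) for X ~ Gamma(shape, rate), via the series for the
    regularised lower incomplete gamma function (moderate rate*t only)."""
    if t <= 0:
        return 0.0
    x = rate * t
    log_x = math.log(x)
    # Each term is x^(shape+n) e^(-x) / Gamma(shape+n+1), built in log space.
    total = sum(math.exp((shape + n) * log_x - x - math.lgamma(shape + n + 1))
                for n in range(terms))
    return min(1.0, total)

def match_gamma(contacts, rates):
    """Shape and rate of a single gamma with the same mean and variance as
    the sum of Gamma(n_i, lambda_i) variables (assumed rate parameterization)."""
    mean = sum(n / lam for n, lam in zip(contacts, rates))
    var = sum(n / lam ** 2 for n, lam in zip(contacts, rates))
    return mean * mean / var, mean / var

def contact_probability(contacts, rates, deadline):
    """Approximate P(sum of inter-contact times <= deadline) for a k-hop path."""
    shape, rate = match_gamma(contacts, rates)
    return gamma_cdf(shape, rate, deadline)

# Two hops, one contact each, both with rate 1 per time unit, deadline 2.
print(contact_probability([1, 1], [1.0, 1.0], 2.0))
```

When all hops share the same rate, the matched gamma coincides with the exact distribution of the sum, which gives a convenient sanity check against the Erlang closed form.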

It is shown in [15] that in some mobile traces the exponential distribution of inter-contact duration between nodes may feature a head that likely follows a power law. This dichotomy is separated by a characteristic time [15], denoted as $T_c$. When the time constraint $T$ is longer than or equal to $T_c$, the probability sits on the tail of the distribution, which is exponential, and thus it is accurate. When $T$ is shorter than $T_c$, the probability sits on the head of the distribution, and thus it may be overestimated. The rigorous mathematical analysis on the mobile traces featured with the dichotomy is out of the scope of this paper.

B. Data Transfer Probability

As the contact duration is modeled as the Pareto distribution, it can be easily concluded that the amount of data that can be transferred during a contact between a pair of nodes also follows a Pareto distribution since the data rate between a pair of nodes is relatively stable as shown in [40]. Let random variable $D_1$ represent the amount of data transmitted during a contact between nodes $u$ and $v$, which follows the Pareto distribution with shape parameter $\alpha$ and scale parameter $\beta$ ($\beta$ is the minimum possible value of $D_1$), i.e., $D_1 \sim \mathrm{Pareto}(\alpha, \beta)$. For a collection of i.i.d. random variables $\{D_i, i = 1, \ldots, c\}$ with $D_i \sim \mathrm{Pareto}(\alpha, \beta)$, let $\mathcal{D} = \sum_{i=1}^{c} D_i$. Then, the probability that a data item with size $D$ can be transferred with $c$ contacts between nodes $u$ and $v$ is represented as $P(\mathcal{D} \ge D)$ and can be derived from the PDF of $\mathcal{D}$. However, since $\mathcal{D}$ is the sum of an arbitrary number of random variables, its PDF cannot be easily approximated by stable distributions. Therefore, we choose to approximate $\mathcal{D}$ by the maximum value of $\{D_i, i = 1, \ldots, c\}$, denoted as $\mathcal{M}$, which relies on the observation that $\mathcal{M}$ has the same order of magnitude as the sum $\mathcal{D}$ since the Pareto distribution is a heavy-tailed distribution.

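To capture such intuition, let us define $\mathcal{R} = \mathcal{D}/\mathcal{M}$. Then, $\mathcal{D}$ can be approximated by $\mathcal{M}$, considering the ratio $\mathcal{R}$. Then, $P(\mathcal{D} \ge D)$ can be approximated as

$$
P(\mathcal{D} \ge D) \approx 1 - \left(1 - \left(\frac{\beta \bar{R}}{D}\right)^{\alpha}\right)^{c},
\tag{4}
$$

where $\bar{R}$ is the expectation of $\mathcal{R}$ (see Appendix VIII for the derivation). Approximating $\mathcal{D}$ with $\mathcal{M}$ bears a low relative error of about 10% [36].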
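Equation (4) translates directly into a short function. The sketch below is illustrative: the function and parameter names are hypothetical, the value of $\bar{R}$ must be supplied by the caller, and returning 1 when $D \le \beta \bar{R}$ is an added assumption that keeps the result a valid probability.

```python
def transfer_probability(data_size, contacts, alpha, beta, r_bar):
    """Approximate P(data carried by `contacts` contacts >= data_size),
    following (4) with Pareto(alpha, beta) per-contact data amounts."""
    ratio = beta * r_bar / data_size
    if ratio >= 1.0:
        # Assumption: sizes at or below beta * r_bar are always transferable.
        return 1.0
    return 1.0 - (1.0 - ratio ** alpha) ** contacts

# Illustrative values only: alpha = 2, beta = 1, r_bar = 1, data size 2.
print(transfer_probability(2.0, 1, 2.0, 1.0, 1.0))
print(transfer_probability(2.0, 2, 2.0, 1.0, 1.0))
```

As expected from (4), the probability grows with the number of contacts and shrinks as the data size increases.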

C. Data Delivery Probability

Given an opportunistic path, a data item $D$ and a time constraint $T$, we will show how to calculate the probability of successful delivery of $D$ over the opportunistic path within $T$, denoted as $P(T, D)$, based on the opportunistic contact probability and data transfer probability.

First, we consider the case of a one-hop path. The data item needs at most $l = \lceil D/\beta \rceil$ node contacts to be completely transferred. Then, $P(T, D)$ can be calculated as the sum of the probabilities that the data item is completely transmitted at each particular contact as follows:

$$
P(T, D) = \sum_{i=1}^{l} \tilde{P}_{i-1} \cdot P(\mathcal{T}_i \le T - T') \cdot P(\mathcal{D}_i \ge D),
\tag{5}
$$

where

$$
\tilde{P}_i =
\begin{cases}
\prod_{j=1}^{i} P(\mathcal{T}_j \le T - T') \cdot P(\mathcal{D}_j < D), & i > 0 \\
1, & i = 0,
\end{cases}
$$

$T' = D/r$ denotes the transmission time of the data item and $r$ is the data rate between these two nodes. As in (5), the probability that the $i$th contact completes the data transfer can be interpreted as the probability that the data can be transferred within $i$ node contacts when the first $i - 1$ contacts fail (i.e., the data transferred by the former $i - 1$ contacts is less than $D$ and the sum of inter-contact durations of $i$ contacts is less than or equal to $T - T'$).
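The summation in (5) can be sketched as a single loop that carries the running product $\tilde{P}_{i-1}$. The two callables are placeholders: `contact_prob(i)` stands for $P(\mathcal{T}_i \le T - T')$ and `transfer_prob(i)` for $P(\mathcal{D}_i \ge D)$, and how they are computed (for example from the gamma approximation and from (4)) is left to the caller. The function name and signature are hypothetical.

```python
import math

def one_hop_delivery(data_size, beta, contact_prob, transfer_prob):
    """Evaluate (5) for a one-hop path.

    contact_prob(i)  -> P(sum of i inter-contact durations <= T - T')
    transfer_prob(i) -> P(data carried by i contacts >= data_size)
    """
    max_contacts = math.ceil(data_size / beta)   # l in the text
    total = 0.0
    failed_so_far = 1.0                          # P~_{i-1}, with P~_0 = 1
    for i in range(1, max_contacts + 1):
        pc = contact_prob(i)
        pd = transfer_prob(i)
        total += failed_so_far * pc * pd
        # Extend the product with the i-th factor P(T_i <= T - T') P(D_i < D).
        failed_so_far *= pc * (1.0 - pd)
    return total

# Toy inputs: contacts always happen in time, each attempt succeeds w.p. 0.5.
print(one_hop_delivery(2.0, 1.0, lambda i: 1.0, lambda i: 0.5))
```

With these toy inputs the two terms are 0.5 and 0.5 × 0.5, so the sum is 0.75.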

Then, we can extend the calculation of $P(T, D)$ to a $k$-hop path. Given a $k$-tuple

$$
\langle n_1, \ldots, n_i, \ldots, n_k \rangle, \quad 1 \le n_i \le \left\lceil \frac{D}{\beta_i} \right\rceil,
\tag{6}
$$

which falls into one of the $\prod_{i=1}^{k} \lceil D/\beta_i \rceil$ possible combinations, we need to calculate the probability that the data can be transferred through the path by the specified number of contacts at each hop as $n_i$ in the $k$-tuple. Similar to the calculation for the one-hop path, we need to consider the failure probability of tuples which have one less contact number at one of the hops. For example, for the 2-tuple $\langle n_1, n_2 \rangle$, we need to consider the failure probability of both $\langle n_1 - 1, n_2 \rangle$ and $\langle n_1, n_2 - 1 \rangle$. Finally, $P(T, D)$ on the $k$-hop path can be calculated as the sum of the probabilities of all the combinations, and the computational complexity is $\prod_{i=1}^{k} \lceil D/\beta_i \rceil$. $\beta$ depends on the contact duration and data transmission rate between nodes, and thus it varies in different application scenarios. The complexity depends on not only $\beta$ but also $D$. Given a network, $\beta$ between node pairs is on the same scale based on our analysis on real traces (e.g., in the MIT Reality trace [7], the minimum contact duration of 99% of node pairs is between 5 and 50 minutes) and hence $D$ determines the complexity. Moreover, $D$ should not be much larger than $\beta$. Given a typical value of $\alpha$, e.g., 2 as in the MIT Reality trace, approximately $D/(2\beta)$ contacts are needed to transfer $D$ between a pair of nodes. Assuming $D$ and $\beta$ are on different scales, e.g., $D$ = 10 GB and $\beta$ = 1 MB, we do not expect any application to tolerate a data transfer that takes 5000 opportunistic contacts to complete. Therefore, the computational complexity is low in practice.
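The size of the search space in (6) and the contact estimate quoted above can be reproduced with a small sketch. The helper names are hypothetical; the contact estimate assumes the per-contact data amount has the Pareto mean $\alpha\beta/(\alpha - 1)$, which gives $D/(2\beta)$ for $\alpha = 2$, and 10 GB is taken as 10,000 MB.

```python
import math
from itertools import product

def contact_tuples(data_size, betas):
    """All k-tuples <n_1, ..., n_k> with 1 <= n_i <= ceil(D / beta_i)."""
    limits = [math.ceil(data_size / b) for b in betas]
    return list(product(*(range(1, n + 1) for n in limits)))

def expected_contacts(data_size, alpha, beta):
    """Rough number of contacts needed, using the Pareto mean
    alpha * beta / (alpha - 1) per contact (requires alpha > 1)."""
    return data_size * (alpha - 1) / (alpha * beta)

tuples = contact_tuples(3, [1, 2])           # two hops, D = 3
print(len(tuples), tuples[0], tuples[-1])
print(expected_contacts(10_000, 2, 1))       # D = 10 GB, beta = 1 MB, in MB
```

The number of tuples equals the product of the per-hop limits, so it stays small whenever D is within a small multiple of each β.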

Note that we do not consider the case where an edge can be shared by multiple paths, since this will complicate the problem and make it impossible to calculate the data delivery probability.

In summary, the probabilistic framework explores contact pattern between nodes to provide the estimation of data delivery probability over the opportunistic path, which is the basis of our designed algorithms. To the best of our knowledge, this is the first work that provides such estimation without any restrictions.

V. HEURISTIC ALGORITHM

In this section, based on the proposed probabilistic framework, we give the design of the heuristic algorithm.

A. The Algorithm

According to (1), intuitively, a good solution of cooperative offloading should have a small number of data segments while keeping a high delivery probability for each segment. Therefore, the basic idea of the heuristic algorithm is to first find some paths between the source and destination that have a higher available probability than the direct path. Then, it is determined how much data should be assigned to each of these paths, while maintaining a high data delivery probability. Finally, data reallocation among these paths is performed to maximally improve the delivery probability of the entire data item.

To choose particular paths for data offloading, we need to measure the capability of paths for data transmission. Since all paths are opportunistic and are characterized by contact frequency and contact duration at each hop, we can explore these two properties to measure the capability of each path. However, without prior knowledge of how much data will be assigned to each path, it is difficult to quantify the capability of paths. Therefore, we employ the available probability $Q$ of the opportunistic path within the time constraint as the metric to characterize each path, since the establishment of the opportunistic path within the time constraint is the prerequisite of data transmission between the pair of nodes. Moreover, the available probability of the direct path (the one-hop path between source and destination), denoted as $Q'$, is used as the criterion to choose these paths.

After allocating the paths, we need to determine how much data should be assigned to each path initially. When nodes contact each other, $\beta$ is the size of data that can be guaranteed to be transferred (according to the Pareto distribution). Thus, for each path, we allocate the amount of data that can be guaranteed to be transferred along the path if the path can be established within the time constraint, i.e., $\min\{\beta_i, i = 1, \cdots, k\}$. We call this the path capacity, denoted as $C^{i}_{uv}$ for path $i$ between nodes $u$ and $v$. Note that the delivery probability of initially assigned data at a selected path is the same as the available probability $Q$ of the path, so no additional calculation is needed.

Considering an example where node $u$ needs to transmit a data item $S$ to node $v$ within $T$, the heuristic algorithm works as follows. First, we adopt Dijkstra's algorithm with the metric of $Q$ to find the path from node $u$ to $v$ that has the minimum $1/Q$. Note that in Dijkstra's algorithm $Q$ is calculated for the path between node $u$ and an unvisited node according to (3). If the found path $i$ has $Q_i \ge Q'$ (if node $u$ does not have a direct path to $v$, $Q' = 0$), then we assign the data amount of $C^{i}_{uv}$ to path $i$, i.e., $S_i = C^{i}_{uv}$ if $C^{i}_{uv} < S - \sum_{i \in \mathcal{P}_A} C^{i}_{uv}$, otherwise $S_i = S - \sum_{i \in \mathcal{P}_A} C^{i}_{uv}$, and add $i$ into the set of allocated paths $\mathcal{P}_A$. After allocating path $i$, the edges along the path will be removed from the network and not considered in subsequent searches.

The searching process is iteratively executed until it meets one of the following stop conditions: (i) $Q_i < Q'$; (ii) node v cannot be reached by Dijkstra’s algorithm from u; (iii) $\sum_{i \in \mathcal{P}_A} S_i = S$. If the searching process ends with $|\mathcal{P}_A| = 0$, there is no need to offload and it is better to send the data item directly from u to v. If $\sum_{i \in \mathcal{P}_A} S_i < S$, the remaining data needs to be assigned to $\mathcal{P}_A$. If $\sum_{i \in \mathcal{P}_A} S_i = S$, data reallocation among $\mathcal{P}_A$ is needed to improve the delivery probability of S. The path searching process is from line 1 to 9 in Algorithm 1.
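The search-and-remove loop can be sketched as follows. This is a simplified illustration under a loud assumption: each edge carries a hypothetical availability probability and a path's Q is taken as the product of its edge values, whereas the paper computes Q according to (3). Stop condition (iii) is omitted because it depends on path capacities; all names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Tuple

# node -> neighbor -> hypothetical edge availability in (0, 1] (an assumption, not Eq. (3))
Graph = Dict[str, Dict[str, float]]


def most_available_path(graph: Graph, src: str, dst: str) -> Tuple[List[str], float]:
    """Dijkstra on -log(probability); returns (path, Q), or ([], 0.0) if dst is unreachable."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, prob in graph.get(node, {}).items():
            nd = d - math.log(prob)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return [], 0.0
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    q = math.prod(graph[a][b] for a, b in zip(path, path[1:]))
    return path, q


def allocate_paths(graph: Graph, src: str, dst: str) -> List[Tuple[List[str], float]]:
    """Repeatedly take the best path, stop when Q < Q' or dst is unreachable."""
    if src == dst:
        return []
    g = {n: dict(nbrs) for n, nbrs in graph.items()}  # work on a copy
    q_direct = g.get(src, {}).get(dst, 0.0)  # Q'; 0 when there is no direct path
    allocated = []
    while True:
        path, q = most_available_path(g, src, dst)
        if not path or q < q_direct:
            break
        allocated.append((path, q))
        for a, b in zip(path, path[1:]):  # an edge may serve only one allocated path
            g[a].pop(b, None)
            g.get(b, {}).pop(a, None)
    return allocated


toy = {
    "u": {"v": 0.5, "a": 0.9, "b": 0.6},
    "a": {"u": 0.9, "v": 0.9},
    "b": {"u": 0.6, "v": 0.6},
    "v": {"u": 0.5, "a": 0.9, "b": 0.6},
}
result = allocate_paths(toy, "u", "v")
```

In this toy network the two-hop path through a is allocated first, then the direct path; the path through b is rejected because its Q falls below Q′.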

To assign the remaining data of $S - \sum_{i \in \mathcal{P}_A} C^i_{uv}$ to $\mathcal{P}_A$, first we rank $\mathcal{P}_A$ according to the delivery probability of the assigned data at each path, i.e., $P^i_{uv}(T, S_i)$, $i \in \mathcal{P}_A$. Then,

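Algorithm 1 Heuristic Algorithm
Input : u, v, S, T
Output: $\mathcal{P}_A$, $\mathcal{S}$

 1 while DijkstraMaxQ(u, v) ≠ ∅ && $\sum_{i \in \mathcal{P}_A} S_i < S$ do
 2     p ← DijkstraMaxQ(u, v)
 3     if $Q_p < Q'$ then
 4         break
 5     end
 6     ExcludeAllocPath(p)
 7     $\mathcal{P}_A \leftarrow \mathcal{P}_A \cup \{p\}$
 8     $\mathcal{S} \leftarrow \mathcal{S} \cup \{S_p\}$
 9 end
10 while $S - \sum_{i \in \mathcal{P}_A} S_i > 0$ do    // assign remaining data
11     $p \leftarrow \arg\max_{i \in \mathcal{P}_A} P^i_{uv}(T, S_i)$
12     $q \leftarrow \arg\max_{i \in \mathcal{P}_A \setminus \{p\}} P^i_{uv}(T, S_i)$
13     while $P^p_{uv}(T, S_p) \ge P^q_{uv}(T, S_q)$ && $S - \sum_{i \in \mathcal{P}_A} S_i > 0$ do
14         $S_p \leftarrow S_p + \Delta$    // Δ is increment of assigned data
15     end
16 end
17 while true do    // reallocate data
18     $\mathcal{P}'_A \leftarrow \mathcal{P}_A$, $\mathcal{S}' \leftarrow \mathcal{S}$
19     $j \leftarrow \arg\min_{i \in \mathcal{P}'_A} P^i_{uv}(T, S_i)$
20     $\mathcal{P}'_A \leftarrow \mathcal{P}'_A \setminus \{j\}$, $\mathcal{S}' \leftarrow \mathcal{S}' \setminus \{S_j\}$
21     while $S_j > 0$ do
22         $p \leftarrow \arg\max_{i \in \mathcal{P}'_A} P^i_{uv}(T, S_i)$
23         $q \leftarrow \arg\max_{i \in \mathcal{P}'_A \setminus \{p\}} P^i_{uv}(T, S_i)$
24         while $P^p_{uv}(T, S_p) \ge P^q_{uv}(T, S_q)$ && $S_j > 0$ do
25             $S_p \leftarrow S_p + \Delta$
26             $S_j \leftarrow S_j - \Delta$
27         end
28     end
29     if $\prod_{i \in \mathcal{P}_A} P^i_{uv}(T, S_i) < \prod_{i \in \mathcal{P}'_A} P^i_{uv}(T, S_i)$ then
30         $\mathcal{P}_A \leftarrow \mathcal{P}'_A$, $\mathcal{S} \leftarrow \mathcal{S}'$
31     else
32         break
33     end
34 end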

we assign more data to the path with the highest delivery probability among $\mathcal{P}_A$ until its probability is lower than the second highest one. The process is repeated until there is no remaining data. When assigning more data to the selected allocated path, we need to determine the amount of data to be assigned each time. As the path capacity of a k-hop path is $\min\{\beta_1, \beta_2, \ldots, \beta_k\}$, each time we increase the amount of the assigned data to the next higher value among $\{\beta_1, \beta_2, \ldots, \beta_k\}$. If the assigned data is greater than or equal to $\max\{\beta_1, \beta_2, \ldots, \beta_k\}$, each time the assigned data is increased by the path capacity. The assignment of the remaining data is from line 10 to 16 in Algorithm 1.
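The increment rule for Δ can be made concrete with a small helper. This is a sketch of the rule as stated in the text, with a hypothetical function name and example values; it does not cap the increment by the remaining data, which the text leaves unspecified.

```python
from typing import Sequence


def next_assignment(current: float, betas: Sequence[float]) -> float:
    """Next assignment level for a path whose hops guarantee `betas` per contact.

    While some beta exceeds the current assignment, jump to the smallest such beta;
    once the assignment has reached max(betas), grow by the path capacity min(betas).
    """
    higher = [b for b in betas if b > current]
    if higher:
        return min(higher)
    return current + min(betas)


# Hypothetical 3-hop path: the assignment starts at the path capacity.
betas = [2, 5, 3]
levels = [min(betas)]
for _ in range(4):
    levels.append(next_assignment(levels[-1], betas))
```

Stepping through the β values first is natural because each β marks a point where one more hop needs an additional contact to move the data.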

After all the remaining data is assigned, we further refine the delivery probability of S by reallocating the data assigned at the path, say path i, which has the lowest delivery probability among $\mathcal{P}_A$, to other paths in $\mathcal{P}_A$ using the same approach described above. If the reallocation of $S_i$ improves the delivery probability of S, path i is excluded from $\mathcal{P}_A$ and the process is repeated; otherwise, the reallocation process stops (from line 29 to 33 in Algorithm 1). The heuristic algorithm ends up with the allocated paths and the data assignment at each path.

B. Discussion

The heuristic algorithm runs on mobile devices. When each mobile device connects to the infrastructure, it uploads the contact information with other nodes to the infrastructure such that the infrastructure has the global information of the network. Then, the infrastructure sends out the up-to-date global information each time a node is connected to it such that each node has the global information. When a node needs to transmit data to the infrastructure, it can run the heuristic algorithm based on the global information it currently has to determine how to cooperatively offload the data.

The global information may reveal users’ contacts and intrude on their privacy. However, a user’s identity is hidden behind the node id. Without correlating a user with its node id, the user’s privacy is secure. More sophisticated and rigorous treatment of privacy control is out of the scope of this paper.

The contact information between a node pair is represented by three parameters, i.e., α, β, and λ. Let $d_m$ denote the maximum node degree and N be the number of nodes in the network. The size of exchanged contact information during a contact between a node and infrastructure is at most $3 d_m N$. Since the maximum node degree $d_m$ commonly does not increase with the network size N, the overhead of exchanged contact information scales linearly with the network size.

In the algorithm, we exclude the allocated paths from the subsequent path searches, because it is extremely hard to calculate the data delivery probability if multiple data segments are transmitted along an edge which is shared by multiple allocated paths. Therefore, we avoid the difficulty by restricting each edge to be occupied by only one allocated path. Moreover, this also limits the maximum number of allocated paths to the minimum node degree (number of neighbors) of source and destination, and thus path allocation using Dijkstra’s algorithm can terminate quickly.

Intuitively, a multi-hop path should be considered for data transmission when its available probability is higher than that of a one-hop path. Therefore, Q′ is employed as a criterion to allocate paths. As indicated in Section IV, the computational complexity of data delivery probability over multi-hop paths increases with the number of hops. However, due to the adoption of Q′, allocated paths are usually limited to paths with fewer hops and thus we can avoid the high computational complexity of delivery probability when data that is more than the path capacity is assigned to a multi-hop path.

Data assignment at each allocated path is initially set to the path capacity, and thus the data delivery probability at each path is the same as Q. This is a good choice because we do not know beforehand which paths are sensitive (in terms of delivery probability) to the increase of assigned data. After path allocation and initial data assignment, the number of data segments is the same as that of allocated paths. Then, data is reallocated from paths of lower delivery probability to ones of higher probability so as to improve the delivery probability of the data item by reducing the number of allocated paths. This complies with the design intuitions that the number of data segments should be small and each allocated path should have a high delivery probability.

Moreover, the computational complexity of the heuristic algorithm is much lower than PTAS for cooperative offloading. PTAS needs to search all possible paths (they can be very long) between source and destination; the heuristic algorithm only needs to locate the paths whose available probability is more than Q′. In addition, PTAS has to calculate the delivery probability of each path when carrying all possible amounts of data. This incurs most of the computation. Unlike PTAS, the heuristic algorithm only needs to calculate certain amounts of data for the few allocated paths. Due to the uncertainty of the number of paths, path length and path capacity, it is hard to give a mathematical comparison of the computational complexity between the heuristic algorithm and PTAS. However, based on the reasoning stated above, it is easy to conclude that the computation of the heuristic algorithm is much less than that of PTAS, and its performance is close to the optimum as we will demonstrate in Section VII.

For the heuristic algorithm itself, excluding the complexity of calculating the data delivery probability, the complexity includes three parts. (i) The complexity of allocating paths between a pair of source and destination is $d_m N^2$. Since a selected path is excluded from the network, at most $d_m$ paths are allocated. Dijkstra’s algorithm costs $N^2$ each run and thus $d_m N^2$ in total. (ii) The complexity of assigning remaining data to selected paths is $S^2$, where S is the size of the data item, since the remaining data is at most S and the increment of data assignment is at least 1. (iii) The complexity of reallocating data among selected paths is $d_m S^2$, because there are at most $d_m$ reallocation processes and each costs at most $S^2$. So, the combined complexity is $d_m N^2 + S^2 + d_m S^2$. Since the maximum node degree $d_m$ commonly does not increase with the network size N, the complexity of the heuristic algorithm is $O(N^2 + S^2)$.

Based on the characteristics of the heuristic algorithm, we expect its use cases, in general, to be scenarios where the contact pattern between nodes is relatively stable and the communication overhead is small, which means the time taken to exchange contact information between infrastructure and node is relatively short compared to their contact duration, jointly considering bandwidth and network size. If the contact pattern between nodes varies largely, the information collected by the infrastructure may be stale, which may affect the performance. If the network size is too large, the communication overhead may dominate and little data can be transmitted from a node to the infrastructure during a contact. An example use case is disaster recovery, where rescue crews and survivors can better communicate with the command center using their smartphones via limited mobile cellular towers [21] by cooperative data offloading. The global information can be stored at the command center (directly connected with mobile cellular towers), which can be trusted in this scenario. Moreover, the heuristic algorithm is ready and easy to deploy on the delay-tolerant network architecture DTN2 [1] and the smartphone-based system [21] for disaster recovery.



Fig. 3. Paths constructed based on local information. (a) Paths between node u and v based on the information at node u. (b) Paths between node u and v based on the information at node u and c.

VI. DISTRIBUTED ALGORITHM

Global information is required for the heuristic algorithm presented above. However, in some scenarios such information might not be available or might cost too much to collect. Furthermore, the pattern of contact frequency and contact duration between nodes may vary largely over time and thus the collected information may be stale, which will impact the performance of the heuristic algorithm. Thus, in this section we propose a distributed algorithm to address these problems.

In opportunistic mobile networks, it is hard to maintain multi-hop information locally, since nodes only intermittently contact each other and thus the information cannot be promptly updated. Moreover, from the probabilistic framework, we can infer that shorter paths are prone to have a higher data delivery probability, and thus maintaining multi-hop information can also be wasteful. Therefore, in the design of the distributed algorithm, it is only required that each node maintains two-hop information of contact frequency and contact duration between nodes; i.e., each node maintains contact frequency and contact duration with its neighbors, and when two nodes encounter each other, they exchange such information and thereby obtain the two-hop information. The goal of the distributed algorithm is to determine whether and how much data should be transmitted when the node carrying the data encounters other nodes, based on the collected information, so as to improve the delivery probability.

With the two-hop information maintained locally, the source node can initially construct paths to the destination (infrastructure). For example, as shown in Figure 3a, there are four paths that can be constructed from node u to node v, including one one-hop path and three two-hop paths, where nodes a, b and c are the potential nodes to cooperate for data transmission. Moreover, when the source node encounters a neighbor, it also has the local information maintained at this neighbor. For example, as shown in Figure 3b, when node u encounters node c, node u learns the information of the two-hop paths between c and v, and then node u can construct more paths to node v.

The distributed algorithm exploits the locally maintained information for data offloading and includes three phases: criterion assignment, real-time adjustment and assignment update. The first phase leverages the local information to select the paths for offloading and determines the data assignment for each path; the second phase adjusts the data assignment based on the local information of the encountered node to achieve a better delivery probability; the third phase updates the data assignment at the sender and receiver based on the actually transmitted data amount between them. Consider Figure 3 as an example, where node u has data S to transmit to v. First, node u determines the criterion assignment based on the paths in Figure 3a. Then, when node u encounters c, node u needs to adjust the size of data to be transmitted to node c, which can maximally improve the delivery probability of S determined by the criterion assignment, based on the additional paths in Figure 3b. After the data transmission between u and c, both need to update their data assignment.

A. Criterion Assignment

Although the data to be transmitted to a neighbor will be adjusted when the source node meets the neighbor, the source node should provide a reasonable criterion for later adjustment. To obtain a good delivery probability, the criterion assignment is determined as follows. Let $\mathcal{P}_{uv}$ denote the set of paths from node u to v, which only includes the paths that are no more than two hops from node u to v, and let $C_{uv}$ denote the capacity of $\mathcal{P}_{uv}$, where $C_{uv} = \sum_{i \in \mathcal{P}_{uv}} C^i_{uv}$. If $C_{uv} < S$, the data assignment at path i in $\mathcal{P}_{uv}$ is $S_i = S \times \frac{C^i_{uv}}{C_{uv}}$. If $C_{uv} \ge S$, we rank $\mathcal{P}_{uv}$ according to the available probability of each path, and then the first k paths of $\mathcal{P}_{uv}$, where $\sum_{i=1}^{k} C^i_{uv} \ge S$, are selected and the assigned data for each path is equal to the path capacity except the last one, the assignment of which is $S - \sum_{i=1}^{k-1} C^i_{uv}$. Let $A_u$ denote the data assignment at node u, $A_u = \{S_i, i \in \mathcal{P}_{uv}\}$. This data assignment is determined before transmission.
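The two cases of the criterion assignment can be sketched as a single function. This is an illustrative reading of the rule above, not the authors' code; paths are represented as hypothetical (capacity, available probability) pairs and the example numbers are made up.

```python
from typing import List, Tuple


def criterion_assignment(paths: List[Tuple[float, float]], size: float) -> List[float]:
    """Return S_i for each path, given (C_i, Q_i) per path and the data item size S."""
    total = sum(c for c, _ in paths)
    if total <= 0:
        raise ValueError("at least one path with positive capacity is required")
    if total < size:
        # Total capacity is insufficient: split S in proportion to path capacity.
        return [size * c / total for c, _ in paths]
    # Enough capacity: fill paths in decreasing order of available probability.
    assignment = [0.0] * len(paths)
    remaining = size
    order = sorted(range(len(paths)), key=lambda i: paths[i][1], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        assignment[i] = min(paths[i][0], remaining)  # last selected path gets the rest
        remaining -= assignment[i]
    return assignment


# Three hypothetical paths: (capacity, available probability).
paths = [(2, 0.9), (3, 0.5), (4, 0.7)]
fits = criterion_assignment(paths, size=5)     # capacity 9 >= 5
spread = criterion_assignment(paths, size=18)  # capacity 9 < 18
```

In the first call only the two most available paths are used; in the second every path carries twice its capacity.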

B. Real-Time Adjustment

With the criterion assignment, we have the data assignment for each path from node u to v. The data assigned at the path that goes through node c is the amount of data to be transmitted to node c when nodes u and c encounter each other. However, node c may be more likely than node u to deliver a certain amount of data to node v. If so, this amount of data should be transmitted to node v from c instead of u so as to improve the delivery probability. Thus, node u needs to investigate this capability of node c based on c’s local information (paths to node v, shown as the shaded part in Figure 3b) when they are in contact, and then adjust the amount of data to be transmitted to c accordingly.

The real-time adjustment works as follows. When nodes u and c are in contact, first u selects the path among $\mathcal{P}_{uv}$ that has the minimum delivery probability of the assigned data before the deadline. Let us say the selected path is j and its assigned data is $S_j$. Then, the path among $\mathcal{P}_{cv}$ to which the reallocation of $S_j$ can maximally improve the delivery probability of S can be determined, say path k. $S_j$ should be transferred to node v from c instead of u, and thus $S_j$ needs to be transmitted from u to c first. So the data amount to be transmitted from u to c will be increased by $S_j$ and $S_j$ will be assigned at path k. The process is repeated. If the reallocation of $S_j$ cannot increase the delivery probability of S, the process stops.

The following shows how to compare the improvement of the delivery probability of S when $S_j$ is reassigned to different paths, say paths d and e in $\mathcal{P}_{cv}$. Let $S_d$ and $S_e$ denote the amount of data already assigned at path d and path e, respectively, which can be the assignment of data currently carried by node c or zero. To compare their improvement, we only need to calculate the delivery probabilities of $S_j + S_d + S_e$ when $S_j$ is reallocated to path d and path e, respectively, since the assignment of the rest of S is not impacted. If $S_j$ is reassigned at path d, the delivery probability of $S_j + S_d + S_e$ is the product of the delivery probability of $S_d + S_j$ along path d and the delivery probability of $S_e$ along path e. Similarly we can calculate the delivery probability of $S_j + S_d + S_e$ if $S_j$ is assigned at path e, and then we can determine to which path $S_j$ should be assigned.
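The comparison between candidate paths can be sketched as below. The delivery probability itself comes from the paper's probabilistic framework, so here it is a caller-supplied function; the exponential toy model and its rates are purely hypothetical stand-ins used to make the example runnable.

```python
import math
from typing import Callable, Dict

# (path id, data amount) -> delivery probability within the deadline; supplied by the caller.
DeliveryProb = Callable[[str, float], float]


def best_target(s_j: float, assigned: Dict[str, float], prob: DeliveryProb) -> str:
    """Pick the path whose joint delivery probability is highest after it receives s_j."""

    def joint(target: str) -> float:
        result = 1.0
        for path, amount in assigned.items():
            extra = s_j if path == target else 0.0
            result *= prob(path, amount + extra)
        return result

    return max(assigned, key=joint)


# Hypothetical stand-in model: probability decays exponentially with the assigned amount.
rates = {"d": 0.1, "e": 0.05}


def toy_prob(path: str, amount: float) -> float:
    return math.exp(-rates[path] * amount)


# Path d carries nothing yet, path e already carries 4 units; S_j = 2 is to be placed.
choice = best_target(2.0, {"d": 0.0, "e": 4.0}, toy_prob)
```

Under this toy model, placing $S_j$ on path e gives a joint probability of exp(-0.3) against exp(-0.4) for path d, so e is chosen even though it already carries data.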

C. Assignment Update

The real-time adjustment ends with the total amount of data to be transmitted from node u to node c, denoted as $S_c$, and the assignment of $S_c$ at $\mathcal{P}_{cv}$. Together with the assignment of currently carried data at node c, let $A_c$ denote the total data assignment at node c, $A_c = \{S_i, i \in \mathcal{P}_{cv}\}$. As the actually transmitted data from u to c, denoted by $S'_c$, may be less than $S_c$ due to the uncertainty of their contact duration, both node u and node c need to update the data assignment after the data transmission. For node u, the amount $S'_c$ should be excluded from the assignment $A_u$ by sequentially removing the data assigned at the path with the lowest delivery probability. Similarly, for node c, the amount $S_c - S'_c$ needs to be removed from $A_c$.
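The update at both nodes is the same operation applied to different amounts, which a short sketch makes explicit. As a simplification, path probabilities are evaluated once on the current assignment rather than recomputed after each removal; the function name, path ids and probabilities are hypothetical.

```python
from typing import Callable, Dict


def remove_from_assignment(
    assignment: Dict[str, float],
    amount: float,
    prob: Callable[[str, float], float],
) -> Dict[str, float]:
    """Remove `amount` of data, draining the lowest-probability paths first."""
    updated = dict(assignment)
    remaining = amount
    # Order paths by the delivery probability of their current assignment (lowest first).
    for path in sorted(updated, key=lambda p: prob(p, updated[p])):
        if remaining <= 0:
            break
        taken = min(updated[path], remaining)
        updated[path] -= taken
        remaining -= taken
    return updated


# Hypothetical fixed probabilities per path, independent of the amount for simplicity.
probs = {"a": 0.9, "b": 0.4, "c": 0.6}
A_u = {"a": 3, "b": 2, "c": 4}
# Node u actually transferred S'_c = 5 units to node c during the contact.
A_u_after = remove_from_assignment(A_u, 5, lambda p, amt: probs[p])
```

Node c would call the same helper on $A_c$ with the amount $S_c - S'_c$, i.e., the data it planned for but never received.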

D. Discussion

Based on these three phases, the distributed algorithm works as follows. First, the source node determines a criterion assignment for the data item. Then, when the node that carries the data encounters a neighbor, it first determines whether the neighbor is the source node or whether it previously received the data from this neighbor. If so, no data will be transmitted. If the encountered node does not have the data, the node carrying the data needs to determine how much data is to be transmitted to the encountered node (if both nodes have the data, they also need to determine which one should transmit data to the other) by comparing the improvement of the delivery probability by data reallocation. After that, the determined reallocated data is transmitted and then the data assignment at sender and receiver is updated based on the amount of data actually transferred during their contact.

In the design of the distributed algorithm, nodes are only required to maintain two-hop information. One motivation for this is that, as stated above, nodes opportunistically contact each other and hence locally maintained multi-hop information cannot be promptly updated. Another reason is that by restricting paths to two hops, the data delivery probability over the paths can be easily calculated and thus nodes can quickly make decisions at runtime, i.e., at the phase of real-time adjustment.

Although the distributed algorithm requires that the source node must have a path with no more than two hops to infrastructure, it can easily establish these paths through its neighbors since most nodes in the network intermittently connect with infrastructure. Moreover, intermediate nodes, e.g., node c in Figure 3b, will also exploit two-hop paths to forward data. Therefore, there are enough paths to be explored for data transmission.

The design of the distributed algorithm takes advantage of both node contact patterns and stochastic node contacts. Criterion assignment is determined based on two-hop contact patterns, which can be interpreted as how much data is expected to be transmitted along each path. Real-time adjustment exploits stochastic node contacts to find opportunities (e.g., additional paths) that can improve the delivery probability at runtime. These schemes make up for the lack of global information, and thus the distributed algorithm should perform close to the heuristic algorithm.

TABLE I

PARAMETER SETTINGS FOR BENCHMARK

The distributed algorithm does not require a central entity to store the global information, and thus it can be widely deployed in opportunistic mobile networks. An example use case is vehicular ad hoc networks. Since the contact duration between vehicle and roadside unit is short, vehicles can leverage cooperative data offload to overcome the incomplete data transfer due to the transient contact. Since the heuristic algorithm incurs additional communication overhead between vehicle and roadside unit, it does not work well in this scenario. However, the distributed algorithm, which is insensitive to network size and has no communication overhead between vehicle and roadside unit, fits this scenario well. Moreover, the distributed algorithm is ready and easy to integrate with the architecture DTN2, where infrastructure can be treated as just another type of node.

VII. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the heuristic algorithm and distributed algorithm based on synthetic networks and real traces.

A. Evaluation on Synthetic Networks

First we investigate the performance of the heuristic algorithm on synthetic networks. The synthetic networks are generated by the well-known benchmark [16]. It provides power-law distribution of node degree and edge weight, and various topology control. There are several parameters to control the generated network: the number of nodes, n; the mixing parameter for the weights, µw; the mixing parameter for the topology, µt; the exponent for the weight distribution, ξ; the average node degree, d; and the maximum node degree, $d_m$. The settings of these parameters are shown in Table I.

With the synthetic networks, we also need to generate the contact pattern between nodes and between node and infrastructure. The settings of these distribution parameters (i.e., α, β and λ) are shown in Table II, where α = [3, 4], for example, means that α between a pair of nodes is set to a random number between 3 and 4, which follows the uniform distribution. The parameter settings are chosen based on our analysis of real traces (i.e., MIT Reality and DieselNet).



TABLE II

PARAMETER SETTINGS FOR NODE CONTACTS

Fig. 4. Data delivery probability based on estimation and simulation for Individual and Cooperative in synthetic networks, where d = 10, $d_m$ = 15, α = [6, 10], λ′ = [0.001, 0.01]. (a) Probability vs. data size, T = 400. (b) Probability vs. deadline, S = 20.

When a node transmits data to the infrastructure, it needs to decide whether to offload the data to other nodes so as to improve the delivery probability. So, the node goes through the following procedure. First, it calculates the probability that the node directly transmits the data to the infrastructure (no offloading, the data is transmitted only when it connects with infrastructure), denoted as Individual Est. Then it employs the heuristic algorithm to calculate the probability if it offloads the data to other nodes, denoted as Cooperative Est. If Cooperative Est is greater than Individual Est, the node offloads the data to the selected paths with the corresponding data assignment determined by the heuristic algorithm. Individual Est and Cooperative Est are compared for data transmissions with different data sizes and deadlines at each node in the networks. We also compare the delivery probabilities based on simulations (denoted as Individual Sim and Cooperative Sim). Specifically, we generate random numbers for inter-contact duration and contact duration between neighboring nodes according to their distributions, and then data is transmitted between nodes based on the generated contact information.
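A simulation of the Individual case can be sketched as a small Monte Carlo loop. The model here is an assumption made for illustration: inter-contact durations are drawn from an exponential distribution with rate λ, the data moved per contact is Pareto distributed with shape α and minimum β, and the contact itself is treated as instantaneous. The function name, defaults and parameter values are hypothetical.

```python
import random


def simulate_individual(
    size: float,
    deadline: float,
    alpha: float,
    beta: float,
    lam: float,
    runs: int = 500,
    seed: int = 0,
) -> float:
    """Fraction of runs in which `size` data reaches the infrastructure before `deadline`."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        clock, sent = 0.0, 0.0
        while sent < size:
            clock += rng.expovariate(lam)  # wait for the next contact (assumed exponential)
            if clock > deadline:
                break
            # Data moved in this contact: Pareto with shape alpha and minimum beta (assumed).
            sent += beta * rng.paretovariate(alpha)
        if sent >= size:
            successes += 1
    return successes / runs


p = simulate_individual(size=20, deadline=400, alpha=6, beta=1, lam=0.005)
```

The returned value is the number of successful transmissions divided by the total number of runs, which matches how the simulated probability is defined in the evaluation.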

Cooperative Does Help: Figure 4 shows the comparison of the delivery probabilities for Individual and Cooperative in synthetic networks. Figure 4a shows the success probability of transmissions with varying data sizes, while Figure 4b shows the success probability of transmissions with varying deadlines, where the estimated probability is averaged over all the nodes for each transmission, and the simulated probability is averaged over all the nodes with 500 simulation runs (i.e., the simulated probability is the number of successful transmissions divided by the total number of transmissions).

As shown in Figure 4a, for both Cooperative and Individual, the probability decreases with the increase of data size as expected. When the size of data is small, the data can be easily transmitted directly to infrastructure, and thus their probabilities are similar. However, when the size of data increases, the probability of Individual drops dramatically from S = 10 to S = 40 while cooperative offloading significantly improves the delivery probability (e.g., it can be increased from 20% to 70% when S = 20). When the data size increases further, the probability of Cooperative also decreases, i.e., cooperative offloading cannot improve the delivery probability as much as before. Figure 4b shows the probability of data transmissions with varying deadlines. Similarly, Cooperative and Individual start at the same probability. The difference between them expands and then narrows as the deadline is further relaxed.

Fig. 5. Data delivery probability based on estimation for Optimal and Cooperative. Data is sent to infrastructure from a selected node in a synthetic network, where d = 10, $d_m$ = 15, α = [6, 10], λ′ = [0.001, 0.01]. (a) Probability vs. data size, T = 400. (b) Probability vs. deadline, S = 10.

Delivery Probability Estimation: As shown in Figure 4a, for Individual, the estimated delivery probability and the simulated probability are almost the same for different data sizes. Meanwhile, for Cooperative there is a small difference between the estimated probability and the simulated probability when the data size is large. That might be caused by the approximation of the sum of Pareto random variables as discussed in Section IV. When the data size increases, more node contacts are required to transfer a data item between neighboring nodes, which may increase the deviation of the approximation. However, this will not impact the decision of cooperative offloading, since the estimated probability of Cooperative is still much higher than Individual. As the deadline is irrelevant to the approximation, the increase of the deadline does not impact the difference between the estimated and simulated probabilities, and they are almost identical as shown in Figure 4b.

Cooperative Is Close to the Optimum: We also compare Cooperative to the optimal solution of the cooperative offloading problem, denoted as Optimal. Due to the high computational complexity of Optimal, we choose a problem of small size (i.e., we randomly select one node in a generated synthetic network to send data to infrastructure) and use brute-force search to find the optimum.

Figures 5a and 5b depict the delivery probabilities of Cooperative and Optimal in terms of varying data size and deadline, respectively. Cooperative is close to Optimal in both cases and different data sizes and deadlines only slightly affect their disparities.

When Data Offloading Benefits: Figure 6 gives the relation between the percentage of offloaded data transmissions (over all the data transmissions with different data sizes and deadlines) and the contact pattern with infrastructure at particular nodes, where the contact pattern is captured by the expected inter-contact duration and contact duration between a node and infrastructure. Intuitively, nodes that frequently contact infrastructure or have long contact duration with infrastructure do not need to offload the data as much as other nodes. Consistent with this intuition, as depicted in Figure 6a, the more frequently a node contacts infrastructure, the fewer transmissions are offloaded. Moreover, as shown in Figure 6b, the shorter the contact duration, the more transmissions are offloaded. In addition, inter-contact duration has a larger impact on offloaded transmissions than contact duration. For example, when the contact frequency is low, a node may still need to offload the data even if the contact duration is long, unless the data is transmitted during the contact. That is the reason offloaded transmissions decrease slowly with the increase of contact duration as shown in Figure 6b.

Fig. 6. Percentage of offloaded data transmissions at nodes with different inter-contact duration and contact duration to infrastructure, where d = 10, $d_m$ = 15, λ′ = [0.001, 0.01], α = [6, 10], S varies from 2 to 100, and T varies from 20 to 1000. (a) Expected inter-contact duration between node and infrastructure. (b) Expected contact duration between node and infrastructure.

Network Settings Affect Data Offloading: Figure 7 gives the contour of the percentage of facilitated nodes for data transmissions with different data sizes and deadlines in various network settings, where a facilitated node is a node at which the data transmission with a particular data size and deadline is offloaded. For example, the area enclosed by the green line (80%) in Figure 7a indicates that at least 80% of nodes offload the transmissions with the corresponding data size and deadline.

As shown in Figure 7a, for the network with α = [6, 10] and λ′ = [0.001, 0.01], most of the transmissions are offloaded (i.e., most of the area is enclosed by the green line (80%)) except the transmissions with large data sizes and short deadlines, shown in the upper left corner. As shown in Figure 7b, when nodes more frequently contact infrastructure (i.e., λ′ = [0.01, 0.1]), small data can be easily transmitted to infrastructure directly, and thus the transmissions with small data size are not frequently offloaded (less than 40%). On the contrary, the transmissions with large data sizes and short deadlines are mostly offloaded. When contact duration between nodes becomes short (i.e., α = [3, 4] as in Figure 7c), compared to Figure 7a, the delivery probability of the transmissions with large data sizes cannot be greatly improved and thus they are offloaded less at nodes. However, the transmissions with small data sizes are mostly offloaded.

When nodes have a small number of neighbors in a network, e.g., d = 5 and $d_m$ = 8 as in Figure 7d, compared to Figure 7a, the reduced node degree undermines the offloading capability of the network; i.e., the offloaded transmissions are generally fewer than those of Figure 7a. Compared with Figure 7d, inter-contact duration between node and infrastructure is decreased (λ′ = [0.01, 0.1]) in Figure 7e. This dramatically changes the pattern of the offloaded transmissions. As shown in Figure 7e, offloading can improve the probability of most transmissions but not for the transmissions with large data sizes and long deadlines. That is because when nodes frequently contact infrastructure, it is easy for nodes to transmit a data item to infrastructure by multiple contacts and offloading the data item to other nodes may not yield a high delivery probability. Last, let us compare Figure 7f with Figure 7c. It is shown again that node degree affects the offloading capability of the network, since node degree determines how many nearby nodes can be explored for data offloading.

Therefore, from Figure 7, we can see that network settings do affect data offloading. In general, when node degree is higher, there are more paths to infrastructure and thus data transmissions are offloaded more often. When nodes contact infrastructure more frequently, nodes can transmit more data directly to infrastructure and thus data transmissions are offloaded less often. When contact duration between nodes becomes shorter, offloading is less likely to increase data delivery probability and thus data transmissions are offloaded less often.

In summary, through the extensive experiments on synthetic networks, we can conclude that the proposed probabilistic framework can accurately estimate data delivery probability over opportunistic paths, cooperative data offloading can significantly improve data delivery probability in various network settings and contact patterns, and the data delivery probability achieved by the heuristic algorithm is close to the optimum.

B. Evaluations on Real Traces

Next, we evaluate the performance of the heuristic algorithm and the distributed algorithm based on real traces. We compare them with two other solutions: Spread, where nodes offload the carried data to any encountered node, and MaxRate, where nodes only transmit the carried data to infrastructure or to the node in its neighbor set that has the maximum contact rate with infrastructure. We also give the performance of no offloading, i.e., the node directly sends data to infrastructure, denoted as Individual.

The two opportunistic mobile network traces used are MIT Reality [7] and DieselNet [2]. They record contacts among mobile devices equipped with Bluetooth or WiFi moving on a university campus (MIT Reality) and in a suburban area (DieselNet). The details of these two traces are summarized in Table III.

Fig. 7. Percentage of facilitated nodes for data transmissions with different data sizes and deadlines in various network settings. (a) d = 10, dm = 15, α = [6, 10], λ′ = [0.001, 0.01]. (b) d = 10, dm = 15, α = [6, 10], λ′ = [0.01, 0.1]. (c) d = 10, dm = 15, α = [3, 4], λ′ = [0.001, 0.01]. (d) d = 5, dm = 8, α = [6, 10], λ′ = [0.001, 0.01]. (e) d = 5, dm = 8, α = [6, 10], λ′ = [0.01, 0.1]. (f) d = 5, dm = 8, α = [3, 4], λ′ = [0.001, 0.01].

TABLE III: TRACE SUMMARY

In the experiment, half of the trace is used as warm-up to obtain the distribution information of contact frequency and contact duration between nodes, and the other half is used to run data transmission. Since there is no infrastructure in either trace, we choose the node with the maximum degree to act as infrastructure. Nodes that cannot construct a one-hop or two-hop path to infrastructure are excluded from the traces (note that only very few nodes are eliminated). In the simulation, nodes send data items with different sizes and deadlines to infrastructure at a randomly selected timestamp in each simulation run, and the results are averaged over 50 runs.
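The trace preparation described above can be sketched as follows, assuming the warm-up contacts have already been reduced to an undirected contact graph stored as adjacency sets. The function names and the tie-breaking rule are assumptions made for illustration, not details given in the paper.

```python
from typing import Dict, Set, Tuple


def pick_infrastructure(adj: Dict[str, Set[str]]) -> str:
    """Choose the node with the maximum degree to act as infrastructure.
    Ties go to the smallest node name (an assumption; the text does not say)."""
    return max(sorted(adj), key=lambda n: len(adj[n]))


def prune_far_nodes(adj: Dict[str, Set[str]]) -> Tuple[str, Set[str]]:
    """Keep only nodes with a one-hop or two-hop path to infrastructure."""
    infra = pick_infrastructure(adj)
    one_hop = set(adj[infra])
    # Neighbors of one-hop nodes reach infrastructure in two hops.
    two_hop = {n for relay in one_hop for n in adj[relay]} - {infra}
    return infra, one_hop | two_hop
```

For example, in a chain a-b-e-f where a also touches c and d, node a is selected as infrastructure and f, which is three hops away, is excluded.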

For the MIT Reality trace, due to the Bluetooth scan interval, only contacts with a duration of five minutes or more are recorded; contact duration is at the minute level and inter-contact duration is at the day level. Thus, for data transmissions, the data size varies from 10 MB to 60 MB (the data rate is set to 240 Kbps) and the time constraint varies from 10 to 100 hours. For DieselNet, contact duration is at the second level and inter-contact duration is at the hour level. Thus, for data transmissions in DieselNet, the data size varies from 2 MB to 10 MB (the data rate is set to 3.2 Mbps) and the deadline varies from 1 to 10 hours.
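To see how these data sizes relate to the contact-duration scale of each trace, the following back-of-the-envelope calculation converts size and rate into the total contact time needed to move one item. It assumes decimal units (1 MB = 8 × 10^6 bits, 1 Kbps = 10^3 bits/s), which the text does not state explicitly; the function name is hypothetical.

```python
def transfer_seconds(size_mb: float, rate_kbps: float) -> float:
    """Contact time needed to send size_mb megabytes at rate_kbps kilobits/s.
    Assumes decimal units: 1 MB = 8e6 bits, 1 Kbps = 1e3 bits/s."""
    return size_mb * 8e6 / (rate_kbps * 1e3)


for name, rate, sizes in [("MIT Reality", 240, (10, 60)),
                          ("DieselNet", 3200, (2, 10))]:
    lo, hi = (transfer_seconds(s, rate) for s in sizes)
    print(f"{name}: {lo:.1f} s to {hi:.1f} s of contact time")
```

Under these assumptions, an MIT Reality item needs roughly 5.6 to 33 minutes of contact and a DieselNet item needs 5 to 25 seconds, which is in line with the minute-level and second-level contact durations of the two traces.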

Figures 8a and 8b give the total transmissions, the offloaded transmissions of Heuristic, and the successful transmissions of Heuristic, Distributed, MaxRate, Spread and Individual in MIT Reality and DieselNet, respectively. In MIT Reality, as shown in Figure 8a, there are nearly 6000 data transmissions in total. If no offloading (Individual) is performed, the number of successful transmissions is only about 1400. However, Heuristic has about 3000 successful transmissions, which is more than twice that of Individual. This is also better than the other three algorithms: about 20% higher than that of Distributed and 70% higher than that of MaxRate and Spread. Moreover, about 40% of total transmissions are offloaded by Heuristic. The number of successful transmissions is higher than the number of offloaded transmissions because transmissions that are not offloaded might be completed via direct contact between the source node and infrastructure. Distributed is close to Heuristic, better than MaxRate and Spread, and almost twice that of Individual. Although Spread and MaxRate have different offload strategies, their performance is similar.

Fig. 8. Total transmissions, offloaded transmissions of Heuristic and successful transmissions of Heuristic, Distributed, MaxRate, Spread and Individual in MIT Reality and DieselNet. (a) MIT Reality, S = [10, 60] MB and T = [10, 100] hours. (b) DieselNet, S = [2, 10] MB and T = [1, 10] hours.

In DieselNet, as shown in Figure 8b, there are about 1800 total transmissions. The ratios of successful transmissions of Heuristic and Distributed are both high, more than 70%. Heuristic offloaded more than 50% of total transmissions. The successful transmissions of Heuristic and Distributed are 40% more than those of MaxRate and Spread (whose performance is again similar in DieselNet, about 900) and almost twice those of Individual.

Based on the evaluations on the real traces, it can be concluded that the heuristic algorithm performs better than the distributed algorithm, both of them are much better than the simple offloading strategies MaxRate and Spread, and they also perform about two times better than no offloading in both traces. In spite of the lack of global information, the performance of the distributed algorithm, which only exploits paths of no more than two hops for data transmissions, is comparable to that of the heuristic algorithm.

VIII. CONCLUSION

In this paper, we addressed the problem of cooperatively offloading data among opportunistically connected mobile devices so as to improve the probability of data delivery to infrastructure. We first provided the probabilistic framework to estimate the probability of data delivery over the opportunistic path and then, based on that, we proposed a heuristic algorithm to solve cooperative offloading. To cope with the lack of global information, we further proposed a distributed algorithm. The evaluation results show that the probabilistic framework accurately estimates the data delivery probability, cooperative offloading greatly improves the delivery probability, the heuristic algorithm approximates the optimum, and both the heuristic algorithm and the distributed algorithm outperform other approaches.

APPENDIX

APPROXIMATING SUM OF PARETO VARIABLES

For a collection of i.i.d. random variables $\{D_i, i = 1, \ldots, c\}$ with $D_i \sim \mathrm{Pareto}(\alpha, \beta)$, let $D = \sum_{i=1}^{c} D_i$, let $M$ be the maximum value of $\{D_i, i = 1, \ldots, c\}$, and let $R = D/M$. Then, $D$ can be approximated by $M$, considering the ratio $R$. First let us consider the ratio $D_1/M$. The cumulative distribution function (CDF) of $D_1/M$ can be expressed in terms of the conditional distribution $F(x|y)$ of $D_1$ given the fixed maximum $M = y$ [36] as follows:

$$
G(z) = P(D_1/M < z) = \int_0^\infty P\!\left(\frac{D_1}{y} < z \,\Big|\, y\right) dF^c(y) = \int_0^\infty F(yz|y)\, dF^c(y).
$$

Taking the derivative with respect to $z$, we have

$$
g(z) = \int_\beta^\infty y f(yz|y)\, dF^c(y). \tag{7}
$$

The conditional density $f(x|y)$ corresponding to the distribution $F(x|y)$ is given by

$$
f(x|y) = \frac{\delta(y - x)}{c} + \left(1 - \frac{1}{c}\right) \frac{f(x) H(y - x)}{F(y)}, \tag{8}
$$

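where $H(x) = 1$ for $x > 0$, otherwise $H(x) = 0$. The first term on the right-hand side of (8) corresponds to the case $M = D_1$, whose probability is $1/c$, while the complementary event $D_1 < M$ occurs with probability $(1 - 1/c)$. By plugging (8) into (7), and using $y\,\delta(y - yz) = \delta(1 - z)$, we then have

$$
\begin{aligned}
g(z) &= \int_0^\infty \left( \frac{\delta(1 - z)}{c} + \frac{c - 1}{c} \cdot \frac{y f(yz) H(y - yz)}{F(y)} \right) dF^c(y) \\
&= \frac{\delta(1 - z)}{c} + (c - 1) \int_0^\infty y f(yz) f(y) F^{c-2}(y)\, dy.
\end{aligned}
$$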

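Then, the expectation of the ratio $R = D/M$, denoted as $\bar{R}$, can be calculated as

$$
\begin{aligned}
\bar{R} &= c \int_0^1 z\, g(z)\, dz \\
&= 1 + c(c-1) \int_0^\infty y f(y) F^{c-2}(y) \left( \int_0^1 z f(yz)\, dz \right) dy \\
&= 1 + c(c-1) \int_0^\infty y f(y) F^{c-2}(y) \left( \frac{F(y)}{y} - \frac{1}{y^2} \int_0^y F(x)\, dx \right) dy \\
&= 1 + (c-1) \left( \int_0^\infty dF^c(y) \right) - c \left( \int_0^\infty \left( \frac{1}{y} \int_0^y F(u)\, du \right) dF^{c-1}(y) \right) \\
&= c \left( 1 - \int_0^\infty \left( \frac{1}{y} \int_0^y F(u)\, du \right) dF^{c-1}(y) \right).
\end{aligned}
$$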

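For the Pareto distribution, $F(x) = 1 - (\beta/x)^\alpha$ for $x \ge \beta$, and then

$$
\bar{R} = c - c \int_\beta^\infty \left( 1 - \frac{(\beta/y)^\alpha - \alpha\beta/y}{1 - \alpha} \right) d\left( 1 - \left(\frac{\beta}{y}\right)^{\alpha} \right)^{c-1}.
$$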

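By replacing the integral with the beta function $B(\cdot, \cdot)$, we finally have

$$
\bar{R} =
\begin{cases}
\dfrac{1 - c\,B(c, \alpha^{-1})}{1 - \alpha}, & \alpha \neq 1 \\[2ex]
\displaystyle\sum_{i=1}^{c} i^{-1}, & \alpha = 1.
\end{cases} \tag{9}
$$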
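The step from the integral to the beta function is not spelled out in the text. One way to fill it in, as a sketch for $c \ge 2$ and $\alpha \neq 1$ using the substitution $u = F(y)$ (a choice made here for illustration), is the following. Since $\int dF^{c-1}(y) = 1$ and $\frac{1}{y}\int_\beta^y F(u)\,du = 1 - \frac{(\beta/y)^\alpha - \alpha\beta/y}{1-\alpha}$, the expectation reduces to a single integral, and with $(\beta/y)^\alpha = 1 - u$ and $dF^{c-1}(y) = (c-1)u^{c-2}du$ it splits into two beta integrals:

$$
\begin{aligned}
\bar{R} &= \frac{c}{1-\alpha} \int_\beta^\infty \left( \left(\frac{\beta}{y}\right)^{\alpha} - \frac{\alpha\beta}{y} \right) dF^{c-1}(y) \\
&= \frac{c}{1-\alpha} \left( (c-1)\int_0^1 (1-u)\,u^{c-2}\,du \;-\; \alpha (c-1) \int_0^1 (1-u)^{1/\alpha} u^{c-2}\,du \right) \\
&= \frac{c}{1-\alpha} \left( (c-1)\,B(c-1, 2) - \alpha (c-1)\,B(c-1, 1+\alpha^{-1}) \right) \\
&= \frac{c}{1-\alpha} \left( \frac{1}{c} - B(c, \alpha^{-1}) \right) = \frac{1 - c\,B(c, \alpha^{-1})}{1-\alpha},
\end{aligned}
$$

where the last line uses $(c-1)B(c-1,2) = 1/c$ and $\Gamma(1+\alpha^{-1}) = \alpha^{-1}\Gamma(\alpha^{-1})$.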

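Thus, the probability that the sum $D$ is at least a given value $\mathcal{D}$ can be approximated as

$$
\begin{aligned}
P(D \ge \mathcal{D}) &\approx P(M\bar{R} \ge \mathcal{D}) = P\!\left(M \ge \frac{\mathcal{D}}{\bar{R}}\right) \\
&= 1 - \prod_{i=1}^{c} P\!\left(D_i < \frac{\mathcal{D}}{\bar{R}}\right) = 1 - \left( 1 - \left( \frac{\beta \bar{R}}{\mathcal{D}} \right)^{\alpha} \right)^{c},
\end{aligned}
$$

where $\bar{R}$ can be easily calculated using (9).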
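The closed form in (9) and the resulting approximation are straightforward to evaluate numerically. The sketch below is one possible implementation, not the authors' code: the function names are hypothetical, `x` stands for the value the sum is compared against, and the guard for `x / R-bar <= beta` is an added assumption so that the Pareto CDF is only used on its support. A small Monte Carlo helper is included purely for comparison.

```python
import math
import random


def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def mean_ratio(c: int, alpha: float) -> float:
    """Expected ratio R-bar = E[D/M] from (9)."""
    if math.isclose(alpha, 1.0):
        return sum(1.0 / i for i in range(1, c + 1))
    return (1.0 - c * beta_fn(c, 1.0 / alpha)) / (1.0 - alpha)


def prob_sum_at_least(x: float, c: int, alpha: float, beta: float) -> float:
    """Approximate P(D >= x) by P(M * R-bar >= x) for c i.i.d. Pareto(alpha, beta)."""
    t = x / mean_ratio(c, alpha)
    if t <= beta:  # every sample is at least beta, so the maximum exceeds t
        return 1.0
    return 1.0 - (1.0 - (beta / t) ** alpha) ** c


def simulate(x: float, c: int, alpha: float, beta: float,
             runs: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(D >= x), for comparison only."""
    rng = random.Random(seed)
    hits = sum(
        sum(beta * rng.paretovariate(alpha) for _ in range(c)) >= x
        for _ in range(runs)
    )
    return hits / runs
```

Calling `prob_sum_at_least(x, c, alpha, beta)` next to `simulate(x, c, alpha, beta)` for a few parameter choices gives a quick feel for how closely the approximation tracks the empirical probability in a given regime.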

REFERENCES

[1] (Aug. 2015). DTN2. [Online]. Available: http://www.dtnrg.org

[2] A. Balasubramanian, B. Levine, and A. Venkataramani, “DTN routing as a resource allocation problem,” ACM SIGCOMM Comput. Commun. Rev., vol. 37, no. 4, pp. 373–384, 2007.

[3] N. Banerjee, M. D. Corner, and B. N. Levine, “An energy-efficient architecture for DTN throwboxes,” in Proc. IEEE INFOCOM, May 2007, pp. 776–784.

[4] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “CloneCloud: Elastic execution between mobile device and cloud,” in Proc. ACM EuroSys, 2011, pp. 301–314.

[5] E. Cuervo et al., “MAUI: Making smartphones last longer with code offload,” in Proc. ACM MobiSys, 2010, pp. 49–62.

[6] E. M. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant manets,” in Proc. ACM MobiHoc, 2007, pp. 32–40.

[7] N. Eagle and A. Pentland, “Reality mining: Sensing complex social systems,” Pers. Ubiquitous Comput., vol. 10, no. 4, pp. 255–268, 2006.

[8] V. Erramilli, A. Chaintreau, M. Crovella, and C. Diot, “Diversity of forwarding paths in pocket switched networks,” in Proc. ACM IMC, 2007, pp. 161–174.

[9] V. Erramilli, M. Crovella, A. Chaintreau, and C. Diot, “Delegation forwarding,” in Proc. ACM MobiHoc, May 2008, pp. 251–260.

[10] W. Gao, Q. Li, B. Zhao, and G. Cao, “Social-aware multicast in disruption-tolerant networks,” IEEE/ACM Trans. Netw., vol. 20, no. 5, pp. 1553–1566, Oct. 2012.

[11] P. Gilbert, V. Ramasubramanian, P. Stuedi, and D. Terry, “Peer-to-peer data replication meets delay tolerant networking,” in Proc. IEEE ICDCS, Jun. 2011, pp. 109–120.

[12] B. Han et al., “Mobile data offloading through opportunistic communications and social participation,” IEEE Trans. Mobile Comput., vol. 11, no. 5, pp. 821–834, May 2012.

[13] T. Hossmann, T. Spyropoulos, and F. Legendre, “Know thy neighbor: Towards optimal mapping of contacts to social graphs for DTN routing,” in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.

[14] S. Jain, M. Demmer, R. Patra, and K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” in Proc. ACM SIGCOMM, 2005, pp. 109–120.

[15] T. Karagiannis, J.-Y. Le Boudec, and M. Vojnovic, “Power law and exponential decay of intercontact times between mobile devices,” IEEE Trans. Mobile Comput., vol. 9, no. 10, pp. 1377–1390, Oct. 2010.

[16] A. Lancichinetti and S. Fortunato, “Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 80, no. 1, p. 016118, 2009.

[17] Y. Li and W. Wang, “Can mobile cloudlets support mobile applications?” in Proc. IEEE INFOCOM, Apr./May 2014, pp. 1060–1068.

[18] Y. Lin, B. Li, and B. Liang, “Efficient network coded data transmissions in disruption tolerant networks,” in Proc. IEEE INFOCOM, Apr. 2008, pp. 2180–2188.

[19] Y. Lin, B. Li, and B. Liang, “Stochastic analysis of network coding in epidemic routing,” IEEE J. Sel. Areas Commun., vol. 26, no. 5, pp. 794–808, Jun. 2008.

[20] S. Liu and A. D. Striegel, “Exploring the potential in practice for opportunistic networks amongst smart mobile devices,” in Proc. ACM MobiCom, 2013, pp. 315–326.

[21] Z. Lu, G. Cao, and T. La Porta, “Networking smartphones for disaster recovery,” in Proc. IEEE PerCom, Mar. 2016, pp. 1–9.

[22] Z. Lu, X. Sun, and T. L. Porta, “Cooperative data offloading in opportunistic mobile networks,” in Proc. IEEE INFOCOM, Apr. 2016, pp. 1–9.

[23] Z. Lu, X. Sun, Y. Wen, and G. Cao, “Skeleton construction in mobile social networks: Algorithms and applications,” in Proc. IEEE SECON, Jun./Jul. 2014, pp. 477–485.

[24] Z. Lu, X. Sun, Y. Wen, G. Cao, and T. L. Porta, “Algorithms and applications for community detection in weighted networks,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 11, pp. 2916–2926, Nov. 2015.

[25] Z. Lu, Y. Wen, and G. Cao, “Information diffusion in mobile social networks: The speed perspective,” in Proc. IEEE INFOCOM, Apr./May 2014, pp. 1932–1940.

[26] A.-K. Pietilänen and C. Diot, “Dissemination in opportunistic social networks: The role of temporal communities,” in Proc. ACM MobiHoc, 2012, pp. 165–174.

[27] V. Sciancalepore, D. Giustiniano, A. Banchs, and A. Hossmann-Picu, “Offloading cellular traffic through opportunistic communications: Analysis and optimization,” IEEE J. Sel. Areas Commun., vol. 34, no. 1, pp. 122–137, Jan. 2016.

[28] P. Sermpezis and T. Spyropoulos, “Not all content is created equal: Effect of popularity and availability for content-centric opportunistic networking,” in Proc. ACM MobiHoc, 2014, pp. 103–112.

[29] C. Shi, V. Lakafosis, M. H. Ammar, and E. W. Zegura, “Serendipity: Enabling remote computing among intermittently connected mobile devices,” in Proc. ACM MobiHoc, 2012, pp. 145–154.

[30] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, “Spray and wait: An efficient routing scheme for intermittently connected mobile networks,” in Proc. ACM WDTN, 2005, pp. 252–259.

[31] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, “Efficient routing in intermittently connected mobile networks: The multiple-copy case,” IEEE/ACM Trans. Netw., vol. 16, no. 1, pp. 77–90, Feb. 2008.

[32] S. Wang, X. Wang, X. Cheng, J. Huang, and R. Bie, “The tempo-spatial information dissemination properties of mobile opportunistic networks with Levy mobility,” in Proc. IEEE ICDCS, Jun./Jul. 2014, pp. 124–133.

[33] X. Wang, M. Chen, Z. Han, D. O. Wu, and T. T. Kwon, “TOSS: Traffic offloading by social network service-based opportunistic sharing in mobile social networks,” in Proc. IEEE INFOCOM, Apr./May 2014, pp. 2346–2354.

[34] L. Xiang, S. Ye, Y. Feng, B. Li, and B. Li, “Ready, set, go: Coalesced offloading from mobile devices to the cloud,” in Proc. IEEE INFOCOM, Apr./May 2014, pp. 2373–2381.

[35] Q. Yuan, I. Cardei, and J. Wu, “Predict and relay: An efficient routing in disruption-tolerant networks,” in Proc. ACM MobiHoc, 2009, pp. 95–104.

[36] I. V. Zaliapin, Y. Y. Kagan, and F. P. Schoenberg, “Approximating the distribution of Pareto sums,” Pure Appl. Geophys., vol. 162, nos. 6–7, pp. 1187–1228, 2005.

[37] W. Zhang, Y. Wen, and D. O. Wu, “Collaborative task execution in mobile cloud computing under a stochastic wireless channel,” IEEE Trans. Wireless Commun., vol. 14, no. 1, pp. 81–93, Jan. 2015.

[38] X. Zhang and G. Cao, “Transient community detection and its application to data forwarding in delay tolerant networks,” in Proc. IEEE ICNP, Oct. 2013, pp. 1–10.

[39] X. Zhuo, W. Gao, G. Cao, and S. Hua, “An incentive framework for cellular traffic offloading,” IEEE Trans. Mobile Comput., vol. 13, no. 3, pp. 541–555, Mar. 2014.

[40] X. Zhuo, Q. Li, W. Gao, G. Cao, and Y. Dai, “Contact duration aware data replication in delay tolerant networks,” in Proc. IEEE ICNP, Oct. 2011, pp. 236–245.

Zongqing Lu received the B.S. and M.S. degrees from Southeast University, China, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2014. From 2014 to 2017, he held a post-doctoral position at the Department of Computer Science and Engineering, Pennsylvania State University. In 2017, he joined the Department of Computer Science, Peking University, where he is currently an Assistant Professor. His research interests include networked systems, mobile computing, and networking.

Xiao Sun received the B.S. degree from Tianjin University, the M.E. degree from the Chinese Academy of Sciences, and the Ph.D. degree from Pennsylvania State University. His research interests include wireless health, pervasive computing, mobile computing, and mobile social networks.

Thomas La Porta (F’02) received the B.S.E.E. and M.S.E.E. degrees from The Cooper Union, New York, NY, USA, and the Ph.D. degree in electrical engineering from Columbia University, New York. He joined Penn State in 2002. He was the founding Director of the Institute of Networking and Security Research, Penn State. Prior to joining Penn State, he was with Nokia Bell Labs, where he was the Director of the Mobile Networking Research Department. He is currently the Director of the School of Electrical Engineering and Computer Science, Penn State University. He is also an Evan Pugh Professor and the William E. Leonhard Chair Professor with the Computer Science and Engineering Department. He has authored numerous papers and holds 39 patents. He is a Bell Labs Fellow. He received two Thomas Alva Edison Patent Awards. He was the founding Editor-in-Chief of the IEEE TRANSACTIONS ON MOBILE COMPUTING.
