adaptive opportunistic routing for wireless ad hoc networks.bak

14
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012 243 Adaptive Opportunistic Routing for Wireless Ad Hoc Networks Abhijeet A. Bhorkar, Mohammad Naghshvar, Student Member, IEEE, Tara Javidi, Member, IEEE, and Bhaskar D. Rao, Fellow, IEEE Abstract—A distributed adaptive opportunistic routing scheme for multihop wireless ad hoc networks is proposed. The proposed scheme utilizes a reinforcement learning framework to opportunis- tically route the packets even in the absence of reliable knowledge about channel statistics and network model. This scheme is shown to be optimal with respect to an expected average per-packet re- ward criterion. The proposed routing scheme jointly addresses the issues of learning and routing in an opportunistic context, where the network structure is characterized by the transmission success probabilities. In particular, this learning framework leads to a sto- chastic routing scheme that optimally “explores” and “exploits” the opportunities in the network. Index Terms—Opportunistic routing, reward maximization, wireless ad hoc networks. I. INTRODUCTION O PPORTUNISTIC routing for multihop wireless ad hoc networks has seen recent research interest to overcome deciencies of conventional routing [1]–[6] as applied in wireless setting. Motivated by classical routing solutions in the Internet, conventional routing in ad hoc networks attempts to nd a xed path along which the packets are forwarded [7]. Such xed-path schemes fail to take advantage of broadcast nature and opportunities provided by the wireless medium and result in unnecessary packet retransmissions. The opportunistic routing decisions, in contrast, are made in an online manner by choosing the next relay based on the actual transmission outcomes as well as a rank ordering of neighboring nodes. Op- portunistic routing mitigates the impact of poor wireless links by exploiting the broadcast nature of wireless transmissions and the path diversity. The authors in [1] and [6] provided a Markov decision the- oretic formulation for opportunistic routing. In particular, it is shown that the optimal routing decision at any epoch is to se- lect the next relay node based on a distance-vector summa- rizing the expected-cost-to-forward from the neighbors to the Manuscript received June 27, 2009; revised February 09, 2010; August 25, 2010; and February 28, 2011; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor C. Westphal; accepted May 17, 2011. Date of publication July 18, 2011; date of current version February 15, 2012. This work was supported in part by UC Discovery Grant #com07-10241, Ericsson, Intel Corporation, QUALCOMM, Inc., Texas Instruments, Inc., CWC at UCSD, and NSF CAREER Award CNS-0533035. The authors are with the Department of Electrical and Computer En- gineering, University of California, San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). Color versions of one or more of the gures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identier 10.1109/TNET.2011.2159844 destination. This “distance” is shown to be computable in a distributed manner and with low complexity using the proba- bilistic description of wireless links. The study in [1] and [6] provided a unifying framework for almost all versions of op- portunistic routing such as SDF [2], Geographic Random For- warding (GeRaF) [3], and ExOR [4], where the variations in [2]–[4] are due to the authors’ choices of cost measures to opti- mize. For instance, an optimal route in the context of ExOR [4] is computed so as to minimize the expected number of trans- missions (ETX), while GeRaF [3] uses the smallest geograph- ical distance from the destination as a criterion for selecting the next-hop. The opportunistic algorithms proposed in [1]–[6] depend on a precise probabilistic model of wireless connections and local topology of the network. In a practical setting, however, these probabilistic models have to be “learned” and “maintained.” In other words, a comprehensive study and evaluation of any op- portunistic routing scheme requires an integrated approach to the issue of probability estimation. Authors in [8] provide a sen- sitivity analysis for the opportunistic routing algorithm given in [6]. However, by and large, the question of learning/estimating channel statistics in conjunction with opportunistic routing re- mains unexplored. In this paper, we rst investigate the problem of opportunis- tically routing packets in a wireless multihop network when zero or erroneous knowledge of transmission success proba- bilities and network topology is available. Using a reinforce- ment learning framework, we propose a distributed adaptive op- portunistic routing algorithm (d-AdaptOR) that minimizes the expected average per-packet cost for routing a packet from a source node to a destination. This is achieved by both suf- ciently exploring the network using data packets and exploiting the best routing opportunities. Our proposed reinforcement learning framework allows for a low-complexity, low-overhead, distributed asynchronous im- plementation. The signicant characteristics of d-AdaptOR are that it is oblivious to the initial knowledge about the network, it is distributed, and it is asynchronous. The main contribution of this paper is to provide an oppor- tunistic routing algorithm that: 1) assumes no knowledge about the channel statistics and network, but 2) uses a reinforcement learning framework in order to enable the nodes to adapt their routing strategies, and 3) optimally exploits the statistical op- portunities and receiver diversity. In doing so, we build on the Markov decision formulation in [6] and an important theorem in Q-learning proved in [9]. There are many learning-based routing solutions (both heuristic or analytically driven) for conventional routing in wireless or wired networks [10]–[15]. None of these 1063-6692/$26.00 © 2011 IEEE http://ieeexploreprojects.blogspot.com

Upload: ieeexploreprojects

Post on 28-Nov-2014

1.663 views

Category:

Documents


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Adaptive opportunistic routing for wireless ad hoc networks.bak

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012 243

Adaptive Opportunistic Routing for WirelessAd Hoc Networks

Abhijeet A. Bhorkar, Mohammad Naghshvar, Student Member, IEEE, Tara Javidi, Member, IEEE, andBhaskar D. Rao, Fellow, IEEE

Abstract—A distributed adaptive opportunistic routing schemefor multihop wireless ad hoc networks is proposed. The proposedscheme utilizes a reinforcement learning framework to opportunis-tically route the packets even in the absence of reliable knowledgeabout channel statistics and network model. This scheme is shownto be optimal with respect to an expected average per-packet re-ward criterion. The proposed routing scheme jointly addresses theissues of learning and routing in an opportunistic context, wherethe network structure is characterized by the transmission successprobabilities. In particular, this learning framework leads to a sto-chastic routing scheme that optimally “explores” and “exploits”the opportunities in the network.

Index Terms—Opportunistic routing, reward maximization,wireless ad hoc networks.

I. INTRODUCTION

O PPORTUNISTIC routing for multihop wireless ad hocnetworks has seen recent research interest to overcome

deficiencies of conventional routing [1]–[6] as applied inwireless setting. Motivated by classical routing solutions in theInternet, conventional routing in ad hoc networks attempts tofind a fixed path along which the packets are forwarded [7].Such fixed-path schemes fail to take advantage of broadcastnature and opportunities provided by the wireless medium andresult in unnecessary packet retransmissions. The opportunisticrouting decisions, in contrast, are made in an online mannerby choosing the next relay based on the actual transmissionoutcomes as well as a rank ordering of neighboring nodes. Op-portunistic routing mitigates the impact of poor wireless linksby exploiting the broadcast nature of wireless transmissionsand the path diversity.The authors in [1] and [6] provided a Markov decision the-

oretic formulation for opportunistic routing. In particular, it isshown that the optimal routing decision at any epoch is to se-lect the next relay node based on a distance-vector summa-rizing the expected-cost-to-forward from the neighbors to the

Manuscript received June 27, 2009; revised February 09, 2010; August 25,2010; and February 28, 2011; approved by IEEE/ACM TRANSACTIONS ONNETWORKING Editor C. Westphal; accepted May 17, 2011. Date of publicationJuly 18, 2011; date of current version February 15, 2012. This work wassupported in part by UC Discovery Grant #com07-10241, Ericsson, IntelCorporation, QUALCOMM, Inc., Texas Instruments, Inc., CWC at UCSD, andNSF CAREER Award CNS-0533035.The authors are with the Department of Electrical and Computer En-

gineering, University of California, San Diego, La Jolla, CA 92093 USA(e-mail: [email protected]; [email protected]; [email protected];[email protected]).Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TNET.2011.2159844

destination. This “distance” is shown to be computable in adistributed manner and with low complexity using the proba-bilistic description of wireless links. The study in [1] and [6]provided a unifying framework for almost all versions of op-portunistic routing such as SDF [2], Geographic Random For-warding (GeRaF) [3], and ExOR [4], where the variations in[2]–[4] are due to the authors’ choices of cost measures to opti-mize. For instance, an optimal route in the context of ExOR [4]is computed so as to minimize the expected number of trans-missions (ETX), while GeRaF [3] uses the smallest geograph-ical distance from the destination as a criterion for selecting thenext-hop.The opportunistic algorithms proposed in [1]–[6] depend on

a precise probabilistic model of wireless connections and localtopology of the network. In a practical setting, however, theseprobabilistic models have to be “learned” and “maintained.” Inother words, a comprehensive study and evaluation of any op-portunistic routing scheme requires an integrated approach tothe issue of probability estimation. Authors in [8] provide a sen-sitivity analysis for the opportunistic routing algorithm given in[6]. However, by and large, the question of learning/estimatingchannel statistics in conjunction with opportunistic routing re-mains unexplored.In this paper, we first investigate the problem of opportunis-

tically routing packets in a wireless multihop network whenzero or erroneous knowledge of transmission success proba-bilities and network topology is available. Using a reinforce-ment learning framework, we propose a distributed adaptive op-portunistic routing algorithm (d-AdaptOR) that minimizes theexpected average per-packet cost for routing a packet from asource node to a destination. This is achieved by both suffi-ciently exploring the network using data packets and exploitingthe best routing opportunities.Our proposed reinforcement learning framework allows for

a low-complexity, low-overhead, distributed asynchronous im-plementation. The significant characteristics of d-AdaptOR arethat it is oblivious to the initial knowledge about the network, itis distributed, and it is asynchronous.The main contribution of this paper is to provide an oppor-

tunistic routing algorithm that: 1) assumes no knowledge aboutthe channel statistics and network, but 2) uses a reinforcementlearning framework in order to enable the nodes to adapt theirrouting strategies, and 3) optimally exploits the statistical op-portunities and receiver diversity. In doing so, we build on theMarkov decision formulation in [6] and an important theorem inQ-learning proved in [9]. There aremany learning-based routingsolutions (both heuristic or analytically driven) for conventionalrouting in wireless or wired networks [10]–[15]. None of these

1063-6692/$26.00 © 2011 IEEE

http://ieeexploreprojects.blogspot.com

Page 2: Adaptive opportunistic routing for wireless ad hoc networks.bak

244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

solutions exploits the receiver diversity gain in the context ofopportunistic routing. However, for the sake of completeness,we provide a brief overview of the existing approaches. Theauthors in [10]–[14] focus on heuristic routing algorithms thatadaptively identify the least congested path in a wired network.If the network congestion, hence delay, were to be replaced bytime-invariant quantities,1 the heuristics in [10]–[14] would be-come a special case of d-AdaptOR in a network with deter-ministic channels and with no receiver diversity. In this light,Theorem 1 in Section IV provides analytic guarantees for theheuristics obtained in [10]–[14]. In [15], analytic results for antrouting are obtained in wired networks without opportunism.Ant routing uses ant-like probes to find paths of optimal costssuch as expected hop count, expected delay, and packet lossprobability.2 This dependence on ant-like probing representsa stark difference with our approach where d-AdaptOR reliessolely on data packet for exploration.The rest of the paper is organized as follows. In Section II, we

discuss the systemmodel and formulate the problem. Section IIIformally introduces our proposed adaptive routing algorithm,d-AdaptOR. We then state and prove the optimality theorem ford-AdaptOR in Section IV. In Section V, we present the imple-mentation details and practical issues of d-AdaptOR. We per-form simulation study of d-AdaptOR in Section VI. Finally, weconclude the paper and discuss future work in Section VII.

II. SYSTEM MODEL

We consider the problem of routing packets from a sourcenode 0 to a destination node in a wireless ad hoc network of

nodes denoted by the set . The time isslotted and indexed by (this assumption is not technicallycritical and is only assumed for ease of exposition). A packetindexed by is generated at the source node 0 at timeaccording to an arbitrary distribution with rate .We assume a fixed transmission cost is incurred upon

a transmission from node . Transmission cost can be consid-ered to model the amount of energy used for transmission, theexpected time to transmit a given packet, or the hop count whenthe cost is set to unity. We consider an opportunistic routing set-ting with no duplicate copies of the packets. In other words, ata given time only one node is responsible for routing any givenpacket. Given a successful packet transmission from node tothe set of neighbor nodes , the next (possibly randomized)routing decision includes: 1) retransmission by node ; 2) re-laying the packet by a node ; or 3) dropping the packetaltogether. If node is selected as a relay, then it transmits thepacket at the next slot, while other nodes , expungethat packet.We define the termination event for packet to be the event

that packet is either received at the destination or is droppedby a relay before reaching the destination. We denote this ter-mination action by . We define termination time to be thestopping time when packet is terminated. We discriminate

1The delay and congestion are highly time-varying quantities.2Here, we note that unlike congestion or instantaneous delay, the expected

delay under a stable and stationary routing algorithm is indeed time-invariant,and allow for similar mathematically sound treatment.

among the termination events as follows. We assume that uponthe termination of a packet at the destination (successful de-livery of a packet to the destination), a fixed and given positivedelivery reward is obtained, while no reward is obtained ifthe packet is terminated before it reaches the destination. Letdenote this random reward obtained at the termination time ,i.e., either zero if the packet is dropped prior to reaching the des-tination node or if the packet is received at the destination.Let denote the index of the node which at time trans-

mits packet , and accordingly let denote the cost of trans-mission (equal to zero if at time packet is not transmitted).The routing scheme can be viewed as selecting a (random) se-quence of nodes for relaying packets .3 Assuch, the expected average per-packet reward associated withrouting packets along a sequence of up to time is

(1)

where denotes the number of packets terminated up totime and the expectation is taken over the events of transmis-sion decisions, successful packet receptions, and packet gener-ation times.Problem : Choose a sequence of relay nodes in

the absence of knowledge about the network topology such thatis maximized as .In Section III, we propose the d-AdaptOR algorithm, which

solves Problem . The nature of the algorithm allows nodes tomake routing decisions in distributed, asynchronous, and adap-tive manner.Remark 1: The problem of opportunistic routing for multiple

source–destination pairs, without loss of generality, can be de-composed to the single source–destination problem describedabove [Problem is solved for each distinct flow].

III. DISTRIBUTED ALGORITHM

Before we proceed with the description of d-AdaptOR, weprovide the following notations. Let denote the set ofneighbors of node including node itself. Let denote theset of potential reception outcomes due to a transmission fromnode , i.e., . We refer toas the state space for node ’s transmission. Furthermore, let

. Let denote the space of all al-lowable actions available to node upon successful reception atnodes in . Finally, for each node , we define a reward functionon states and potential decisions as

ifif andif but

A. Overview of d-AdaptOR

As discussed before, the routing decision at any given timeis made based on the reception outcome and involves retrans-mission, choosing the next relay, or termination. Our proposed

3Packets are indexed according to the termination order.

http://ieeexploreprojects.blogspot.com

Page 3: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 245

TABLE INOTATIONS USED IN THE DESCRIPTION OF THE ALGORITHM

schememakes such decisions in a distributedmanner via the fol-lowing three-way handshake between node and its neighbors

.1) At time , node transmits a packet.2) The set of nodes who have successfully received thepacket from node , transmit acknowledgment (ACK)packets to node . In addition to the node’s identity, theacknowledgment packet of node includes a controlmessage known as estimated best score (EBS) and denotedby .

3) Node announces node as the next transmitter orannounces the termination decision in a forwarding (FO)packet.

The routing decision of node at time is based on an adap-tive (stored) score vector . The score vectorlies in space , where , and is updatedby node using the EBS messages obtained from neigh-bors . Furthermore, node uses a set of counting vari-ables and and a sequence of positive scalars

to update its score vector at time . The counting vari-able is equal to the number of times neighborshave received (and acknowledged) the packets transmitted fromnode and routing decision has been made up totime . Similarly, is equal to the number of times theset of nodes has received (and acknowledged) packets trans-mitted from node up to time . Lastly, is a fixedsequence of numbers available at all nodes.Table I provides notations used in the description of the algo-

rithm, while Fig. 1 gives an overview of the components of thealgorithm. Next, we present further details.

B. Detailed Description of d-AdaptOR

The operation of d-AdaptOR can be described in terms ofinitialization and four stages of transmission, reception and ac-knowledgment, relay, and adaptive computation as shown inFig. 1. For simplicity of presentation, we assume a sequen-tial timing for each of the stages. We use to denote some

Fig. 1. Flow of the algorithm. The algorithm follows a four-stage procedure:transmission, acknowledgment, relay, and update.

(small) time after the start of th slot and to de-note some (small) time before the end of th slot such that

.0) Initialization:For all , initialize

, while.

1) Transmission Stage:Transmission stage occurs at time in which node trans-mits if it has a packet.2) Reception and acknowledgment Stage:Let denote the (random) set of nodes that have re-ceived the packet transmitted by node . In the receptionand acknowledgment stage, successful reception of thepacket transmitted by node is acknowledged to it byall the nodes in . We assume that the delay for theacknowledgment stage is small enough (not more than theduration of the time slot) such that node infers bytime .For all nodes , the ACK packet of node to nodeincludes the EBS message .Upon reception and acknowledgment, the countingrandom variable is incremented as follows:

ifif

3) Relay Stage:Node selects a routing action accordingto the following (randomized) rule parameterized by

.• With probability

is selected.4

4In case of ambiguity, node with the smallest index is chosen.

http://ieeexploreprojects.blogspot.com

Page 4: Adaptive opportunistic routing for wireless ad hoc networks.bak

246 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

• With probability

is selected uniformly with probability .Node transmits FO, a control packet that contains infor-mation about routing decision at some time strictly be-tween and . If , then node pre-pares for forwarding in the next time slot, while nodes

expunge the packet. If termination ac-tion is chosen, i.e., , all nodes in expunge thepacket.Upon selection of routing action, the counting variableis updated

ifif

4) Adaptive Computation Stage:At time , after being done with transmission andrelaying, node updates score vector as follows.• For

(2)

• Otherwise

(3)

Furthermore, node updates its EBS message forfuture acknowledgments as

C. Computational Issues

The computational complexity and control overhead ofd-AdaptOR is low.1) Complexity: To execute stochastic recursion (2), the

number of computations required per packet is order ofat each time slot. The space complexity

of d-AdaptOR is exponential in the number of neighbors, i.e.,for each node. The reduction in storage

requirement using approximation techniques in [16] is left asfuture work.2) Control Overhead: The number of acknowledgments per

packet is order of , independent of networksize.3) Exploration Overhead: The adaptation to the optimal per-

formance in the network is guaranteed via a controlled random-ized routing strategy that can be viewed as cost of exploration.The cost of exploration is proportional to the total number ofpackets whose routes deviates from the optimal path. In proof ofTheorem 1, we show that this cost increases sublinearly with thenumber of delivered packets, hence the per-packet explorationcost diminishes as the number of delivered packets grows. Addi-tionally, communication of adds a very modest overhead

to the genie-aided or greedy-based schemes such as ExOR orSR.

IV. ANALYTIC OPTIMALITY OF D-ADAPTOR

We will now state the main result establishing the optimalityof the proposed d-AdaptOR algorithm under the assumptions ofa time-invariant model of packet reception and reliable controlpackets. More precisely, we have the following assumptions.Assumption 1: The probability of successful reception of a

packet transmitted by node at set of nodes is ,independent of time and all other routing decisions.The probabilities in Assumption 1 characterize a

packet reception model that we refer to as local broadcastmodel. Note that for all , successful reception atand are mutually exclusive and . Fur-thermore, logically node is always a recipient of its owntransmission, i.e., iff .Assumption 2: The successful reception at set due to trans-

mission from node is acknowledged perfectly to node .Remark 2: Assumption 1 is in line with the experimen-

tally tested state of the art routing protocols MORE [17] andExOR [4]. These studies seem to indicate that reasonablysimple probabilistic models provide good abstractions of mediaaccess control (MAC) and physical (PHY) layers at the routinglayer.Remark 3: In practice, Assumption 2 is hard to satisfy. But

as we will see in Section VI, when the rates and power of thecontrol packets are set to maximize the reliability, the impact ofviolating this assumption can be kept extremely low.Remark 4: In Section VI, we address the severity as well as

the implications of Assumptions 1 and 2. In particular, via a setof QualNet simulations, we will show that d-AdaptOR exhibitsmany of its desirable properties in a realistic setup despite therelaxation of the analytical assumptions.Given Assumptions 1 and 2, we are almost ready to

present Theorem 1 regarding the optimality of d-AdaptORamong the class of policies that are oblivious to the net-work topology and/or channel statistics. More precisely, leta distributed routing policy be a collectionof routing decisions taken at nodes , where de-notes a sequence of random actions fornode . The policy is said to be (P)-admissible if for allnodes , the event belongsto the -field generated by the observations at node ,i.e., . Let denote theset of such -admissible policies. Theorem 1 states thatd-AdaptOR, denoted by , is an optimal -admissiblepolicy.Theorem 1: Suppose and

Assumptions 1 and 2 hold. Then, for all

http://ieeexploreprojects.blogspot.com

Page 5: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 247

where and are the expectations taken with respect topolicies and , respectively.5

Next, we prove the optimality of d-AdaptOR in two steps.In the first step, we show that converges in an almost suresense. In the second step, we use this convergence result to showthat d-AdaptOR is optimal for Problem .

A. Convergence of

Let be an operator on vector suchthat

(4)

Let denote the fixed point of operator ,6 i.e.,

(5)

The following lemma establishes the convergence of recur-sion (2) to the fixed point of .Lemma 1: Let:J1) for all ;J2) .

Then, the sequence obtained by the stochastic recursion (2)converges to almost surely.The proof uses known results on the convergence of a

certain recursive stochastic process as presented by Fact 2 inAppendix-A.

B. Proof of Optimality

Using the convergence of , we show that the expected av-erage per-packet reward under d-AdaptOR is equal to the op-timal expected average per-packet reward obtained for a genie-aided system where the local broadcast model is known per-fectly. In other words, we take cue from known results asso-ciated with a closely related Auxiliary Problem ( ). In thisAuxiliary Problem ( ), there exists a centralized controllerwith full knowledge of the local broadcast model as wellas the transmission outcomes across the network [1], [6]. Theobjective in the Auxiliary Problem ( ) is a single-packet vari-ation of that in Problem ( ): the reward

for routing a single packet from the source to the destinationis maximized over a set of (AP)-admissible policies, wherethis set of (AP)-admissible policies is a superset of (P)-ad-missible policies that also includes all topology-aware and

5This is a strong notion of optimality and implies that the proposed algo-rithm’s expected average reward is greater than the best-case performance

of all policies [18, p. 344].6Existence and uniqueness of is provided in Appendix-A.

centralized policies. This Auxiliary Problem ( ) has been ex-tensively studied in [1], [6], and [19], where a Markov decisionformulation provides the following important result.Fact 1 [6, Theorem 2.1]: Consider the unique solution

to the following fixed-point equation:

(6)

(7)

There exists an optimal topology-aware and centralized admis-sible policy such that

(8)

Lemma 2 states the relationship between the solution ofProblem ( ) and that of the Auxiliary Problem ( ). Morespecifically, Lemma 2 shows that is an upper bound forthe solution to Problem ( ).Lemma 2: For any (P)-admissible policy for

Problem ( ) and for all

The proof is given in Appendix-B. Intuitively, the result holdsbecause the set of (P)-admissible policies is a subset of (AP)-admissible policies, i.e., .Lemma 3 gives the achievability proof by showing that the

expected average per-packet reward of d-AdaptOR is lower-bounded by .Lemma 3: For any

The proof is given in Appendix-C. Lemmas 2 and 3 implythat [which is (P)-admissible by construction] is an optimalpolicy under which

exists and is equal to establishing the proof of Theorem 1.Corollary 1: When , the network is connected,

and is greater than the worst-case routing cost,7 d-AdaptORminimizes

(9)

the expected per-packet delivery time as .

7The worst-case routing cost can be determined by taking supremum overETX metrics for all source–destination pairs.

http://ieeexploreprojects.blogspot.com

Page 6: Adaptive opportunistic routing for wireless ad hoc networks.bak

248 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

This is because when is sufficiently large, and thenetwork is connected

V. PROTOCOL DESIGN AND IMPLEMENTATION ISSUES

In this section, we describe an 802.11 compatible implemen-tation for d-AdaptOR.

A. 802.11 Compatible Implementation

The implementation of d-AdaptOR, analogous to any oppor-tunistic routing scheme, involves the selection of a relay nodeamong the candidate set of nodes that have received and ac-knowledged a packet successfully. One of the major challengesin the implementation of an opportunistic routing algorithm ingeneral, and the d-AdaptOR algorithm in particular, is the de-sign of an 802.11 compatible acknowledgment mechanism atthe MAC layer. We propose a practical and simple way to im-plement acknowledgment architecture.The transmission at any node is done according to an 802.11

CSMA/CA mechanism. Specially, before any transmission,transmitter performs channel sensing and starts transmissionafter the backoff counter is decremented to zero. For eachneighbor node , the transmitter node then reservesa virtual time slot of duration , where isthe duration of the acknowledgment packet and is theduration of Short InterFrame Space (SIFS) [20]. Transmitterthen piggybacks a priority ordering of nodes with eachdata packet transmitted. The priority ordering determines thevirtual time slot in which the candidate nodes transmit theiracknowledgment. Nodes in the set that have successfullyreceived the packet then transmit acknowledgment packetssequentially in the order determined by the transmitter node.After a waiting time of during

which each node in the set has had a chance to send an ACK,node transmits a FOrwarding control packet (FO). The FOpackets contain the identity of the next forwarder, which maybe node again or any node . If expires and noFO packet is received (FO packet reception is unsuccessful),then the corresponding candidate nodes drop the received datapacket. If the transmitter does not receive any acknowledg-ment, node retransmits the packet. The backoff window isdoubled after every retransmission. Furthermore, the packet isdropped if the retry limit (set to 7) is reached.In addition to the acknowledgment scheme, d-AdaptOR

requires modifications to the 802.11 MAC frame format.Fig. 2 shows the modified MAC frame formats required byd-AdaptOR. The reserved bits in the type/subtype fields of theframe control field of the 802.11 MAC specification are usedto indicate whether the rest of the frame is a d-AdaptOR dataframe, a d-AdaptOR ACK, or a, FO.8 The data frame contains

8This enables the d-AdaptOR to communicate and be fully compatible withother 802.11 devices.

Fig. 2. Frame structure of the data packets, acknowledgment packets, and FOpackets.

the candidate set in priority order, the payload, and the 802.11Frame Check Sequence. The acknowledgment frame includesthe data frame sender’s address and the feedback EBS .The FO packet is exactly the same as a standard 802.11 shortcontrol frame that uses different subtype value.

B. d-AdaptOR in a Realistic Setting

1) Loss of ACK and FO Packets: Interference or lowsignal-to-noise ratio (SNR) can cause loss of ACK and FOpackets. Loss of an ACK packet results in an incorrect estima-tion of nodes that have received the packet, and thus affects theperformance of the algorithm. Loss of FO packet negativelyimpacts the throughput performance of the network. In partic-ular, loss of an FO packet can result in the drop of data packetsat all the potential relays, reducing the throughput performance.Hence, in our design, FO packets are transmitted at lower ratesto ensure a reliable transmission.2) Increased Overhead: As it is the case with any oppor-

tunistic scheme, d-AdaptOR adds a modest additional overheadto the standard 802.11 due to the added acknowledgment/hand-shake structure. This overhead increases linearly with thenumber of neighbors. Assuming a 802.11b physical layer oper-ating at 11 Mb/s with an SIFS time of 10 s, preamble durationof 20 s, Physical Layer Convergence Protocol (PLCP) headerduration of 4 s, and 512-B frame payloads, Table II comparesthe overhead in the data packet due to piggybacking and thecontrol overhead due to ACK and FO packets for unicast802.11, genie-aided opportunistic scheme, and d-AdaptOR.d-AdaptOR requires communication overhead of 4 extra bytes(for EBS) per ACK packet compared to the genie-aided op-portunistic scheme, while unicast 802.11 does not require suchoverhead.Note that the overhead cost can be reduced by restricting

the number of nodes in the candidate list of MAC header toa given number, MAX-NEIGHBOUR. The unique orderingfor the nodes in the candidate set is determined by prioritizingthe nodes with respect to and then

http://ieeexploreprojects.blogspot.com

Page 7: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 249

TABLE IIOVERHEAD COMPARISONS

choosing the MAX-NEIGHBOUR highest priority nodes.9

Such a limitation will sacrifice the diversity gain and, hence,the performance of any opportunistic routing algorithm forlower overhead. In practice, we have seen that limiting theneighbor set to 4 provides most of the diversity gain.

VI. SIMULATIONS

In this section, we provide simulation studies in realistic wire-less settings where the theoretical assumptions of our study donot hold. These simulations not only demonstrate a robust per-formance gain under d-AdaptOR in a realistic network, but alsoprovide significant insight in the appropriate choice of the de-sign parameters such as damping sequence , delivery re-ward , etc. We first investigate the performance of d-AdaptORwith respect to the design parameters and network parameters ina grid topology of 16 nodes. We then use a realistic topology of36 nodes with random placement to demonstrate robustness ofd-Adaptor to the violation of the analytic Assumptions 1 and 2.

A. Simulation Setup

In Sections VI-B and VI-C, using the appropriate choiceof the design parameters, we compare the performance ofd-AdaptOR against suitably chosen candidates. As a bench-mark, when appropriate, we have compared the performanceagainst a genie-aided policy that relies on full network topologyinformation when selecting routes. This is nothing but dis-cussed in Section IV-B. We also compare against StochasticRouting (SR) [1] (SR is the distributed implementation ofpolicy ) and ExOR [4] (an opportunistic routing policy withETX metric) in which the empirical probabilistic structureof the network is used to implement opportunistic routingalgorithms. As a result, their performance will be highly de-pendent on the precision of empirical probability associatedwith link . To provide a fair comparison, we have consideredsimple greedy versions of SR and ExOR. These algorithmsadapt to the history of packet reception outcomes and relyon the updates to make routing decisions assuming error-free

. We have also compared our performance against a con-ventional routing SRCR [21] with full knowledge of topology.In this setting, a conventional route is selected with perfectknowledge of link success probability at any given node. Thiscomparison in effect provides a simple benchmark for alllearning-based conventional routing policies in the literaturesuch as Q-routing [10] and predictive Q-routing [12] whencongestion is taken to be small enough (such that finding leastcongested paths coincides with finding the path with minimumexpected number of transmissions).

9In case of ambiguity, the node with the smallest index is chosen.

Our simulations are performed in QualNet. We consider twosets of topologies in our experimental study.1) Grid Topology: In Section VI-B, we study a grid topologyconsisting of 16 indoor nodes such that the nearest neigh-bors are separated by distance meters. If unspecified,is chosen to be 25 m. The source and the destination arechosen at the maximal distance (on diagonal) from eachother.

2) Random Topology: In Section VI-C, we study a randomtopology consisting of 36 indoor nodes placed in an areaof 150 150 m . Here, we investigate the performanceunder a multisource multidestination setting as the numberof flows in the network is varied and each flow is specifiedvia a randomly selected pair of source and destination.

The nodes are equipped with 802.11b radios placed in indoorenvironment transmitting at 11 Mb/s with transmission power15 dBm. Note that the choice of indoor environment is mo-tivated by the findings in [22], where opportunistic routingis found to provide significant diversity gains. The wirelessmedium model includes Rician fading with K-factor of 4and log-normal shadowing with mean 4 dB. The path lossfollows the two-ray model in [23] with path exponent of 3.The acknowledgment packets are short packets of length 24 Btransmitted at 11 Mb/s, while FO packets are of length 20 Band transmitted at a lower rate of 1 Mb/s to ensure reliability.If unspecified, packets are generated according to a constantbit rate (CBR) source with rate 20 packets/s. The packets areassumed to be of length 512 B equipped with simple cyclicredundancy check (CRC) error detection. The cost of transmis-sion is assumed to be one unit, and the reward is set to 40.We have chosen as the exploration parameter ofchoice.

B. Effects of Design and Network Parameters

Here, we investigate the role and criticality of various designparameters of d-AdaptOR with respect to the expected numberof transmission criterion. Let us start with design parameters

and1) Exploration Parameter Sequence : The convergence

rate of stochastic recursion (2) depends strongly on the choice ofsequence . Convergence is slower with a faster decreasingsequence and results in less variance in the estimates of, while with a slow decreasing sequence of , conver-

gence is fast but results in large variance in the estimates of. In Fig. 3, we have plotted the effect of the choice of

sequence by comparing two sequences and

. Note that under sequence ,d-AdaptOR is slower to adapt to the optimal performance whileit shows a slightly smaller variance. This is because the choiceof controls the rate with which greedy versus (randomlychosen) exploration actions are utilized. The optimization of thechoice of is an interesting topic of study in stochastic ap-proximation [24], [25], far beyond the scope of this work.2) Per-Packet Delivery Reward : To ensure an acceptable

performance of d-AdaptOR, the value of delivery reward, ,must be chosen sufficiently high. This would ensure the exis-tence of routes under which the value of delivering a packet(as represented in ) is worth (i.e., larger than) the cost of

http://ieeexploreprojects.blogspot.com

Page 8: Adaptive opportunistic routing for wireless ad hoc networks.bak

250 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

Fig. 3. Comparison for .

Fig. 4. Expected number of transmissions versus time as is varied.

relaying and routing that packet. A reasonable choice of isany value larger than the worst-case expected transmission cost.Increasing beyond such a value does not affect the asymptoticoptimality of the algorithm. Next, we study the performance ofd-AdaptOR with respect to the convergence rate and deliveryratio.Fig. 4 plots the expected number of transmissions rate as

time progresses for various values of . As seen in Fig. 4, ifincreases beyond a threshold (in the example provided

here, this threshold is 18, but in general it depends on thenetwork diameter), the expected number of transmissionsper packet achieve the optimal value of . In contrast, for

, the expected number of transmissions approacheszero as the packets not worth obtaining routing reward aredropped.10 Fig. 4 also shows that the convergence rate of theexpected number of transmissions for routing per packet underd-AdaptOR decreases as increases. The slow convergence for

for large is due to the flexibility of exploring longerpaths. The slow convergence to zero for near isattributed to the fact that it takes a longer time for d-AdaptORto realize that the packet is not worth relaying.Fig. 5 plots the delivery ratio as is varied. Fig. 5 shows that

as increases beyond a threshold , the delivery ratio remainsfixed. However, for sufficiently small , nearly all the packetsare dropped as the cost of transmission of the packet as well asrelaying is not worth the obtained delivery reward. Due to very

10For , we have plotted negative of the expected per-packet rewardas the expected number of transmissions.

Fig. 5. Delivery ratio as is varied.

Fig. 6. d-AdaptOR performance as packet length is varied.

Fig. 7. Performance of d-AdaptOR as CBR traffic is varied.

slow convergence rate around for , we observe thata nonnegligible number of packets is delivered in the durationof experiment.Next, we investigate the performance of d-AdaptOR with

respect to other candidate protocols for the network parame-ters such as packet length, traffic rate, neighbor distance, andtime-varying costs.3) Packet Length: We have repeated our simulations for

1024-B packets. Fig. 6 plots the performance as the packetlength is varied from 512 to 1024 B. Note that due to the de-creasing packet transmission reliabilities, the expected routingcost per packet is increased with the packet size. However, theoptimality of d-AdaptOR does not depend on the packet length.4) Traffic Rate: Fig. 7 plots the mean number of transmis-

sions versus CBR rate for candidate algorithms. Even thoughthe performance gain for d-AdaptOR decreases somewhat withincrease in the load, there is always a nonnegligible advantageover greedy solutions.5) Average Hop Length : In an attempt to understand the

performance gap between various opportunistic algorithms,specifically the gap between d-AdaptOR versus learning-basedconventional routing algorithms [10]–[13] whose performanceis bounded by SRCR, one needs to gain insight about the diver-sity gain achieved by opportunistic routing. Fig. 8 compares theexpected transmission cost for the three opportunistic routingalgorithms (d-AdaptOR, ExOR, and SR) and SRCR as the

http://ieeexploreprojects.blogspot.com

Page 9: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 251

Fig. 8. Small hops provide significant receiver diversity gain.

Fig. 9. Time-varying cost: Nodes go into sleep mode at time 300 s.

distance between the neighboring nodes in the grid topology,measured in meters, is varied from 10 to 30 m. Note that forhigh values of , the receiver diversity is low due to retrans-mission packet losses giving nearly similar performance forcandidate protocols, while small corresponds to a networkwith large receiver diversity gain. As expected, when issmall, all opportunistic routing schemes provide a significantimprovement over conventional routing, but perhaps what ismore interesting is the performance gain of learning-basedd-AdaptOR over the greedy-based solutions in medium ranges.6) Time-Varying Cost: In our analytical setup, we assume

the transmission costs are fixed. Next, we discuss a simple sce-nario where the nodes have time-varying transmission costs.Consider a network in which nodes may go into an energy-saving mode when they do not participate in routing (e.g., torecharge their energy sources). Assume that upon entering theenergy-saving mode, a node announces a high cost of trans-mission (100 instead of usual transmission cost of 1). Fig. 9plots the expected average cost of d-AdaptOR when two nodesat the center of the grid move into an energy-saving mode. Itshows that d-AdaptOR can track the genie-aided solution afterthe nodes move into the energy-saving mode.

C. Case Study: Random Network

Here, we study a random network scenario consisting of36 wireless nodes placed randomly, with the remaining param-eters kept the same as the default parameters.Fig. 10 plots the expected number of transmissions and the

expected average per-packet reward for the candidate routingalgorithms versus network operation time when a single flowis present in the random topology. We first note that, as ex-pected, SRCR performs poorly compared to the opportunisticschemes as it fails to utilize the receiver diversity gain. Thisunderlines our contribution over all existing learning-based so-lutions [10]–[13] that ignore receiver diversity. Furthermore,

Fig. 10. Expected number of transmissions and average per-packet reward asfunction of operation time.

Fig. 11. d-AdaptOR versus distributed SR, ExOR, and SRCR performance formultiple flows.

Fig. 10 shows that the d-AdaptOR algorithm outperforms thegreedy opportunistic schemes given sufficient number of packetdeliveries. This is because the greedy versions of SR and ExORfail to explore possible choices of routes and often result instrictly suboptimal routing policies. Fig. 10 also shows that therandomized routing decisions employed by d-AdaptOR work asa double-edged sword. On the one hand, they form amechanismthrough which network opportunities are exhaustively exploreduntil the globally optimal decisions are constructed, resultingin an improved long-term performance while these randomizeddecisions lead to a short-term performance loss. This, in fact, isreminiscent of the well-known exploration/exploitation tradeoffin stochastic control and learning literature.Next, we study the performance of d-AdaptOR as the number

of flows in the network is varied, where each flow is specifiedvia a randomly selected pair of source and destination. Fig. 11plots the expected number of transmissions and expectedaverage reward for the candidate routing algorithms for therandom topology. As seen in Fig. 11, d-AdaptOR maintains an

http://ieeexploreprojects.blogspot.com

Page 10: Adaptive opportunistic routing for wireless ad hoc networks.bak

252 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

optimal performance. However, Fig. 11 also shows that the gapbetween d-AdaptOR and the greedy version of SR significantlydecreases with an increase in number of flows where the naturalpattern of traffic flow renders the (randomized) explorationphase less critical. In other words, while Fig. 11 is consistentwith the Remark 1 in Section II regarding the decompositionof multiple-flow scenario to multiple single-flow scenarios, italso suggests that a joint design in which the multiplicity offlows provide a natural (and greedy) exploration of the networkmight be beneficial with regard to the transient/short-termperformance measures of interest.

VII. CONCLUSION AND FUTURE WORK

In this paper, we proposed d-AdaptOR, a distributed, adap-tive, and opportunistic routing algorithm whose performance isshown to be optimal with zero knowledge regarding networktopology and channel statistics. More precisely, under idealizedassumptions, d-AdaptOR is shown to achieve the performanceof an optimal routing with perfect and centralized knowledgeabout network topology, where the performance is measuredin terms of the expected per-packet reward. Furthermore,we show that d-AdaptOR allows for a practical distributedand asynchronous 802.11 compatible implementation, whoseperformance was investigated via a detailed set of QualNetsimulations under practical and realistic networks. Simulationsshow that d-AdaptOR consistently outperforms existing adap-tive routing algorithms in practical settings.The long-term average reward criterion investigated in this

paper inherently ignores the short-term performance. To cap-ture the performance of various adaptive schemes, however, itis desirable to study the performance of the algorithms over afinite horizon. One popular way to study this is via measuringthe incurred “regret” over a finite horizon. Regret is a functionof horizon that quantifies the loss of the performance undera given adaptive algorithm relative to the performance of thetopology-aware optimal one. More specifically, our results sofar implies that the optimal rate of growth of regret is strictlysublinear in , but fails to provide a conclusive understandingof the short-term behavior of d-AdaptOR. An important area offuture work comprises developing adaptive algorithms that en-sure optimal growth rate of regret.The design of routing protocols requires a consideration

of congestion control along with the throughput perfor-mance [26], [27]. Our work, however, does not consider thisclosely related issue. Incorporating congestion control in op-portunistic routing algorithms to minimize expected delaywithout the topology and the channel statistics knowledge is anarea of future research.

APPENDIX

We start this section with a note on the notations used. On theprobability space , we use notation todenote the indicator random variable (with respect to ), suchthat for all for all , and

for all . For a vector ,we use to denote the th element of the vector. Letdenote the weighted max-norm with positive weight vector ,i.e., . We denote the vector in with all

components equal to 1 by . We also use the notation torepresent the first random elements of the random sequence

.

A. Proof of Lemma 1

Lemma 1: Let:J1) for all ;J2) .

Then, the sequence obtained by the stochastic recursion (2)

converges to almost surely.To prove Lemma 1, we note that the adaptive computation

given by (2) utilizes a stochastic approximation algorithm tosolve the MDP associated with Problem ( ). To study theconvergence properties of this stochastic approximation, we ap-peal to known results in the intersection of learning and sto-chastic approximation given below.In particular, consider a set of stochastic sequences on, denoted by , and the corresponding

filtration , i.e., the increasing -field generated by, satisfying the following recursive equation:

where is a mapping from into and

, is a vector of possibly delayed componentsof . If no information is outdated, then for all and

. The following important result on the convergenceof is provided in [9].Fact 2 [9, Theorem 2]: Assume and sat-

isfy the following conditions.G1) For all and a.s.;

for a.s.;for a.s.

G2) is a martingale difference with finite secondmoment, i.e., , and there existconstants and such that

.G3) There exists a positive vector , scalars and

, such that

G4) Mapping satisfies the followingproperties.1) is componentwise monotonically increasing.2) is continuous.3) has a unique fixed point .4) ,for any .

G5) For any as .Then, the sequence of random vectors converges to the fixedpoint almost surely.Let be the increasing -field generated by random vec-

tors . Let be the random vector of

http://ieeexploreprojects.blogspot.com

Page 11: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 253

dimension , generated via recursiveequation (2). Furthermore

Let be a random vector whose th elementis constructed as follows:

where , and is the most recent state visited bynode .Now, we can rewrite (2) and (3) as in the form investigated

in Fact 2, i.e.,

The remaining steps of the proof reduce to verifying state-ments G1–G5. This is verified in Lemma 4.Lemma 4: satisfy conditions G1–G5.Proof:

• (G1): It is shown in Lemma 6 that algorithm d-AdaptORguarantees that every state-action is attempted infinitelyoften (i.o.). Hence

visited i.o.

However

• (G2):

Thus Assumption (G2) of Fact 2 is satisfied.• (G3): Let denote the setof states that contain the destination node . Moreover, let

. Let be the hittingtime associated with set and policy , i.e.,

. Policy is saidto be proper if . Let us now fix aproper deterministic stationary policy . Existence ofsuch a policy is guaranteed from the connectivity between0 and . Let be the termination state that is reachedafter taking the termination action . Let us define a policydependent operator

(10)We then consider a Markov chain with states andwith the following dynamics: From any state , wemove to state , with probability .Thus, subsequent to the first transition, we are always at astate of the form , and the first two componentsof the state evolve according to policy . As is assumedproper, it follows that the system with states alsoevolves according to a proper policy. We construct a ma-trix with each entry corresponding to the transition fromstate to with value equal tofor all for all .Since policy is proper, the maximum eigenvalue of ma-trix is strictly less than 1. As is a nonnegative ma-trix, Perron Frobenius theorem guarantees the existence ofa positive vector with components and some

such that

(11)

From (11), we have a positive vector such that, where is the

fixed point of equation .From the definition of (4) and (10), we have

. Using this and the tri-angle inequality, we obtain

establishing the validity of (G3).• (G4): Assumption (G4) is satisfied by operator using thefollowing fact:Fact 3 [19, Proposition 4.3.1]: is monotonically in-creasing, continuous, and satisfies

http://ieeexploreprojects.blogspot.com

Page 12: Adaptive opportunistic routing for wireless ad hoc networks.bak

254 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

. is a fixed pointof . From (5) and (6), we obtain

(12)

Furthermore, using (5) and (12), for all

(13)

The existence of fixed point follows from (13), whileuniqueness of follows from uniqueness of (Fact 1).

• (G5): Suppose as . Therefore, there existssuch that for all . This means that the number

of times that node has transmitted a packet is boundedby . However, this contradicts Lemma 6, which says thateach state-action pair is visited i.o. Therefore,as for all , and condition (G5) holds.

Thus, Assumptions (G1)–(G5) are satisfied. Hence, fromFact 2, our iterate (2) converges almost surely to , the uniquefixed point of .Lemma 5: If policy is followed, then action is

selected i.o. if state is visited i.o.Proof: Define the random variable

for any . Let be the -field generated by. Let for any

.From the construction of the algorithm, it is clear that ismeasurable. Now, it is clear that under policy is

independent of given and . Define

for allifif (14)

for all

is visited i.o.

is visited i.o.

is visited i.o.(15)

The next step of the proof is based on the following fact.Fact 4 [28, Corollary 5.29] (Extended Borel–Cantelli

Lemma): Let be an increasing sequence of -fields and letbe -measurable. If , then

.Thus, from Fact 4, is visited i.o. if is visited i.o.

Lemma 6: If policy is followed, then each state-actionis visited infinitely often.Proof: We say states communicate if there ex-

ists a sequence of actions such that proba-bility of reaching state from state following the sequenceof actions is greater than zero. Using Lemma 5,if state is visited i.o., then every actionis chosen i.o. as the set is finite. Hence, states suchthat , are visited i.o. if is visited i.o.By Lemma 5, every action is also visited i.o. Fol-lowing similar argument and repeated application of Lemma 5,every state that communicates with state and actions

is visited i.o.Under the assumption of the packet generation process in

Section II, a packet is generated i.o. at the source node 0. Thus,state is reached i.o. The construction of set is such thatevery state communicates with state . Thus, each

is visited i.o. since is finite.

B. Proof of Lemma 2

Lemma 2: For any (P)-admissible policy forProblem ( ) and for all

Proof: To prove the lemma, we refer to the Auxil-iary Problem ( ). In this problem we have assumed theexistence of a centralized controller with full knowledgeof the local broadcast model. Mathematically speaking,let be the sample space of the random probabilitymeasures for the local broadcast model. Specifically,

is a nonsquare left stochastic matrix .Moreover, let be the trivial -field generated by thelocal broadcast model (sample point in ), i.e.,

.11 Recall that denotes the set of nodesthat have received the packet due to transmission from nodeat time , while denotes the corresponding routing decisionnode takes at time .12 For Auxiliary Problem ( ), a routingpolicy is a collection of routing decisions takenfor nodes at the centralized controller, where denotesa sequence of random actions for node .The routing policy is said to be (AP)-admissible for AuxiliaryProblem ( ) if the event belongs to the product-field [29].From Fact 1, since is the optimal policy for one packet,

for each packet and for any feasible policy

11 -field captures the knowledge of the realization of local broadcast modeland assumes a well-defined prior on these models.12 if node does not transmit at time .

http://ieeexploreprojects.blogspot.com

Page 13: Adaptive opportunistic routing for wireless ad hoc networks.bak

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 255

where the inequality follows from the fact that . Theremaining steps are straightforward

C. Proof of Lemma 3

Lemma 3: For any

Proof: From (5), (6), and (12), we obtain the followingequality for all :

(16)

Let

Lemma 1 implies that, in an almost sure sense, there existspacket index such that for all

In other words, from time onwards, given any nodeand set , the probability that d-AdaptOR chooses anaction such thatis upper-bounded by . Furthermore, since(Lemma 6), for a given , with probability 1, there

exists a packet index such that for all.

Let . For all packets with index, the overall expected reward is upper-bounded byand lower-bounded by , hence their

presence does not impact the expected average per-packet re-ward. Consequently, we only need to consider the routing deci-sions of policy for packets .Consider the th packet generated at the source. Let

be an event for which there exist instances when d-AdaptORroutes packet differently from the possible set of optimal ac-tions. Mathematically speaking, event occurs iff there existinstances such that for all

where is the set of nodes that have successfully receivedpacket at time due to transmission from node . Wecall event a misrouting of order . For

Now for packets , let us consider the expected dif-ferential reward under policies and

(17)

(18)

(19)

where . Inequality (17) is obtained by noticing thatmaximum loss in the reward occurs if algorithm d-AdaptORdecides to drop packet (no reward) while there exists a nodein the set of potential forwarders such that .Thus, for all , the expected average per-packet reward

under policy is bounded as

ACKNOWLEDGMENT

The authors would like to thank A. Plymoth and P. Johanssonfor the valuable discussions. They are grateful to the anonymousreviewers who provided thoughtful comments and constructivecritique of the paper.

REFERENCES

[1] C. Lott and D. Teneketzis, “Stochastic routing in ad hoc wireless net-works,” in Proc. 39th IEEE Conf. Decision Control, 2000, vol. 3, pp.2302–2307, vol. 3.

[2] P. Larsson, “Selection diversity forwarding in a multihop packet radionetwork with fading channel and capture,” Mobile Comput. Commun.Rev., vol. 2, no. 4, pp. 47–54, Oct. 2001.

[3] M. Zorzi and R. R. Rao, “Geographic random forwarding (GeRaF) forad hoc and sensor networks: Multihop performance,” IEEE Trans. Mo-bile Comput., vol. 2, no. 4, pp. 337–348, Oct.–Dec. 2003.

http://ieeexploreprojects.blogspot.com

Page 14: Adaptive opportunistic routing for wireless ad hoc networks.bak

256 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

[4] S. Biswas and R. Morris, “ExOR: Opportunistic multi-hop routing forwireless networks,” Comput. Commun. Rev., vol. 35, pp. 33–44, Oct.2005.

[5] S. Jain and S. R. Das, “Exploiting path diversity in the link layer inwireless ad hoc networks,” in Proc. 6th IEEE WoWMoM, Jun. 2005,pp. 22–30.

[6] C. Lott and D. Teneketzis, “Stochastic routing in ad hoc networks,”IEEE Trans. Autom. Control, vol. 51, no. 1, pp. 52–72, Jan. 2006.

[7] E. M. Royer and C. K. Toh, “A review of current routing protocols forad hoc mobile wireless networks,” IEEE Pers. Commun., vol. 6, no. 2,pp. 46–55, Apr. 1999.

[8] T. Javidi and D. Teneketzis, “Sensitivity analysis for optimal routingin wireless ad hoc networks in presence of error in channel quality es-timation,” IEEE Trans. Autom. Control, vol. 49, no. 8, pp. 1303–1316,Aug. 2004.

[9] J. N. Tsitsiklis, “Asynchronous stochastic approximation andQ-learning,” in Proc. 32nd IEEE Conf. Decision Control, Dec.1993, vol. 1, pp. 395–400.

[10] J. Boyan and M. Littman, “Packet routing in dynamically changingnetworks: A reinforcement learning approach,” in Proc. NIPS, 1994,pp. 671–678.

[11] J. W. Bates, “Packet routing and reinforcement learning: Estimatingshortest paths in dynamic graphs,” 1995, unpublished.

[12] S. Choi and D. Yeung, “Predictive Q-routing: A memory-based rein-forcement learning approach to adaptive traffic control,” in Proc. NIPS,1996, pp. 945–951.

[13] S. Kumar and R. Miikkulainen, “Dual reinforcement Q-routing: Anon-line adaptive routing algorithm,” in Proc. Smart Eng. Syst., NeuralNetw., Fuzzy Logic, Data Mining, Evol. Program., 2000, pp. 231–238.

[14] S. S. Dhillon and P. Van Mieghem, “Performance analysis of theAntNet algorithm,” Comput. Netw., vol. 51, no. 8, pp. 2104–2125,2007.

[15] P. Purkayastha and J. S. Baras, “Convergence of Ant routing algorithmvia stochastic approximation and optimization,” in Proc. IEEE Conf.Decision Control, 2007, pp. 340–354.

[16] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming.Belmont, MA: Athena Scientific, 1996.

[17] S. Chachulski, M. Jennings, S. Katti, and D. Katabi, “Trading structurefor randomness in wireless opportunistic routing,” in Proc. ACM SIG-COMM, 2007, pp. 169–180.

[18] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dy-namic Programming. New York: Wiley, 1994.

[19] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Compu-tation: Numerical Methods. Belmont, MA: Athena Scientific, 1997.

[20] W. Stallings, Wireless Communications and Networks, 2nd ed. UpperSaddle River, NJ: Prentice-Hall, 2004.

[21] J. Bicket, D. Aguayo, S. Biswas, and R.Morris, “Architecture and eval-uation of an unplanned 802.11b mesh network,” in Proc. ACM Mo-biCom, Cologne, Germany, 2005, pp. 31–42.

[22] M. Kurth, A. Zubow, and J. P. Redlich, “Cooperative opportunisticrouting using transmit diversity in wireless mesh networks,” in Proc.IEEE INFOCOM, Apr. 2008, pp. 1310–1318.

[23] J. Doble, Introduction to Radio Propagation for Fixed and MobileCommunications. Boston, MA: Artech House, 1996.

[24] S. Russel and P. Norvig, Artificial Intelligence: A Modern Approach,2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.

[25] R. Parr and S. Russell, “Reinforcement learning with hierarchies ofmachines,” in Proc. NIPS, 1998, pp. 1043–1049.

[26] P. Gupta and T. Javidi, “Towards throughput and delay optimal routingfor wireless ad-hoc networks,” in Proc. Asilomar Conf., Nov. 2007, pp.249–254.

[27] M. J. Neely, “Optimal backpressure routing for wireless networks withmulti-receiver diversity,” in Proc. CISS, Mar. 2006, pp. 18–25.

[28] L. Breiman, Probability. Philadelphia, PA: SIAM, 1992.[29] S. Resnick, A Probability Path. Boston, MA: Birkhuser, 1998.

Abhijeet A. Bhorkar received the B.Tech. andM.Tech. degrees in the electrical engineering fromthe Indian Institute of Technology, Bombay, India,both in 2006, and is currently pursuing the Ph.D.degree in electrical and computer engineering at theUniversity of California, San Diego.His research interests are primarily in the areas of

stochastic control and estimation theory, informationtheory, and their applications in the optimization ofwireless communication systems.

Mohammad Naghshvar (S’10) received the B.S.degree in electrical engineering from Sharif Uni-versity of Technology, Tehran, Iran, in 2007, and iscurrently pursuing the M.S./Ph.D. degrees in elec-trical and computer engineering at the University ofCalifornia, San Diego.His research interests include stochastic con-

trol theory, network optimization, and wirelesscommunication.

Tara Javidi (S’96–M’02) studied electrical en-gineering at the Sharif University of Technology,Tehran, Iran, from 1992 to 1996. She received theM.S. degrees in electrical engineering (systems) andapplied mathematics (stochastics) and Ph.D. degreein electrical engineering and computer science fromthe University of Michigan, Ann Arbor, in 1998,1999, and 2002, respectively.From 2002 to 2004, she was an Assistant Professor

with the Electrical Engineering Department, Univer-sity ofWashington, Seattle. She joined the University

of California, San Diego, in 2005, where she is currently an Associate Professorof electrical and computer engineering. Her research interests are in communi-cation networks, stochastic resource allocation, and wireless communications.Dr. Javidi was a Barbour Scholar during the 1999–2000 academic year and

received an NSF CAREER Award in 2004.

Bhaskar D. Rao (S’80–M’83–SM’91–F’00)received the B.Tech. degree in electronics and elec-trical communication engineering from the IndianInstitute of Technology, Kharagpur, India, in 1979,and the M.S. and Ph.D. degrees from the Universityof Southern California, Los Angeles, in 1981 and1983, respectively.Since 1983, he has been with the University

of California, San Diego, where he is currently aProfessor with the Department of Electrical andComputer Engineering. His interests are in the

areas of digital signal processing, estimation theory, and optimization theory,with applications to digital communications, speech signal processing, andhuman–computer interactions.Dr. Rao has been a Member of the Statistical Signal and Array Processing

Technical Committee of the IEEE Signal Processing Society. He is currently aMember of the Signal Processing Theory and Methods Technical Committee.

http://ieeexploreprojects.blogspot.com