

Non-Cooperative Power Control for Wireless Ad Hoc Networks with Repeated Games

Chengnian Long, Member, IEEE, Qian Zhang, Senior Member, IEEE, Bo Li, Senior Member, IEEE, Huilong Yang, and Xinping Guan, Senior Member, IEEE

Abstract: One of the distinctive features of a wireless ad hoc network is the lack of any central controller or single point of authority, in which each node/link then makes its own decisions independently. Therefore, fully cooperative behaviors, such as cooperation for increasing system capacity, mitigating interference for each other, or honestly revealing private information, might not be directly applied. It has been shown that power control is an efficient approach to achieve quality of service (QoS) requirements in ad hoc networks. However, existing work has largely relied on cooperation among different nodes/links or on a pricing mechanism that often needs third-party involvement. In this paper, we aim to design a non-cooperative power control algorithm without a pricing mechanism for ad hoc networks. We view the interaction among the users' decisions on power level as a repeated game. With the theory of stochastic fictitious play (SFP), we propose a reinforcement learning algorithm to schedule each user's power level. There are three distinctive features in our proposed scheme. First, the user's decision at each stage is self-incentive with a myopic best response correspondence. Second, the dynamics arising from our proposed algorithm eventually converge to a pure Nash Equilibrium (NE). Third, our scheme does not need any information exchange or observation of the opponents' private information. Therefore, the proposed algorithm can safely run in a fully selfish environment without any additional pricing or security mechanism. A simulation study demonstrates the effectiveness of our proposed scheme.

Index Terms: Wireless Ad Hoc Networks, Power Control, Non-cooperative, Repeated Games, Stochastic Fictitious Play, Reinforcement Learning, Nash Equilibrium.

    I. INTRODUCTION

THE POWER control in a wireless ad hoc network deals with the selection of a proper transmission power for each packet at each link in a distributed fashion. This is a complex and intriguing problem, since the selection of the power level fundamentally affects many aspects of the operation of the network and its resulting performance; for instance, the quality of the signal received at the receiver, the interference it creates for the other receivers, and the energy consumption at each node. In traditional emergency or military situations, the nodes in an ad hoc network usually belong to the same authority with certain common objectives. To maximize the overall system performance, nodes usually work in a fully cooperative way and can unconditionally follow the designated algorithm with a system optimization method [20], [37], [40]. Recently, emerging applications of ad hoc networks are envisioned in civilian and commercial usage, where nodes typically do not belong to a single authority and may not pursue a common goal. The usual assumption of spontaneous willingness to cooperate is unrealistic for autonomous users, especially when there are conflicting interests among the users [3], [28], [32]. This autonomous nature evidently affects many aspects of the design of ad hoc networks, including the selection of the power level, which is the focus of study in this paper.

Manuscript received July 1, 2006; revised February 15, 2007. The research was supported in part by grants from RGC under the contracts HKUST6165/05E and HKUST6164/06E, by grants from HKUST under the contract NRC06/07.EG01, by a grant from the National Basic Research Program of China (973 Program) under the contract No. 2006CB303000, by grants from NSF China under the contracts 60629203, 60429202, 60573115, 60525303, 60404022, and 60604012, and by grants from the NSF of Hebei Province under the contracts F2006000270 and F2005000390.

C. N. Long, H. L. Yang, and X. P. Guan are with the Centre for Networking Control and Bioinformatics (CNCB), Department of Electrical Engineering, Yanshan University, Qinhuangdao, 066004, Hebei Province, P.R. China (e-mail: {dylcn,xpguan}@ysu.edu.cn).

Q. Zhang and B. Li are with the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China (e-mail: {qianzh, bli}@cse.ust.hk).

Digital Object Identifier 10.1109/JSAC.2007.070805.

In the power control schemes of ad hoc networks, there are usually two fundamentally conflicting objectives. On the one hand, the higher the Signal-to-Interference-plus-Noise Ratio (SINR), the better the service a receiving node can expect; on the other hand, a higher SINR is achieved at the expense of increased drain on the battery and higher interference to the signals of other users. If each user behaves in a totally selfish manner, the network performance may significantly degrade, even leading to interrupted service [9], which is called the tragedy of the commons in economics. Therefore, to deploy an ad hoc network successfully in a self-organized manner, the issue of cooperation stimulation should be solved. Most of the existing works rely on a pricing mechanism as an incentive, to a certain degree, to facilitate cooperation among self-interested users in power control for cellular systems [2], [3], [18], [25], [30], [38] and ad hoc networks [19], [22]. However, such a pricing mechanism is often difficult to configure and implement in ad hoc networks, since there is no central controller and/or no single credible node in wireless ad hoc networks. There have been fine attempts using cryptographic techniques [39] or mechanism design [31] to implement distributed and reliable price-based incentive schemes, yet these can incur significant overheads in the overall protocol design. For example, to bring about incentive-, communication-, and algorithm-compatibility in mechanism design, these schemes need a combination of payments, redundancy, problem partitioning, and cryptographic techniques [31]. Each user may perform redundant computation as a checker, which creates the opportunity for a catch-and-punish scheme that provides incentives for rational nodes to be faithful. It should be noted that a bank is still needed to detect manipulation and to determine appropriate reactions to detected manipulations.

Our work is motivated by the demand for a new, low-complexity power control scheme for wireless ad hoc networks in which there is no credible agent and no infrastructure support. In this paper, we formulate power control in ad hoc networks with non-cooperative games. To avoid the complexity incurred by a pricing mechanism, the power control decision of each user requires a rational self-incentive property. The self-incentive property means that the designated dynamics should largely comply with each user's self-interests, that is, the user runs spontaneously with the given update rule. Specifically, we answer the following two questions: Does there exist a control rule with a self-incentive property for the non-cooperative power control game? How can we design rational self-incentive dynamics to approximate the Nash Equilibrium (NE) of the non-cooperative power control game?

To formulate the non-cooperative power control analytically, we first model the self-interest property for power control in ad hoc networks. In general, the concept of utility, a term from microeconomics, is commonly used [10], [30]. Utility refers to the level of satisfaction the decision-taker receives as a result of its actions. In this paper, we construct a utility model that considers both the QoS requirement and the energy-efficiency property. Based on the utility model, we then formulate the power control problem with the theory of non-cooperative games.

Power control in wireless ad hoc networks is inherently a repeated process. It is natural to model the interaction among users with repeated games [14]. Our argument is that every rational user has the motivation to improve its performance during the interaction, even though the users are selfish by nature. Based on this observation, we can set up the rational dynamics in the stage game. With these dynamics, each user will be endowed with sufficiently intelligent behavior to make a correct decision. Consequently, we introduce learning theory in games to complete the stage game design [15]. Of the many existing learning models in games, one of the best known is fictitious play (FP). In FP and its variants, each user chooses the best response to its beliefs about its opponents, which are given by the time average of past play. Interestingly, this myopic learning process is consistent with the desirable self-incentive property in non-cooperative power control design. To cope with the problem of observing the opponents' private information, we propose a utility-based reinforcement learning algorithm within the SFP framework, which makes a myopically optimal decision (MOD) for the next stage under the current observation. Here, MOD means that users update their strategies, each using the best response to the strategies of the others. This learning process only requires each user to observe its own action and the derived utility, which avoids the complex security and mechanism implementations incurred by information exchange or observation.

The challenging problem in this scheme is how to guarantee the convergence of the learning dynamics in the repeated game. To this end, we first show that the non-cooperative power control game with our utility function is a supermodular game.

The supermodularity reflects the interdependence among the users' actions. Furthermore, it provides strong evidence that the user has the incentive to learn the environment to improve its own utility. Using the continuous dynamics arising from SFP, we show that the behavior of MOD almost surely converges to a bounded set of NE. To avoid observing the opponents' private information completely, we introduce a utility-based reinforcement learning algorithm. By exploiting the utility structure, we can significantly speed up the convergence. The proposed learning algorithm inherently involves two-time-scale dynamics. With stochastic approximation theory, it can be proved that the dynamics of the utility-based reinforcement learning algorithm is almost surely an asymptotic pseudotrajectory of the flow defined by the SFP dynamics. Furthermore, it can be shown that, in supermodular games, the mixed NE of the SFP learning dynamics is unstable. We can therefore present the important result that the dynamics of our utility-based SFP learning algorithm almost surely converges to a pure Nash Equilibrium (PNE). This is a very interesting property for the non-cooperative power control problem in wireless ad hoc networks.

Our work is the first step toward exploiting learning theory in repeated games to design non-cooperative power control, which can eliminate the performance degradation incurred by selfish behavior. We provide the following two important insights.

Firstly, for non-cooperative power control with required QoS in wireless ad hoc networks, we show that self-incentive dynamics using the MOD exist in the learning process.

Secondly, the self-incentive learning dynamics can converge to the set of PNE. The simulation results show that the required QoS performance can always be guaranteed under a feasibility assumption. For the energy-efficiency utility, the average performance loss in the steady state is about 13% to 20% compared with the system optimum.

The remaining sections of the paper are organized as follows. In Section II we introduce the preliminary knowledge on repeated games, learning theory and supermodular games. In Section III we formulate the non-cooperative power control problem formally as the non-cooperative power control game; we also present the design objective and challenges. In Section IV we propose the utility-based stimulus-response learning algorithm within the SFP framework. To obtain a high convergence rate, we further propose an improved algorithm that exploits the utility structure. In Section V the convergence of the learning dynamics is investigated. The simulation study is included in Section VI to show the validity and efficiency of our proposed algorithm. In Section VII we summarize related work on power control and non-cooperative behavior in wireless networks. Finally, in Section VIII we present an overview of this paper.

II. NOTATION AND PRELIMINARIES

A. Notation

To make it easy for readers to follow, we begin by briefly summarizing the notation and definitions used in the paper, as shown in Table I.


TABLE I
SUMMARY OF THE NOTATION

$N$: the number of active links (users) at a certain time slot.
$\mathcal{N}$: the set of all users $\{1, \ldots, N\}$, so $|\mathcal{N}| = N$.
$A_i$: the finite action set of user $i$, $i \in \mathcal{N}$.
$\sigma_i$: the mixed strategy of user $i$, $i \in \mathcal{N}$.
$S_i$: the set of mixed strategies of user $i$, $i \in \mathcal{N}$.
$U_i$: the utility function of user $i$, $i \in \mathcal{N}$.
$p_i$: the power level of user $i$, $i \in \mathcal{N}$.
$\gamma_i$: the SINR of user $i$, $i \in \mathcal{N}$.
$G$: the non-cooperative power control (NPC) game $[\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$.
$BR_i(\cdot)$: smooth best response function.
$\dot{\sigma}_i$: the perturbed best response dynamic $\dot{\sigma}_i = BR_i(\sigma_{-i}) - \sigma_i$.
$F(x)$: game differential equation $\frac{dx}{dt} = F(x)$.
$\Lambda$: the set of fixed points $\{x \in X : F(x) = 0\}$ of the game differential equation.
$L(x)$: the limit set of state $x$: $\{z \in X : \lim_{t \to \infty} \Phi_t x = z\}$.
$\Phi_t(Q)$: the flow $\Phi_t$ applied to an invariant set $Q \subset \mathbb{R}^n$ for $F$.

B. Preliminaries

The process of fictitious play (FP) is one of the most widely used learning models [14], [15]. In this process, users behave as if they think they are facing a stationary, but unknown, distribution of opponents' strategies. At each stage, FP makes an MOD based on the observed environment. Therefore, it is a suitable learning method among self-interested users. In this section, we introduce the basic concepts of FP and its variant, stochastic fictitious play (SFP), in repeated games.

Supermodular games are those characterized by strategic complementarities; basically, this means that when one player takes a higher action, the others want to do the same. In power control, the conflict among the users' QoS requirements has the property of strategic complementarity, that is, once one user increases its own power level to improve its QoS, the other users may increase their power levels to guarantee their own QoS requirements. Supermodular games are particularly interesting because they tend to be analytically appealing: they have nice comparative statics properties and behave well under various learning rules [29].

1) Learning Theory in Repeated Games: A repeated game is the repeated play of a particular stage game. Non-cooperative power control is inherently a repeated process. Each user collects local information about the network environment, and the decision about the power level is then made locally. As new information (perhaps dependent on actions in prior periods) emerges, the decision process repeats itself. In this paper, we model the users' interaction in power control with a repeated game.

a) Fictitious Play: In this subsection, we define the process of fictitious play (FP). We first introduce some notation to describe normal form games. An $N$-player normal form game $G$ is defined by a collection of finite action sets $A_1, \ldots, A_N$ and a collection of utility functions $U_1, \ldots, U_N$. We assume that player $i$ has a fixed finite set $A_i = \{1, \ldots, |A_i|\}$ of pure strategies, called player $i$'s action space. (In this paper, we use the terms user, player, and link interchangeably.) Player $i$'s utility function $U_i$ is a map from the set of action profiles $A = \prod_j A_j$ to the real line. Finally, $A_{-i} = \prod_{j \neq i} A_j$ denotes the set of action profiles of player $i$'s opponents.

The set $S_i$ of mixed strategies for player $i$ is the unit simplex $\Delta^{n_i - 1} \subset \mathbb{R}^{n_i}$ of dimension $n_i - 1$:

$S_i = \left\{\sigma_i \in \mathbb{R}^{n_i}_+ : \sum_{k \in A_i} \sigma_i^k = 1 \text{ and } \sigma_i^k \geq 0 \text{ for all } k \in A_i\right\}$.

We identify $S_i$ with the set of probability measures on $A_i$. If $a_i \in A_i$, $e_{a_i} \in S_i$ denotes the corresponding vertex (i.e., the $l$-th component of $e_{a_i}$ is 1 for $a_i = l$ and 0 otherwise). The game's state space is the following compact convex polyhedron:

$\Sigma = S_1 \times \ldots \times S_N \subset \mathbb{R}^{n_1}_+ \times \ldots \times \mathbb{R}^{n_N}_+$.

The payoffs to player $i$ are determined by its utility function $U_i : A_i \times A_{-i} \to \mathbb{R}$.

The model of FP supposes that players choose their actions in each period to maximize that period's expected payoff given the opponents' empirical frequencies of actions. One can think of FP as a mechanism employed by the users to select their actions. At each time step, $t = 0, 1, \ldots$, users propose an action vector

$a(t) := (a_1(t), \ldots, a_N(t))$,

where $a_i(t) \in A_i$ is the label of the action proposed by user $i$. The objective is to construct a mechanism so that the proposed actions, $a(t)$, ultimately converge for large enough $t$. In FP, the scheduled action at step $t$ is a function of the past proposed actions over the interval $[0, t-1]$ as follows. First, enumerate the actions available to user $i$ as $A_i$. For any $j \in [1, |A_i|]$, let $\kappa_i^j(t)$ denote the total number of times user $i$ has proposed action $A_i^j$ up to stage $t$. We define the empirical frequency vector, $\sigma_i(t) \in \Delta^{|A_i|}$, of user $i$ as follows:

$\sigma_i(t) = \left(\frac{\kappa_i^1(t-1)}{t-1}, \ldots, \frac{\kappa_i^{|A_i|}(t-1)}{t-1}\right)$.

Now we define the FP process. At stage game $t$, user $i$ selects an action $a_i(t) \in A_i$ that maximizes its expected utility as though all the other users make an independent random selection of their actions according to $\prod_{j \neq i} \sigma_j(t)$, i.e.,

$a_i(t) \in \arg\max_{\alpha \in A_i} \mathbb{E}_{\sigma_{-i}(t)}\left[U_i(\alpha, a_{-i})\right]$.

Notice that this is an inclusion, since the best response correspondence for these beliefs is not necessarily single-valued. The discontinuity inherent in determinate FP is troubling descriptively and can lead to poor long-term performance.
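To make the FP mechanism concrete, the following is a minimal Python sketch (an illustration under our own naming, not the paper's implementation) of one fictitious-play stage for a two-player finite game: each player best-responds to the opponent's empirical frequencies, and the action counts are then updated.

```python
import numpy as np

def fp_stage(U1, U2, counts1, counts2):
    """One fictitious-play stage for a two-player game.

    U1[a1, a2], U2[a1, a2] : payoff matrices.
    counts1, counts2       : how often each action was played so far.
    Returns the chosen actions and the updated counts.
    """
    # Empirical frequencies of past play (beliefs about the opponent).
    q1 = counts1 / counts1.sum()
    q2 = counts2 / counts2.sum()
    # Each player maximizes its expected payoff against the opponent's
    # empirical frequencies (the best response may not be unique; argmax picks one).
    a1 = int(np.argmax(U1 @ q2))      # E[U1(a1, .)] = sum_a2 U1[a1, a2] q2[a2]
    a2 = int(np.argmax(U2.T @ q1))    # E[U2(., a2)] = sum_a1 U2[a1, a2] q1[a1]
    counts1[a1] += 1
    counts2[a2] += 1
    return a1, a2, counts1, counts2

# Toy usage: a 2x2 coordination game played for a few stages.
if __name__ == "__main__":
    U1 = np.array([[2.0, 0.0], [0.0, 1.0]])
    U2 = U1.copy()
    c1, c2 = np.ones(2), np.ones(2)   # start from uniform fictitious counts
    for _ in range(20):
        a1, a2, c1, c2 = fp_stage(U1, U2, c1, c2)
    print("empirical frequencies:", c1 / c1.sum(), c2 / c2.sum())
```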

b) Stochastic Fictitious Play: Since best responses are generically pure, a player's choices under determinate FP are often sensitive to the exact value of its beliefs. Small changes in beliefs can lead to discrete changes in behavior. Even when beliefs converge to an NE, actual behavior may not converge. In particular, the behavior can never converge to a mixed equilibrium of a game. To contend with these issues, Fudenberg and Levine [15] introduced SFP. In this model, each player $i$ chooses a strategy $\sigma_i$ to maximize the perturbed utility

$U_i(\sigma_i, \sigma_{-i}) + \tau v_i(\sigma_i)$, (1)


where $\tau > 0$ is a temperature parameter that controls the level of randomization, and $v_i : S_i \to \mathbb{R}$ is a player-dependent smoothing function, which is a smooth, strictly concave function such that as $\sigma_i$ approaches the boundary of $S_i$, the slope of $v_i$ becomes infinite. The conditions on $v_i$ imply that, for a fixed $\sigma_{-i}$, there is a unique maximizing $\sigma_i$, so we can define the smooth best response function

$BR_i(\sigma_{-i}) = \arg\max_{\sigma_i} \left\{U_i(\sigma_i, \sigma_{-i}) + \tau v_i(\sigma_i)\right\}$. (2)

A typical example of a perturbation function satisfying these conditions is the entropy function $H(\cdot)$:

$H(\sigma_i) = -\sigma_i^T \log(\sigma_i) = -\sum_{a_i \in A_i} \sigma_i(a_i) \log \sigma_i(a_i)$. (3)

For any $\tau > 0$, we can explicitly solve for $BR_i$ as

$BR_i(\sigma_{-i})[a_i] = \frac{\exp\left((1/\tau)\, U_i(a_i, \sigma_{-i})\right)}{\sum_{r_i \in A_i} \exp\left((1/\tau)\, U_i(r_i, \sigma_{-i})\right)}$. (4)

We are interested in small values of $\tau > 0$ because, for such values, $BR_i(\sigma_{-i})$ approximately maximizes user $i$'s unperturbed utility.
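For reference, here is a small Python sketch of the logit choice rule (4) obtained from the entropy perturbation (3); the payoff vector and temperature values below are illustrative assumptions.

```python
import numpy as np

def smooth_best_response(expected_utilities, tau):
    """Logit (smoothed) best response of Eq. (4).

    expected_utilities : U_i(a_i, sigma_{-i}) for every action a_i.
    tau                : temperature; small tau -> close to the exact best response.
    """
    z = np.asarray(expected_utilities, dtype=float) / tau
    z -= z.max()                      # stabilize the exponentials
    w = np.exp(z)
    return w / w.sum()

# Example: as tau shrinks, the distribution concentrates on the best action.
u = [1.0, 1.5, 0.7]
for tau in (1.0, 0.1, 0.01):
    print(tau, smooth_best_response(u, tau))
```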

2) Supermodular Games: S-modular games were developed in [34] for games where the strategy space $S_i$ of player $i$ is a compact sublattice of $\mathbb{R}^m$. By sublattice, we mean that for any two elements $a$ and $b$ contained in $S_i$, the component-wise minimum $\min(a, b)$ (denoted by $a \wedge b$) and the component-wise maximum $\max(a, b)$ (denoted by $a \vee b$) are also contained in it. In particular, a compact sublattice has component-wise smallest and largest elements. In the following, we present the main results for the case of $m = 1$. Consider $N$ players, and denote the utility of player $i$ corresponding to the $N$-dimensional vector of strategies $x$ by $U_i(x)$. Let $\Sigma$ denote the space of all strategies. We present the property of increasing differences (supermodularity) with the following definition [4].

Definition 1: The utility function $U_i$ for player $i$ is supermodular if and only if, for all $x, y \in \Sigma$,

$U_i(x \vee y) + U_i(x \wedge y) \geq U_i(x) + U_i(y)$.

Consider the case where $U_i$ is twice differentiable. It is supermodular in $x = (x_1, \ldots, x_N)$ if it has the increasing differences property, i.e., $\frac{\partial^2 U(x)}{\partial x_i \partial x_j} \geq 0$ for all $x$ and $j \neq i$. For a one-dimensional discrete strategy space, the increasing differences property states that the difference $U_i(a_i', a_{-i}) - U_i(a_i, a_{-i})$ is (strictly) increasing in $a_{-i}$, i.e.,

$U_i(a_i', a_{-i}') - U_i(a_i, a_{-i}') \geq U_i(a_i', a_{-i}) - U_i(a_i, a_{-i})$ (5)

for $a_i' \geq a_i$ and $a_{-i}' \geq a_{-i}$.

Remark 1: It should be noted that the condition (5) for increasing differences can be extended to the mixed strategy $\sigma_i$.

Proposition 1: A game $G = [\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$ is supermodular if, for each player $i \in \mathcal{N}$,
(i) the strategy set $P_i$ is a nonempty and compact sublattice,
(ii) the utility function $U_i$ is continuous in all players' strategies, is supermodular in player $i$'s own strategy, and has increasing differences between any component of player $i$'s strategy and any component of any other player's strategy.
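As a simple illustration of Definition 1, the sketch below numerically checks the one-dimensional increasing-differences condition (5) for an arbitrary payoff table of one player; the payoff tables used here are made-up examples, not the power control utility.

```python
import numpy as np

def has_increasing_differences(U):
    """Check condition (5) for a player with payoff table U[a_i, a_minus_i]."""
    A, B = U.shape
    for ai in range(A - 1):        # a_i' = ai + 1 >= a_i
        for b in range(B - 1):     # a_{-i}' = b + 1 >= a_{-i}
            lhs = U[ai + 1, b + 1] - U[ai, b + 1]
            rhs = U[ai + 1, b] - U[ai, b]
            if lhs < rhs - 1e-12:
                return False
    return True

# A toy supermodular payoff: U(a, b) = a * b has increasing differences.
a = np.arange(4)[:, None]
b = np.arange(4)[None, :]
print(has_increasing_differences(a * b))        # True
print(has_increasing_differences(-(a * b)))     # False (submodular example)
```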

III. NON-COOPERATIVE POWER CONTROL GAME

In this paper, we propose a non-cooperative power control with a self-incentive dynamic property for self-interested ad hoc networks, which does not need a pricing mechanism and thus avoids implementational complexity. We emphasize the self-incentive dynamic property, whereby the user is willing to obey the designated dynamics with a myopic view of its selfish interest. A key component for describing the selfish interest is the utility function. In this section, we present a utility model for power control that considers both the QoS requirement and the energy-efficiency property. With the utility model, we formalize the power control problem with non-cooperative games. Finally, we state the design objective and highlight the challenging problems.

A. Utility Model

The goal of power control in wireless networks is to ensure that no user's SINR $\gamma_i$ falls below its required threshold $\gamma_i^*$, chosen to ensure adequate QoS, i.e., to maintain

$\gamma_i \geq \gamma_i^*$, $\forall i \in \mathcal{N}$, (6)

where $\mathcal{N}$ is the set of users.

Consider a wireless ad hoc network with a set $\mathcal{N} = \{1, \ldots, N\}$ of distinct node pairs. Each link consists of one dedicated transmitter and one dedicated receiver. We designate the transmitted power and SINR of the $i$th user by $p_i$ ($p_i^{min} \leq p_i \leq p_i^{max}$) and $\gamma_i$, respectively. The users' transmit power vector is denoted by $p = (p_1, \ldots, p_N)$. We denote the background (receiver) noise within the user's bandwidth by $\eta_i$. In the deterministic formulation of the power control problem for wireless networks, the noise power $\eta_i$ is treated as a constant. We use a snapshot model with the assumption that the link gains evolve slowly with respect to the SINR evolution. In this problem formulation, the SINR of the $i$th user is

$\gamma_i(p) = \frac{p_i h_{ii}}{\eta_i + \sum_{j \neq i} p_j h_{ji}}$, (7)

where $h_{ii}$ is the link gain from the $i$th user's transmitter $T_i$ to its intended receiver $R_i$, and $h_{ji}$ is the link gain from the $j$th user's transmitter $T_j$ to the $i$th user's receiver $R_i$. We denote the interference plus noise of user $i$ by

$I_i(p_{-i}) = \eta_i + \sum_{j \neq i} p_j h_{ji}$. (8)
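A short Python sketch of Eqs. (7) and (8), computing every link's SINR and interference-plus-noise from a power vector and a gain matrix; the numerical values below are placeholders.

```python
import numpy as np

def sinr_and_interference(p, h, noise):
    """Eqs. (7)-(8): per-link SINR gamma_i(p) and interference-plus-noise I_i(p_{-i}).

    p     : power vector (p_1, ..., p_N)
    h     : gain matrix, h[j, i] = gain from transmitter T_j to receiver R_i
    noise : background noise eta_i at each receiver
    """
    p = np.asarray(p, float)
    received = h * p[:, None]                 # received[j, i] = p_j * h_ji at receiver R_i
    total = received.sum(axis=0) + noise      # desired + interference + noise
    desired = np.diag(h) * p                  # p_i * h_ii
    interference = total - desired            # I_i = eta_i + sum_{j != i} p_j h_ji
    return desired / interference, interference

# Placeholder example with two links.
h = np.array([[1e-8, 2e-10],
              [3e-10, 8e-9]])
gamma, I = sinr_and_interference([0.05, 0.08], h, noise=1e-10)
print(gamma, I)
```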

To pose the power control problem as a non-cooperative game, we need to define a utility function suitable for data applications. Most data applications are sensitive to error but tolerant of delay. It is clear that a higher SINR level at the output of the receiver will generally result in a lower bit error rate and hence higher throughput. However, achieving a high SINR level requires the user terminal to transmit at high power, which in turn results in short battery life. Moreover, a user with high power will increase the magnitude of the interference it creates for other receivers. This trade-off can be quantified by defining the utility function as the average amount of data received correctly per unit of energy consumption:

$u_i(p) = \frac{T_i(p)}{p_i}$, (9)


where $T_i(p)$ is the effective throughput (goodput). We assume that transmitters use variable-rate M-QAM, with a bounded probability of symbol error and trellis coding with a nominal coding gain. Thus, the effective throughput $T_i(p)$ can be well approximated [35] by

$T_i(p) = W \log_2\left(1 + \frac{\gamma_i(p)}{\Gamma}\right)$, (10)

which is a function of the global power vector $p$ and the channel conditions. Here, $W$ is the bandwidth of the channel, and $\Gamma \geq 1$ is the gap between uncoded M-QAM and capacity, minus the coding gain.

Considering both the QoS requirement (6) and the energy-efficiency utility (9), we express the utility of user $i$ formally as follows:

$U_i(p_i, p_{-i}) = \begin{cases} \dfrac{W \log_2\left(1 + \gamma_i(p)/\Gamma\right)}{p_i} & \text{if } \gamma_i(p) \geq \gamma_i^*, \\ 0 & \text{otherwise.} \end{cases}$ (11)
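The utility (11) combines (9) and (10) with the QoS cutoff; below is a minimal Python sketch with placeholder parameter values (the bandwidth, gap, and SINR threshold shown are assumptions for illustration).

```python
import numpy as np

def utility(p_i, gamma_i, W=1e6, Gamma=1.0, gamma_req=3.0):
    """Eq. (11): throughput per unit power if the QoS target is met, else zero.

    p_i       : transmit power of user i (W)
    gamma_i   : SINR of user i under the current power profile
    W, Gamma  : bandwidth and SNR gap of Eq. (10)
    gamma_req : required SINR threshold gamma_i^* (placeholder value)
    """
    if gamma_i < gamma_req:
        return 0.0
    return W * np.log2(1.0 + gamma_i / Gamma) / p_i

print(utility(0.05, 5.0))   # QoS met: positive utility (bits per joule)
print(utility(0.05, 1.0))   # QoS violated: zero utility
```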

B. Non-cooperative Power Control Game

In this subsection, we formulate the users' selfish behavior within a non-cooperative game framework. Let $G = [\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$ denote the non-cooperative power control (NPC) game, where $\mathcal{N} = \{1, \ldots, N\}$ is the index set of the users currently active in the ad hoc network, $P_i$ is the strategy set, and $U_i(\cdot)$ is the utility function of user $i$. Each user selects a power level $p_i$ such that $p_i \in P_i$. Let the power vector $p = (p_1, \ldots, p_N) \in P$ denote the outcome of the game in terms of the selected power levels of all the users, where $P$ is the set of all power vectors. The utility function (11) demonstrates the strategic interdependence among users: the level of utility each user gets depends on its own power level and also on the choices of the other players' strategies, through the SINR of that user. We assume that each user's strategy is rational, that is, each user maximizes its own utility in a distributed fashion. Formally, the NPC game $G$ is expressed as

$\max_{p_i \in P_i} U_i(p_i, p_{-i})$, for all $i \in \mathcal{N}$, (12)

where $U_i$ is given in (11) and $P_i = [p_i^{min}, p_i^{max}]$ is the strategy space of user $i$. In this game, $p$ is the strategy profile, and the strategy profile of user $i$'s opponents is defined as $p_{-i} = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_N)$, so that $p = (p_i, p_{-i})$. A similar notation will be used for other quantities.

User $i$'s best response is $BR_i(p_{-i}) = \arg\max_{p_i \in P_i} U_i(p_i, p_{-i})$, i.e., the $p_i$ that maximizes $U_i(p_i, p_{-i})$ given a fixed $p_{-i}$. With the best response concept, we can present the following definition for the Nash Equilibrium (NE) of the NPC game $G$ [16].

Definition 2: A strategy profile $p^*$ is a Nash Equilibrium (NE) of the NPC game $G$ if it is a fixed point of the best response, i.e., $U_i(p_i^*, p_{-i}^*) \geq U_i(p_i, p_{-i}^*)$ for any $p_i \in P_i$ and any user $i$.

The NE concept offers a predictable, stable outcome of a game where multiple agents with conflicting interests compete through self-optimization and reach a point where no player wishes to deviate. However, such a point does not necessarily exist. In general, we have the following sufficient conditions for the existence of an NE in a non-cooperative game [30].

Proposition 2: An NE exists in the NPC game $G = [\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$ if, for all $i = 1, \ldots, N$:
(i) The strategy set $P_i$ is a nonempty, convex, and compact subset of some Euclidean space $\mathbb{R}^n$.
(ii) $U_i(\cdot)$ is continuous in $p$ and quasi-concave in $p_i$.

To be compatible with the SFP framework, we discretize the continuous power profile $p_i \in P_i = [p_i^{min}, p_i^{max}]$ as follows:

$p_i(a_i) = \left(1 - \frac{a_i}{M_i}\right) p_i^{min} + \frac{a_i}{M_i}\, p_i^{max}$, $a_i = 0, \ldots, M_i$. (13)

Then the action of user $i$ is denoted by an integer

$a_i \in A_i = \{0, \ldots, M_i\}$. (14)

The NPC game $G = [\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$ can be converted to the discrete form $G_d = [\mathcal{N}, \{A_i\}, \{U_i\}]$, i.e., each user chooses its strategy to maximize its own utility

$\max_{\sigma_i \in S_i} \sigma_i^T U_i(a_i, \sigma_{-i})$, $\forall i$. (15)

The concept of NE in the discrete NPC game $G_d$ is similar to that of the NPC game $G$.
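A one-line mapping from the integer action $a_i$ of Eq. (14) to the power level of Eq. (13); the power bounds shown are illustrative.

```python
import numpy as np

def action_to_power(a_i, M_i, p_min, p_max):
    """Eq. (13): map action index a_i in {0, ..., M_i} to a power level."""
    return (1.0 - a_i / M_i) * p_min + (a_i / M_i) * p_max

# With M_i = 49 the grid has 50 levels; a_i = 0 gives p_min, a_i = M_i gives p_max.
levels = np.array([action_to_power(a, 49, 0.05, 0.10) for a in range(50)])
print(levels[0], levels[-1], len(levels))
```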

C. Design Objective and Challenging Problems

In this subsection, we present the design objective and the challenging problems in the NPC game. First, we give a precondition of our design. We define the feasible power vector set

$p_{feasible} = \{p \in P : \gamma_i(p) \geq \gamma_i^*, \forall i\}$. (16)

As in [20], we make the following assumption.

Assumption: Given the routing and MAC protocol, the $N$ concurrent users can transmit data successfully and satisfy the QoS constraint (6), i.e., the feasible power vector set $p_{feasible}$ is nonempty at the particular schedule.

In the discrete strategy space $A_i$ with the feasible power vector set $p_{feasible}$, we can observe that the utility function (11) does not have the quasi-concavity property for pure actions $a_i$ or mixed strategies $\sigma_i$, which is different from the utility in the energy-efficiency power control design in [30]. To investigate the existence of NE in the NPC game $G_d$, we therefore appeal to the graceful properties of supermodular games.

Our aim is to design non-cooperative power control in wireless ad hoc networks with self-interested users. We apply repeated games to model the interaction among the users' power decisions. Compared with the related work [13], the fundamental difficulty is that the NE of a repeated game with utility (11) is unknown. Therefore, the simple tit-for-tat (TFT) punishment mechanism in repeated games cannot be directly applied to this problem. We hence introduce a reinforcement learning method to find each user's best decision over its own interest. The problem is challenging in several aspects due to the following constraints. First, each user can observe only its own private information, such as its historical and current decisions and the derived utilities; no user can observe its opponents' private information, since this would result in implementational complexity.


Second, each user's power level decision should be self-incentive; otherwise, the user would not be willing to obey the designated dynamics.

The first constraint shows that SFP cannot solve the power control problem directly, because SFP must trace its opponents' historical decisions. In SFP, each user selects an action that maximizes its immediate expected payoff under the assumption that opponents will play a mixed strategy given by the historical frequencies of past plays. The second constraint shows that the dynamic update rule should be a myopic best response from the user's view of the environment. However, the user's view may be wrong during certain time periods. Therefore, the challenging problems are: how to design a non-cooperative power control with the myopic best response dynamics over private and incomplete information? How to guarantee the convergence and a satisfactory convergence speed?

IV. NON-COOPERATIVE POWER CONTROL WITH UTILITY-BASED SFP APPROACH

In this section, we propose a non-cooperative power control with myopic best response dynamics over private and incomplete information. The main disadvantage of SFP is its requirement to account for the opponents' historical joint behavior distribution. In non-cooperative power control, however, the users only know what payoff they are getting from their status quo action. To make SFP meaningful in a private information environment, it is clear that the users need to evaluate their utilities more directly without using the empirical frequencies. This motivates utility-based SFP. Furthermore, we propose an improved algorithm that exploits the utility structure, which increases the convergence speed of the learning process.

A. Utility-Based SFP

To avoid observing the opponents' private information, we should propose an algorithm that uses only the user's own information in the dynamics. In the stage game of power control, user $i$ selects its action, $a_i(t)$, according to a probability distribution $\sigma_i(t-1)$. User $i$ knows neither the opponents' action distribution $\sigma_{-i}(t-1)$ nor the utility $U_i(a_i(t))$ before running its own action $a_i(t)$. However, user $i$ can compute the attained utility $U_i(a_i)$ from the feedback information of its receiver. It should be noted that the only use of the opponents' joint strategy $\sigma_{-i}$ in SFP is in the assessment of the action values $U_i(a_i, \sigma_{-i})$. Accordingly, user $i$ computes an estimate $\hat{U}_i^{a_i}(t)$ using the following recursion:

$\hat{U}_i^{a_i}(t) = \begin{cases} \hat{U}_i^{a_i}(t-1) + \dfrac{U_i^{a_i}(t) - \hat{U}_i^{a_i}(t-1)}{\sigma_i^{a_i}(t)\, t} & \text{if } a_i(t) = a_i, \\ \hat{U}_i^{a_i}(t-1) & \text{otherwise,} \end{cases}$ (17)

where $U_i^{a_i}(t)$ is the utility obtained by user $i$ at time $t$.

The distinction between action-observation SFP and utility-based SFP is that the users predict their utilities during the stage game based on the actual utilities corresponding to the previous estimated utilities. At any time step, $t-1$, user $i$ is always assumed to know its own proposed actions, $a_i(1), \ldots, a_i(t-1)$, and the probabilities, $\sigma_i(1), \ldots, \sigma_i(t-1)$, with which its own actions were selected. The only additional information required for user $i$ to estimate its average utility at time $t-1$ is the utilities $U_i^{a_i(1)}(1), \ldots, U_i^{a_i(t-1)}(t-1)$. In particular, the users need neither compute the empirical frequencies of the past actions made by any user nor compute their expected utilities based on those empirical frequencies. This significantly alleviates the implementation and computational bottleneck of SFP.

Once user $i$ computes its estimated utility, $\hat{U}_i(t)$, it proposes an action $a_i(t+1)$ for the next stage according to a probability distribution, $\sigma_i(t) \in S_i$, that maximizes its perturbed utility

$\sigma_i^T(t)\, \hat{U}_i(t) + \tau H(\sigma_i)$, (18)

where $H(\cdot)$ is the entropy function given in Eq. (3).

B. Utility-Based SFP Exploiting the Utility Structure

A user does not need to know its own utility structure in the utility-based SFP approach. Therefore, the recursion (17) for the utility estimate only updates the value of the currently selected action. This may result in slow convergence and does not fit the power control application. Actually, a user does know its utility structure (11) in the power control game. By exploiting the utility structure and the measured interference plus noise $\hat{I}_i$, the user can obtain the attainable utility of each action, $U_i^j(t)$, $j = 0, \ldots, M_i$:

$U_i^j(t) = \begin{cases} \dfrac{W \log_2\left(1 + h_{ii}\, p_i^j / (\Gamma \hat{I}_i(t))\right)}{p_i^j} & \text{if } h_{ii}\, p_i^j / \hat{I}_i(t) \geq \gamma_i^*, \\ 0 & \text{otherwise.} \end{cases}$ (19)

Herein,

$\hat{I}_i(t) = \frac{h_{ii}\, p_i^{a_i(t)}(t)}{\gamma_i(t)}$, (20)

where $a_i(t)$ is the selected action of user $i$ at stage $t$. Therefore, the average utility estimate of every action can be updated based on the current observation as follows:

$\bar{U}_i^j(t) = \begin{cases} \bar{U}_i^j(t-1) + \dfrac{U_i^j(t) - \bar{U}_i^j(t-1)}{t_i^j(t-1)} & \text{if } j = l, \\ \bar{U}_i^j(t-1) + \dfrac{\theta\left(U_i^j(t) - \bar{U}_i^j(t-1)\right)}{t + \theta} & \text{otherwise,} \end{cases}$ (21)

where $\theta$ is the filter parameter and $l = a_i(t)$ is the selected action. Since $t_i^j(t)$ will be approximately equal to the number of times user $i$ has selected action $j$ up to time $t$, the utility estimate $\bar{U}_i^j(t)$ can be interpreted as an approximation of user $i$'s average utility for action $j$ over those past stages at which user $i$ selected it. Note that $\frac{\theta}{t+\theta}$ decreases as time evolves and

$\frac{\theta}{t+\theta} \begin{cases} \geq 0.5 & \text{if } t \leq \theta, \\ < 0.5 & \text{otherwise.} \end{cases}$

We can thus interpret $\theta$ as a confidence parameter for the historical learning process. In this manner, we can increase the learning speed and make the scheme practical for power control.

In the following, we update the probability distribution $\sigma_i^j(t)$ by maximizing the perturbed utility (18). To guarantee convergence, we propose a weighted filter algorithm as follows:

$\sigma_i^j(t) = \frac{\theta^2}{t^2 + \theta^2} \cdot \frac{\exp\left((1/\tau)\, \bar{U}_i^j(t)\right)}{\sum_{k=0}^{M_i} \exp\left((1/\tau)\, \bar{U}_i^k(t)\right)} + \frac{t^2}{t^2 + \theta^2}\, \sigma_i^j(t-1)$. (22)
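The sketch below, under our own variable naming and with placeholder parameters, implements one per-user stage of the structure-exploiting update: attainable utilities of all actions from the measured SINR (Eqs. (19)-(20)), the averaged estimates (Eq. (21)), and the weighted-filter probability update (Eq. (22)).

```python
import numpy as np

def usfp_eus_update(Ubar, sigma, counts, a_played, gamma_meas, powers,
                    h_ii, gamma_req, t, theta=100.0, tau=1.0, W=1e6, Gamma=1.0):
    """One per-user stage of utility-based SFP exploiting the utility structure."""
    # Eq. (20): estimated interference-plus-noise from the measured SINR.
    I_hat = h_ii * powers[a_played] / gamma_meas
    # Eq. (19): attainable utility of every action under the current interference.
    gamma_all = h_ii * powers / I_hat
    U = np.where(gamma_all >= gamma_req,
                 W * np.log2(1.0 + gamma_all / Gamma) / powers, 0.0)
    # Eq. (21): averaged utility estimates (fast time scale).
    counts = counts + sigma                      # t_i^j(t), roughly the play counts
    step = np.full_like(Ubar, theta / (t + theta))
    step[a_played] = 1.0 / counts[a_played]
    Ubar = Ubar + step * (U - Ubar)
    # Eq. (22): weighted-filter probability update (slow time scale).
    z = Ubar / tau
    z -= z.max()
    logit = np.exp(z) / np.exp(z).sum()
    w = theta ** 2 / (t ** 2 + theta ** 2)
    sigma = w * logit + (1.0 - w) * sigma
    return Ubar, sigma / sigma.sum(), counts

# Placeholder usage for one user with 5 power levels.
powers = np.linspace(0.05, 0.10, 5)
Ubar, sigma, counts = np.zeros(5), np.full(5, 0.2), np.zeros(5)
Ubar, sigma, counts = usfp_eus_update(Ubar, sigma, counts, a_played=2,
                                      gamma_meas=4.0, powers=powers,
                                      h_ii=1e-8, gamma_req=3.0, t=1)
print(np.round(sigma, 3))
```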


Using these update rules, the USFP_EUS algorithm is given in Algorithm 1.

Remark 2: Given the different time-varying filter parameters in (21) and (22), the algorithm inherently exhibits two-time-scale dynamics, with the average utility estimate dynamics (fast dynamics) and the probability distribution dynamics (slow dynamics).

Remark 3: The USFP_EUS algorithm is based on the SFP framework. At each stage, the user's decision is an MOD. Therefore, the USFP_EUS algorithm has the self-incentive property.

Remark 4: Eq. (19) shows that each link only needs to measure its own SINR, so no message passing among the links is needed. Eq. (21) shows that the filter algorithm only records the latest average utility estimate. Therefore, the USFP_EUS algorithm is a fully decentralized algorithm with a small memory footprint and light computation overhead.

Algorithm 1: USFP_EUS Algorithm

1. For $t = 0$, initialization:
(i) Draw $a_i(0) = \mathrm{rand}(0, M_i)$, and compute $p_i(a_i(0)) = \left(1 - \frac{a_i(0)}{M_i}\right) p_i^{min} + \frac{a_i(0)}{M_i}\, p_i^{max}$.
(ii) Measure the SINR $\gamma_i(0)$ with the feedback information from the intended receiver, then derive the estimated interference and noise $\hat{I}_i(0) = h_{ii}\, p_i^{a_i(0)}(0) / \gamma_i(0)$.
(iii) Compute the attainable utility $U_i^j(0)$ with Eq. (19) for every action $j$ based on the measured $\hat{I}_i(0)$.
(iv) Initialize the average utility $\bar{U}_i^j(0) = U_i^j(0)$ and the probability $\sigma_i^j(0)$ for each action $j = 0, \ldots, M_i$:

$\sigma_i^j(0) = \frac{\exp\left((1/\tau)\, \bar{U}_i^j(0)\right)}{\sum_{k=0}^{M_i} \exp\left((1/\tau)\, \bar{U}_i^k(0)\right)}$.

2. For time $t \geq 1$:
(i) Draw $a_i(t) = l$, $l \in \{0, \ldots, M_i\}$, randomly according to the probabilities $\sigma_i(t-1)$, and compute $p_i^l(t) = \left(1 - \frac{l}{M_i}\right) p_i^{min} + \frac{l}{M_i}\, p_i^{max}$.
(ii) Measure the SINR $\gamma_i(t)$ with the feedback information from the intended receiver, then derive the estimated interference and noise $\hat{I}_i(t)$ with Eq. (20).
(iii) Compute the attainable utility $U_i^j(t)$ with Eq. (19) for every action $j$ based on the measured $\hat{I}_i(t)$.
(iv) Compute the average utility estimate with Eq. (21) for $j = 0, \ldots, M_i$.
(v) Update $\sigma_i^j(t)$ with Eq. (22) for $j = 0, \ldots, M_i$.

V. CONVERGENCE ANALYSIS

In this section, we present a formal analysis of our proposed learning algorithm. First, we show the supermodularity of the non-cooperative power control game. Second, we show that the limit set of the continuous-time dynamics arising from SFP in a supermodular game converges to the set of NE of the non-cooperative game. Third, we show that the dynamics of the USFP_EUS algorithm is almost surely an asymptotic pseudotrajectory of the flow defined by the smooth best response dynamics (SFP dynamics). Finally, with the above results, we derive our main conclusion: the USFP_EUS algorithm almost surely converges to the set of pure NE. The detailed proofs of Theorem 1 and Theorem 3 can be found in [26].

A. NPC Supermodular Games

We first show that the NPC games $G$ and $G_d$ are supermodular games. This is an important property for our design.

Theorem 1: The continuous NPC game with the energy-efficiency utility function (9) on the continuous strategy space is a supermodular game. Moreover, on the set of feasible power vectors $p_{feasible}$, the NPC games with the utility function (11) are supermodular both in the discrete pure strategy space and in the mixed strategy space.

The property of increasing differences shows that supermodular games are games in which each player's strategy set is partially ordered, and the marginal returns to increasing one's strategy rise with increases in the competitors' strategies. Supermodular games are of particular interest since they have Nash Equilibria. The simplicity of supermodular games makes convexity and differentiability assumptions unnecessary [30]. The following proposition summarizes several important properties of supermodular games [20].

Proposition 3: In a supermodular game $G = [\mathcal{N}, \{P_i\}, \{U_i(\cdot)\}]$:
(i) The set of NEs is a nonempty and compact sublattice, and so there is a component-wise smallest and largest NE.
(ii) If the users' best responses are single-valued, and each user uses the myopic best response updates starting from the smallest (largest) element of its strategy space, then the strategies monotonically converge to the smallest (largest) NE.
(iii) If each user starts from any feasible strategy and uses myopic best response updates, the strategies will eventually lie in the set bounded component-wise by the smallest and largest NE. If the NE is unique, the myopic best response updates globally converge to that NE from any initial strategies.

B. SFP Dynamics in Supermodular Games

In this subsection, we present some results on the SFP dynamics in supermodular games based on stochastic approximation theory [6], [7], [21].

1) Continuous-Time Dynamics Arising From SFP: In SFP, the empirical frequency vector $\sigma_i(t) \in S_i$ of player $i$ after the first $t-1$ games is the vector

$\sigma_i(t) = \frac{1}{t} \sum_{j=1}^{t} e_{a_i(j)}$.

The $l$-th component $(\sigma_i(t))_l$ of $\sigma_i(t)$ is the proportion of times in the first $t$ games that player $i$ has played action $l \in A_i$. We can obtain a recursive definition of $\sigma_i(t)$:

$\sigma_i(t+1) = \frac{1}{t+1}\left(t\, \sigma_i(t) + e_{a_i(t+1)}\right)$.

We can then compute the expected increments of $\sigma_i(t)$:

$\mathbb{E}\left(\sigma_i(t+1) - \sigma_i(t) \mid \sigma(t) = \sigma\right) = \frac{1}{t+1}\left[\mathbb{E}\left(e_{a_i(t+1)} \mid \sigma(t) = \sigma\right) - \sigma_i\right] = \frac{1}{t+1}\left(BR_i(\sigma_{-i}(t)) - \sigma_i(t)\right)$.

Thus, we see that after a reparameterization of time, the expected changes in the time average $\sigma(t)$ are governed by the perturbed best response dynamic

$\dot{\sigma}_i = BR_i(\sigma_{-i}) - \sigma_i$. (23)

This dynamics is defined on the space of mixed strategy profiles $\Sigma$.

Our analysis of the SFP process will rely on a close connection between the asymptotic behavior of sample paths of such a stochastic process $\{\sigma(t)\}$ and the following game differential equation:

$\frac{dx}{dt} = F(x)$. (24)

2) Limit Sets and Chain Recurrence: To understand the dynamics (23) of SFP, we introduce the concepts of limit sets and chain recurrence.

Consider a dynamic (24) that generates a semiflow $\Phi : \mathbb{R}_+ \times X \to X$ on a compact set $X \subset \mathbb{R}^n$. The set of fixed points of (24) can be defined as

$\Lambda = \{x \in X : F(x) = 0\} = \{x \in X : \Phi_t x = x \text{ for all } t \geq 0\}$. (25)

The limit set of state $x$ is the set of limit points of the solution trajectory $\Phi_t x$ starting at $x$:

$L(x) = \left\{z \in X : \lim_{t \to \infty} \Phi_t x = z\right\}$.

An invariant set for $F$ is a set $Q \subset \mathbb{R}^n$ such that $\Phi_t(Q) = Q$ for all $t$. For any invariant set $Q$ we denote by $\Phi|_Q$ the restriction of the flow to $Q$. Let $Q$ denote a compact invariant set. A subset $K$ of $Q$ is called an attractor for $\Phi|_Q$ provided that $K$ is nonempty, compact and invariant, and there is a neighborhood $\mathcal{U} \subset Q$ of $K$ with the property that $\lim_{t \to \infty} \mathrm{dist}(\Phi_t x, K) = 0$ uniformly for $x \in \mathcal{U}$. Here $\mathrm{dist}(\Phi_t x, K)$ means the distance from $\Phi_t x$ to the nearest point of $K$. We call $Q$ attractor-free if $Q$ is a nonempty compact invariant set that contains no proper attractor, which means that the attractor of $Q$ is either an asymptotically stable limit cycle or equilibrium, or the whole space $Q$. With the above notions, we give the limit set theorem from Theorem 3.3 of [7].

Proposition 4: Consider a dynamic (24). With probability one, the limit set $L(x)$, $x \in X$, has the following properties:
(i) $L(x)$, $x \in X$, is an invariant set for the flow of the game vector field $F$.
(ii) $L(x)$, $x \in X$, is compact, connected and attractor-free.

Another important concept is chain recurrence. Call a sequence $\{x = x_0, x_1, \ldots, x_k = y\}$ an $\varepsilon$-chain from $x$ to $y$ if for each $i \in \{1, \ldots, k\}$ there is a $t_i > 1$ such that $|\Phi_{t_i} x_{i-1} - x_i| < \varepsilon$. The $\varepsilon$-chain specifies $k+1$ segments of solution trajectories of (24). We call the state $x$ chain recurrent if there is an $\varepsilon$-chain from $x$ to itself for all $\varepsilon > 0$. We let $\mathcal{R}$ denote the set of chain recurrent points of (24). If every point $x \in X$ is chain recurrent, then $\Phi$ is a chain recurrent semiflow. If there is an $\varepsilon$-chain for all $x, y \in X$, we say that the flow $\Phi$ is chain transitive. Let $\Omega \subset X$ be a nonempty invariant set. $\Phi$ is called chain recurrent on $\Omega$ if every point $q \in \Omega$ is a chain recurrent point for $\Phi|_\Omega$. A compact invariant set on which $\Phi$ is chain recurrent (or chain transitive) is called an internally chain recurrent (or internally chain transitive) set. Proposition 5.3 in [6] makes precise the relation between the different notions we have introduced.

Proposition 5: Let $\Omega \subset X$. The following claims are equivalent.
(i) $\Omega$ is internally chain-transitive.
(ii) $\Omega$ is connected and internally chain-recurrent.
(iii) $\Omega$ is a compact invariant set and $\Phi|_\Omega$ admits no proper attractor.

3) Convergence of SFP in Supermodular Games: The following theorem states that the SFP dynamics (24) converges almost surely to a bounded set of fixed points.

Theorem 2: Consider SFP $\sigma_t$ starting from arbitrary initial conditions. Suppose that $G$ is an $N$-player supermodular game. Then

$\Pr\left(L\{\sigma_t\} \subset \Lambda \text{ or } L\{\sigma_t\} \subset \Gamma_i \subset [\underline{\lambda}, \overline{\lambda}] \text{ for some } i\right) = 1$,

where $\Gamma_i$ is a Lipschitz submanifold, every persistent non-convergent trajectory is asymptotic to a trajectory in some $\Gamma_i$, and $[\underline{\lambda}, \overline{\lambda}]$ is the interval spanned by the fixed point set $\Lambda$. In particular, if the set of fixed points is a singleton $\{\lambda^*\}$, then $\Pr\left(\lim_{t \to \infty} \sigma_t = \lambda^*\right) = 1$.

The proof of Theorem 2 is based on Proposition 4 and Proposition 5 in this paper and Corollary 5.5 in [21]. The detailed proof procedure is omitted here.

C. Convergence of the USFP_EUS Algorithm

First, we present a result on the two-time-scale dynamics property of the USFP_EUS algorithm.

Theorem 3: For the USFP_EUS algorithm, writing for each player $i$

$\bar{U}_i^j(t) = \bar{U}_i^j(t-1) + \rho_i^j(t)\left(U_i^j(t) - \bar{U}_i^j(t-1)\right)$,

$\sigma_i(t) = \frac{\theta^2}{t^2 + \theta^2}\, BR_i\left(\bar{U}_i(t)\right) + \frac{t^2}{t^2 + \theta^2}\, \sigma_i(t-1)$,

where $\rho_i(t) = \left(\rho_i^0(t), \ldots, \rho_i^{M_i}(t)\right)$ and

$\rho_i^j(t) = \begin{cases} \dfrac{1}{t_i^j(t)} & \text{if } j = a_i(t), \\ \dfrac{\theta}{t + \theta} & \text{otherwise,} \end{cases}$

we have

$\bar{U}_i^{a_i}(t) - U_i(a_i, \sigma_{-i}(t)) \to 0 \text{ as } t \to \infty \text{ a.s.}$,

and a suitable interpolation of the $\sigma_i(t)$ processes will almost surely be an asymptotic pseudotrajectory of the flow defined by the smooth best response dynamics

$\dot{\sigma}_i = BR_i(\sigma_{-i}) - \sigma_i$.

The proof of Theorem 3 is based on Borkar's result on the asymptotic behavior of two-time-scale coupled stochastic dynamic systems [8] and on asymptotic pseudotrajectories relating the SFP dynamics to deterministic dynamic systems [6].

With Theorem 2, Theorem 3, Theorem 1 in [11], and the monotonicity of the utility function (11) in a user's own action, we can state our main result in Theorem 4.

Theorem 4: The dynamics arising from the USFP_EUS algorithm almost surely converges to the set of pure NE.


[Fig. 1. The relation between the convergence speed and network scale: convergence speed (number of iterations) versus number of users.]

VI. NUMERICAL EXAMPLES

We simulate a network contained in a 300 m x 300 m square area. There are 100 nodes placed randomly in the square area. Two nodes can communicate directly if their distance is no more than 50 m. We choose $N$ links randomly from the square area.

Each user's utility is given by Eq. (11). The bandwidth is $W = 10^6$ Hz, the AWGN noise is $\eta_i = 1.0 \times 10^{-10}$, and $\Gamma = 1$. The transmitter powers satisfy 50 mW $\leq p_i \leq$ 100 mW. The power gains are given by $h_{ij} = K S_{ij} (d_0 / d_{ij})^{\alpha}$, where $d_{ij}$ is the distance between the nodes, $K$ and $d_0$ are normalization constants set to $K = 10^{-6}$ and $d_0 = 10$ m, respectively, the path loss exponent is $\alpha = 4$, and the shadowing factors $S_{ij}$ are random, independent and identically generated from a lognormal distribution with a mean of 0 dB and variance $\sigma = 8$ dB (so $S_{ij} = 10^{N_{ij}/10}$ and $N_{ij}$ is Gaussian with expectation $\mathbb{E}[N_{ij}] = 0$ and standard deviation $\sigma_{N_{ij}} = 8$). In the USFP_EUS algorithm, $\theta = 100$ and $M_i = 49$, $i = 1, \ldots, N$. To avoid overflow of the exponential operation, we set

$\tau_i(t) = \max\left\{ \frac{\min_j\left(\bar{U}_i^j(t) > 0\right)}{200},\ \frac{\max_j\left(\bar{U}_i^j(t)\right)}{650} \right\}$,

where $\min_j(\bar{U}_i^j(t) > 0)$ is the minimum utility over all actions with positive utilities at time $t$.
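A sketch of the channel setup described above (random node placement, lognormal shadowing, and the gain model $h_{ij} = K S_{ij} (d_0/d_{ij})^4$); the sign of the exponent in $K$ and the receiver placement rule are our reading of the text, so treat these constants as assumptions.

```python
import numpy as np

def make_network(n_links, rng, area=300.0, max_link_dist=50.0,
                 K=1e-6, d0=10.0, alpha=4.0, shadow_db=8.0):
    """Draw transmitter/receiver pairs and build the gain matrix h[j, i]."""
    tx = rng.uniform(0.0, area, size=(n_links, 2))
    # Place each receiver within communication range of its transmitter.
    angle = rng.uniform(0.0, 2.0 * np.pi, n_links)
    radius = rng.uniform(5.0, max_link_dist, n_links)
    rx = tx + radius[:, None] * np.stack([np.cos(angle), np.sin(angle)], axis=1)
    # Distance from transmitter T_j to receiver R_i.
    d = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=2)
    d = np.maximum(d, 1.0)
    # Lognormal shadowing: S_ij = 10^(N_ij/10), N_ij ~ Normal(0 dB, 8 dB).
    S = 10.0 ** (rng.normal(0.0, shadow_db, size=d.shape) / 10.0)
    h = K * S * (d0 / d) ** alpha
    return h

rng = np.random.default_rng(1)
h = make_network(5, rng)
print(h.shape, h.diagonal())
```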

A. Convergence Speed

The major concern for the USFP_EUS algorithm is the convergence speed of the repeated dynamics. In this simulation, we study the relation between the convergence speed and the network scale. We run 20 trials for each $N$-user scenario, $N = 5, \ldots, 10$. In each simulation, the USFP_EUS algorithm is executed with the derived power gain matrix $H = [h_{ij}]$. The relation between the convergence speed and the network scale is shown in Fig. 1. Each column represents the results for a group of simulations with the same number of users, and each data point is the convergence speed of a single simulation. Fig. 1 shows that the number of iterations is no more than 350 in all the simulations and the average convergence speed is about 265 iterations. The figure shows that the convergence speed of the USFP_EUS algorithm is not affected by the network scale.

B. Performance Loss

Another important concern for a distributed algorithm with selfish utilities is the performance loss compared with the system optimum

$U^* = \max_{p^{min} \leq p \leq p^{max}} \sum_{i=1}^{N} U_i(p_i, p_{-i})$.

We compute the utility performance ratio in each simulation as $\eta = \sum_{i=1}^{N} U_i(j^*) / U^*$, where $U_i(j^*)$ is user $i$'s utility at its steady-state action $j^*$. The utility performance ratios at different network scales are shown in Fig. 2. This figure illustrates that the average performance losses range from 13% to 20% at the different network scales. The worst case is no more than 30%.

[Fig. 2. Utility performance ratio (%) versus number of users at different network scales.]

Finally, we depict the power dynamics in Fig. 3 and the utility dynamics in Fig. 4 using one of the simulation results with N = 10. They validate the convergence of the USFP_EUS algorithm.

[Fig. 3. Power dynamics with N = 10: power (mW) of users 1-10 versus the number of iterations.]

[Fig. 4. Utility dynamics with N = 10: average utility (Mbps/mW) of users 1-10 versus the number of iterations.]

    VII. RELATED WORK

    A. Power Control in Wireless NetworksThe power control in CDMA/TDMA cellular networks

    has been extensively studied in the literature. Most of thework is on designing a power control algorithm to minimizetransmitting power subject to guarantee the SINR of ongoingconnections with the assumption of cooperation among theusers [17], [37]. There is conflicting interest in power control.Hence, it is appropriate to address power control of cellularnetworks within a non-cooperative game-theoretic framework[3], [23], [25], [30], [38]. To give incentive for a certain degreeof cooperation among the users, they, in general, introduced aprice-based mechanism to constrain the totally selfish utility.Obviously, the price mechanism has a significant effect onthe users behavior, NE and system performance. In cellularnetwork systems, the price can be managed by base station[30].

Compared with cellular networks, little work has been done on the selfish behavior of power control in wireless



ad hoc networks. In [20], Huang et al. proposed a price-based power control framework for wireless ad hoc networks. Assuming that the users voluntarily cooperate by exchanging interference information, users announce prices that reflect their sensitivities to the current interference levels and then adjust their power to maximize their surplus. They introduced fictitious non-cooperative games² as the convergence proof technique, whereas the actual users in the network are assumed to be cooperative, i.e., the price dynamics are forced to obey the system's optimality conditions. They pointed out that power control for non-cooperative users in wireless ad hoc networks is still an open problem. Our work is motivated by this problem; however, we do not follow the pricing mechanism. We propose a new reinforcement learning algorithm for non-cooperative power control in which no pricing mechanism is required. In [22], Ileri et al. studied the network-geometric dependence of incentivized cooperation in wireless ad hoc networks with an energy-efficient utility function. They designed a pricing-based joint user-and-network-centric incentive mechanism that induces forwarding among selfish users by compensating the real and opportunity costs of the forwarders. However, that work inherently requires a central node (access node) to compute the price, as in cellular systems.

In wireless ad hoc networks, the choice of the power level fundamentally affects the performance of multiple protocol layers. Recently, there has been much work on formulating the power control problem within a cross-layer design; the interested reader is referred to [10] and the references cited therein. However, most of this work assumes that the users are cooperative, so the cross-layer design problem can be converted into a system-optimal design problem. Our future work is to study the effect selfish behavior has on cross-layer design, and to address non-cooperative power control with cross-layer design using the reinforcement learning algorithm proposed in this paper.

B. Non-Cooperative Behavior in Wireless Networks

The problem of non-cooperative nodes/links in wireless networks has been widely addressed at the network layer [5], [12], [13], [32], [39], the MAC layer [9], [27], and the application layer [28], [36], whereas little work has been done on power control.

²It should be noted that the concept of non-cooperative fictitious play in [20] is different from that of fictitious play in classical learning theory, which is the notion used in this paper.

Emerging research applies game theory to analyze the non-cooperative behaviors at different layers and shows much promise in helping to understand the complex interactions between nodes/links in this highly dynamic and distributed environment [33].

A one-shot game among self-interested users may result in extraordinarily low utility for each user [9], [13]. However, much of the users' selfish interaction at different layers repeats many times, which can be modeled with repeated games. The application of repeated games to model the interaction among selfish users is still at a nascent stage [1], [13], [24]. In [24] and [1], the authors studied price-based incentive mechanisms with dynamic repeated games for Internet routing and for packet forwarding in autonomous mobile ad hoc networks, respectively. Félegyházi et al. [13] studied the Nash equilibria of packet-forwarding strategies in wireless ad hoc networks with the tit-for-tat (TFT) punishment strategy in repeated games. They proved that cooperation based solely on the self-interest of the nodes can exist in theory; however, the conditions for such cooperation are virtually never satisfied in practice.

In this paper, we view multiple users competing for the wireless resource as a non-cooperative game, i.e., each transceiver maximizes its energy-efficiency performance under its required QoS constraint, regardless of what all the other users do. Compared to the related work, our work has the following unique characteristics. First, it does not follow the general pricing-mechanism design framework. We present a reinforcement learning algorithm with self-incentive dynamics, where the update rule is based only on the user's own information. Second, our work is a first step in exploring the combination of SFP learning in repeated games and the strategic complementarity properties of self-interested environments for designing non-cooperative power control. We show that the dynamics arising from the learning rule eventually converge to a steady state with satisfactory performance.

VIII. CONCLUSION

In this paper, we have developed a non-cooperative power control algorithm based on repeated games, which captures the repeated nature of the users' interaction. Without complex pricing and security mechanisms, our proposed USFP_EUS algorithm can be executed safely in a self-interested environment, which is vital for many practical applications. We provide the important insight that a suitably designed learning behavior with self-incentive dynamics can eventually converge to a steady state with satisfactory system performance. This result may provide an alternative tool for designing simple protocols for self-interested environments.

    REFERENCES

[1] M. Afergan, Using repeated games to design incentive-based routing systems, in Proc. IEEE INFOCOM, Barcelona, Spain, Apr. 2006.
[2] T. Alpcan, T. Basar, R. Srikant, and E. Altman, CDMA uplink power control as a noncooperative game, in Proc. IEEE Conference on Decision and Control, Orlando, FL, Dec. 2001.
[3] T. Alpcan, T. Basar, and S. Dey, A power control game based on outage probabilities for multicell wireless data networks, IEEE Trans. Wireless Commun., vol. 5, no. 4, pp. 890-899, Apr. 2006.
[4] E. Altman and Z. Altman, S-modular games and power control in wireless networks, IEEE Trans. Autom. Control, vol. 48, no. 5, pp. 839-842, May 2003.


[5] L. Anderegg and S. Eidenbenz, Ad hoc-VCG: a truthful and cost-efficient routing protocol for mobile ad hoc networks with selfish agents, in Proc. 9th ACM International Conference on Mobile Computing and Networking (MobiCom), 2003, pp. 245-259.
[6] M. Benaïm, Dynamics of stochastic approximation algorithms, in Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics 1709, J. Azéma et al., Eds. Berlin: Springer, 1999, pp. 1-68.
[7] M. Benaïm and M. W. Hirsch, Mixed equilibria and dynamical systems arising from fictitious play in perturbed games, Games Econ. Behav., vol. 29, pp. 36-72, 1999.
[8] V. S. Borkar, Stochastic approximation with two time scales, Syst. & Contr. Lett., vol. 29, pp. 291-294, 1997.
[9] M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, On selfish behavior in CSMA/CA networks, in Proc. IEEE INFOCOM, Miami, FL, USA, Mar. 2005, vol. 4, pp. 2513-2524.
[10] M. Chiang, Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control, IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 104-116, Jan. 2005.
[11] F. Echenique and A. Edlin, Mixed equilibria are unstable in games of strategic complements, Journal of Economic Theory, vol. 118, pp. 61-79, 2004.
[12] S. Eidenbenz, V. S. Anil Kumar, and S. Zust, Equilibria in topology control games for ad hoc networks, Mobile Networks and Applications, vol. 11, pp. 143-159, 2006.
[13] M. Félegyházi, J.-P. Hubaux, and L. Buttyán, Nash equilibria of packet forwarding strategies in wireless ad hoc networks, IEEE Trans. Mobile Computing, vol. 5, no. 4, pp. 463-476, Apr. 2006.
[14] D. Fudenberg and D. Kreps, Learning mixed equilibria, Games Econ. Behav., vol. 5, pp. 320-367, 1993.
[15] D. Fudenberg and D. K. Levine, The Theory of Learning in Games. Cambridge, MA: MIT Press, 1998.
[16] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.
[17] G. J. Foschini and Z. Miljanic, A simple distributed autonomous power control algorithm and its convergence, IEEE Trans. Veh. Technol., vol. 42, pp. 641-646, Nov. 1993.
[18] Z. Han and K. J. Ray Liu, Non-cooperative power control game and throughput game over wireless networks, IEEE Trans. Commun., vol. 53, no. 10, pp. 1625-1629, Oct. 2005.
[19] T. Heikkinen, A potential game approach to distributed power control and scheduling, Comput. Networks, vol. 50, no. 13, pp. 2295-2311, Sep. 2006.
[20] J. Huang, R. Berry, and M. L. Honig, Distributed interference compensation for wireless networks, IEEE J. Sel. Areas Commun., vol. 24, no. 5, pp. 1074-1084, May 2006.
[21] J. Hofbauer and W. H. Sandholm, On the global convergence of stochastic fictitious play, Econometrica, vol. 70, no. 6, pp. 2265-2294, Nov. 2002.

[22] O. Ileri, S.-C. Mau, and N. B. Mandayam, Pricing for enabling forwarding in self-configuring ad hoc networks, IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 151-162, Jan. 2005.
[23] R. A. Iltis, S.-J. Kim, and D. A. Hoang, Noncooperative iterative MMSE beamforming algorithms for ad hoc networks, IEEE Trans. Commun., vol. 54, no. 4, pp. 748-759, 2006.
[24] Z. Ji, W. Yu, and K. J. R. Liu, An optimal dynamic pricing framework for autonomous mobile ad hoc networks, in Proc. IEEE INFOCOM, Barcelona, Spain, Apr. 2006.
[25] S. Koskie and Z. Gajic, A Nash game algorithm for SIR-based power control in 3G wireless CDMA networks, IEEE/ACM Trans. Netw., vol. 13, no. 5, pp. 1017-1026, Oct. 2005.
[26] C. Long, Q. Zhang, B. Li, H. Yang, and X. Guan, Non-cooperative power control for wireless ad hoc networks with repeated games, Technical Report, HKUST, Sep. 2006.
[27] A. B. MacKenzie and S. B. Wicker, Stability of multipacket slotted Aloha with selfish users and perfect information, in Proc. IEEE INFOCOM, San Francisco, CA, Mar.-Apr. 2003, vol. 3, pp. 1583-1590.
[28] P. Marbach and Y. Qiu, Cooperation in wireless ad hoc networks: A market-based approach, IEEE/ACM Trans. Netw., vol. 13, no. 6, pp. 1325-1338, Dec. 2005.
[29] P. Milgrom and J. Roberts, Rationalizability and learning in games with strategic complementarities, Econometrica, vol. 58, pp. 1255-1278, 1990.
[30] C. U. Saraydar, N. B. Mandayam, and D. J. Goodman, Efficient power control via pricing in wireless data networks, IEEE Trans. Commun., vol. 50, no. 2, pp. 291-303, Feb. 2002.
[31] J. Shneidman and D. C. Parkes, Overcoming rational manipulation in mechanism implementations, 2004 [Online]. Available: http://www.gsb.stanford.edu/facseminars/events/economics/pdfs/ccac.pdf

[32] V. Srinivasan, P. Nuggehalli, C. F. Chiasserini, and R. R. Rao, Cooperation in wireless ad hoc networks, in Proc. IEEE INFOCOM, San Francisco, CA, Mar.-Apr. 2003, vol. 2, pp. 808-817.
[33] V. Srivastava, J. Neel, A. B. Mackenzie, R. Menon, L. A. Dasilva, J. E. Hicks, J. H. Reed, and R. P. Gilles, Using game theory to analyze wireless ad hoc networks, IEEE Communications Surveys & Tutorials, vol. 7, no. 4, pp. 46-56, Fourth Quarter 2005.
[34] D. M. Topkis, Equilibrium points in nonzero-sum n-person submodular games, SIAM J. Control Optim., vol. 17, pp. 773-778, 1979.
[35] S. Toumpis and A. J. Goldsmith, Capacity regions for wireless ad hoc networks, IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 736-748, Jul. 2003.
[36] W. Wang and B. Li, Market-driven bandwidth allocation in selfish overlay networks, in Proc. IEEE INFOCOM, Miami, FL, USA, Mar. 2005, vol. 4, pp. 2578-2589.
[37] C. Wu and D. P. Bertsekas, Distributed power control algorithms for wireless networks, IEEE Trans. Veh. Technol., vol. 50, no. 2, pp. 504-514, Mar. 2001.
[38] M. Xiao, N. B. Shroff, and E. K. P. Chong, A utility-based power control scheme in wireless cellular systems, IEEE/ACM Trans. Netw., vol. 11, pp. 210-221, Mar. 2003.
[39] S. Zhong, L. Li, Y. G. Liu, and Y. R. Yang, On designing incentive-compatible routing and forwarding protocols in wireless ad-hoc networks: an integrated approach using game theoretical and cryptographic techniques, in Proc. MobiCom, Cologne, Germany, Aug.-Sep. 2005, pp. 117-131.
[40] Y. Wu, P. Chou, Q. Zhang, K. Jian, W. Zhu, and S. Y. Kung, Network planning in wireless ad hoc networks: A cross-layer approach, IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 136-150, Jan. 2005.

Chengnian Long (M'07) received the B.S., M.S., and Ph.D. degrees from Yanshan University, China, in 1999, 2001, and 2004, respectively, all in control theory and engineering. He is an associate professor at Yanshan University. He visited the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, in 2006. He is currently a Killam postdoctoral fellow in the Department of Electrical and Computer Engineering, University of Alberta. His current research interests are in the areas of non-cooperative behavior and incentive mechanism design in wireless multi-hop networks, energy-efficient protocol design in wireless sensor networks, and pricing mechanisms in the Internet.


Qian Zhang (M'00-SM'04) received the B.S., M.S., and Ph.D. degrees from Wuhan University, China, in 1994, 1996, and 1999, respectively, all in computer science. She joined the Hong Kong University of Science and Technology in September 2005 as an associate professor. Before that, she was at Microsoft Research Asia, Beijing, China, from July 1999, where she was the research manager of the Wireless and Networking Group. She has published more than 150 refereed papers in international leading journals and key conferences in the areas of wireless/Internet multimedia networking, wireless communications and networking, and overlay networking. She is the inventor of about 30 pending patents. Her current research interests are in the areas of wireless communications, IP networking, multimedia, P2P overlay, and wireless security. She has also participated in many activities in the IETF ROHC (Robust Header Compression) WG group for TCP/IP header compression.

Dr. Zhang is an associate editor for the IEEE Transactions on Wireless Communications, IEEE Transactions on Vehicular Technology, IEEE Transactions on Multimedia, Computer Networks, and Computer Communications. She has also served as a guest editor for IEEE Wireless Communications, IEEE Journal on Selected Areas in Communications, ACM/Springer Journal of Mobile Networks and Applications (MONET), and Computer Networks. Dr. Zhang received the TR 100 (MIT Technology Review) world's top young innovator award in 2004, the Best Asia Pacific (AP) Young Researcher Award elected by the IEEE Communications Society in 2004, the Best Paper Award from the Multimedia Technical Committee (MMTC) of the IEEE Communications Society, and the Best Paper Award in QShine 2006. She received the Overseas Young Investigator Award from the National Natural Science Foundation of China (NSFC) in 2006. Dr. Zhang is Vice-Chair and also the Award Committee Chair of the Multimedia Communication Technical Committee of the IEEE Communications Society. She is also a member of the Visual Signal Processing and Communication Technical Committee and the Multimedia System and Application Technical Committee of the IEEE Circuits and Systems Society.

Bo Li (S'89-M'92-SM'99) received his B.Eng. (summa cum laude) and M.Eng. degrees in computer science from Tsinghua University, Beijing, in 1987 and 1989, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Massachusetts at Amherst in 1993. Between 1993 and 1996, he worked on high-performance routers and ATM switches in the IBM Networking System Division, Research Triangle Park, North Carolina. Since 1996, he has been with the Department of Computer Science, Hong Kong University of Science and Technology. Since 1999, he has also held an adjunct researcher position at Microsoft Research Asia (MSRA), Beijing, China. His current research interests are adaptive video multicast, peer-to-peer streaming, resource management in mobile wireless systems, cross-layer design in multihop wireless networks, and content distribution and replication. He has published 90 journal papers and holds several patents. He received the Outstanding Young Investigator Award from the National Natural Science Foundation of China (NSFC) in 2004. He is currently a Distinguished Lecturer of the IEEE Communications Society.

He has been on the editorial boards of IEEE Transactions on Wireless Communications, IEEE Transactions on Mobile Computing, IEEE Transactions on Vehicular Technology, ACM/Kluwer Journal of Wireless Networks (WINET), IEEE Journal on Selected Areas in Communications (JSAC) - Wireless Communications Series, ACM Mobile Computing and Communications Review (MC2R), Elsevier Ad Hoc Networks, SPIE/Kluwer Optical Networking Magazine (ONM), and KICS/IEEE Journal of Communications and Networks (JCN). He served as a guest editor for the IEEE Communications Magazine Special Issue on Active, Programmable, and Mobile Code Networking (April 2000), ACM Performance Evaluation Review Special Issue on Mobile Computing (December 2000), SPIE/Kluwer Optical Networks Magazine Special Issue on Wavelength Routed Networks: Architecture, Protocols and Experiments (January/February 2002), IEEE Journal on Selected Areas in Communications Special Issue on Protocols for Next Generation Optical WDM Networks (October 2000), Special Issue on Recent Advances in Service-Overlay Networks (January 2004), Special Issue on Quality of Service Delivery in Variable Topology Networks (September 2004), ACM/Kluwer Mobile Networks and Applications (MONET) Special Issue on Energy Constraints and Lifetime Performance in Wireless Sensor Networks (2nd Quarter of 2005), and ACM/Springer Mobile Networks and Applications (MONET) Special Issue on Advances in Wireless Mesh Networks (to appear in 2007). In addition, he has been involved in organizing over 40 conferences, especially IEEE INFOCOM since 1996. He was the Co-TPC Chair for IEEE INFOCOM 2004.

Huilong Yang received the B.S. and M.S. degrees in control theory and engineering from Yanshan University, China, in 2005 and 2007, respectively. His research interests are Internet congestion control for high-speed networks and resource allocation in wireless networks.

Xinping Guan (M'02-SM'04) received the B.S. degree in mathematics from Harbin Normal University, Harbin, China, and the M.S. degree in applied mathematics and the Ph.D. degree in electrical engineering, both from the Harbin Institute of Technology, in 1986, 1991, and 1999, respectively. He is currently a Professor and Dean of the Institute of Electrical Engineering, Yanshan University, Qinhuangdao, China. He is the (co)author of more than 100 papers in mathematical and technical journals and conferences. As a (co-)investigator, he has completed more than 19 projects supported by the National Natural Science Foundation of China (NSFC), the National Education Committee Foundation of China, and other important foundations. His current research interests include wireless sensor networks, congestion control of networks, robust and intelligent control for nonlinear systems, and chaos control and synchronization.

Dr. Guan serves as an Associate Editor of the IEEE Transactions on Systems, Man and Cybernetics (Part C), a Reviewer for Mathematical Reviews of America, a Member of the Council of the Chinese Artificial Intelligence Committee, and Vice-Chairman of the Automation Society of Hebei Province, China. Dr. Guan was appointed a "Cheung Kong Scholars Programme" special appointment professor by the Ministry of Education in 2005 and received the National Science Fund for Distinguished Young Scholars of China in 2005.
