stochastic game in energy harvesting wireless communication

14
1 Downlink Power Control in Two-Tier Cellular Networks with Energy-Harvesting Small Cells as Stochastic Games Tran Kien Thuc, Ekram Hossain, and Hina Tabassum Abstract—Energy harvesting in cellular networks is an emerg- ing technique to enhance the sustainability of power-constrained wireless devices. This paper considers the co-channel deploy- ment of a macrocell overlaid with small cells. The small cell base stations (SBSs) harvest energy from environmental sources whereas the macrocell (MBS) uses conventional power supply. Given a stochastic energy arrival process for the SBSs, we derive a power control policy for the downlink transmission of both MBS and SBSs such that they can achieve their objectives (e.g., maintain the signal-to-interference-plus-noise ratio (SINR) at an acceptable level) on a given transmission channel. We consider a centralized energy harvesting mechanism for SBSs, i.e., there is a central energy queue (CEQ) where energy is harvested and then distributed to the SBSs. When the number of SBSs is small, the game between the CEQ and the MBS is modeled as a single-controller stochastic game and the equilibrium policies are obtained as a solution of a quadratic programming problem. However, when the number of SBSs tends to infinity (i.e., a highly dense network), the centralized scheme becomes infeasible, and therefore, we use a mean field stochastic game to obtain a distributed power control policy for each SBS. By solving a system of partial differential equations, we derive the power control policy of SBSs given the knowledge of mean field distribution and the available harvested energy levels in the battery of SBSs. Index Terms—Small cell networks, power control, energy harvesting, stochastic game, mean field game. I. I NTRODUCTION Energy harvesting from environment resources (e.g., through solar panels, wind power, or geo-thermal power) is a potential technique to reduce the energy cost of operating the base stations (BSs) in emerging multi-tier cellular net- works. While this solution may not be practically feasible for macrocell base stations (MBSs) due to their high power con- sumption and stochastic nature of energy harvesting sources, it is appealing for small cell BSs (SBSs) that typically consume less power [2]. Designing efficient power control policies with different objectives (e.g., maximizing system throughput) is among one of the major challenges in energy-harvesting networks. In [3], the authors proposed an offline power control policy for two- hop transmission systems assuming energy arrival information at the nodes. The optimal transmission policy was given by the directional water filling method. In [4], the authors generalized this idea to the case where many sources supply energy to the The authors are with the Department of Electrical and Computer Engineerng at the University of Manitoba, Canada (emails: [email protected], {Ekram.Hossain,Hina.Tabassum}@umanitoba.ca). destinations using a single relay. A water filling algorithm was proposed to minimize the probability of outage. Although the offline power control policies provide an upper bound and heuristic for online algorithms, the knowledge of energy/data arrivals is required which may not be feasible in practice. In [5], the authors proposed a two-state Markov Decision Process (MDP) model for a single energy-harvesting device considering random rate of energy arrival and different priority levels for the data packets. The authors proposed a low-cost balance policy to maximize the system throughput by adapting the energy harvesting state, such that, on average, the harvested and consumed energy remain balanced. Recently, in [6], the outage performance analysis was conducted for a multi-tier cellular network in which all BSs are powered by the harvested energy. A detailed survey on energy harvesting systems can be found in [7] where the authors summarized the current research trends and potential challenges. Compared to the existing literature on energy-harvesting systems, this paper considers the power control problem for downlink transmission in two-tier macrocell-small cell networks considering stochastic nature of the energy arrival process at the SBSs. Note that the power-control policies and their resulting interference levels directly affect the overall system performance. The design of efficient power control policies is thus of paramount importance. In this context, we formulate a discounted stochastic game model in which all SBSs form a coalition to compete with the MBS in order to achieve the target signal-to-interference-plus-noise ratio (SINR) of their users through transmit power control. Using the stochastic game model, we capture the randomness of the system and the Nash equilibrium power control policy is obtained as the solution of a quadratic programming problem. For the case when the number the SBSs is very large, the stochastic game is approximated by a mean field game (MFG). In general, mean field games are designed to study the strategic decision making in very large populations of small interacting individuals. Recently, in [8], the authors modeled a mean field game to determine optimal power control policy for a finite battery powered small-cell network. In this paper, by solving a set of forward and backward partial differential equations, we derive a distributed power control policy for each SBS using MFG model. The contributions of the paper can be summarized as follows. 1) For a macrocell-small cell network, we consider a cen- tralized energy harvesting mechanism for the SBSs in

Upload: tkthuc

Post on 28-Sep-2015

213 views

Category:

Documents


0 download

DESCRIPTION

Stochastic Game, Energy Harvesting, Signal-To-Noise Ratio

TRANSCRIPT

  • 1Downlink Power Control in Two-Tier CellularNetworks with Energy-Harvesting Small Cells as

    Stochastic GamesTran Kien Thuc, Ekram Hossain, and Hina Tabassum

    AbstractEnergy harvesting in cellular networks is an emerg-ing technique to enhance the sustainability of power-constrainedwireless devices. This paper considers the co-channel deploy-ment of a macrocell overlaid with small cells. The small cellbase stations (SBSs) harvest energy from environmental sourceswhereas the macrocell (MBS) uses conventional power supply.Given a stochastic energy arrival process for the SBSs, we derivea power control policy for the downlink transmission of bothMBS and SBSs such that they can achieve their objectives (e.g.,maintain the signal-to-interference-plus-noise ratio (SINR) at anacceptable level) on a given transmission channel. We consider acentralized energy harvesting mechanism for SBSs, i.e., thereis a central energy queue (CEQ) where energy is harvestedand then distributed to the SBSs. When the number of SBSsis small, the game between the CEQ and the MBS is modeled asa single-controller stochastic game and the equilibrium policiesare obtained as a solution of a quadratic programming problem.However, when the number of SBSs tends to infinity (i.e., ahighly dense network), the centralized scheme becomes infeasible,and therefore, we use a mean field stochastic game to obtaina distributed power control policy for each SBS. By solving asystem of partial differential equations, we derive the powercontrol policy of SBSs given the knowledge of mean fielddistribution and the available harvested energy levels in thebattery of SBSs.

    Index TermsSmall cell networks, power control, energyharvesting, stochastic game, mean field game.

    I. INTRODUCTION

    Energy harvesting from environment resources (e.g.,through solar panels, wind power, or geo-thermal power) isa potential technique to reduce the energy cost of operatingthe base stations (BSs) in emerging multi-tier cellular net-works. While this solution may not be practically feasible formacrocell base stations (MBSs) due to their high power con-sumption and stochastic nature of energy harvesting sources, itis appealing for small cell BSs (SBSs) that typically consumeless power [2].

    Designing efficient power control policies with differentobjectives (e.g., maximizing system throughput) is among oneof the major challenges in energy-harvesting networks. In [3],the authors proposed an offline power control policy for two-hop transmission systems assuming energy arrival informationat the nodes. The optimal transmission policy was given by thedirectional water filling method. In [4], the authors generalizedthis idea to the case where many sources supply energy to the

    The authors are with the Department of Electrical and Computer Engineerngat the University of Manitoba, Canada (emails: [email protected],{Ekram.Hossain,Hina.Tabassum}@umanitoba.ca).

    destinations using a single relay. A water filling algorithm wasproposed to minimize the probability of outage. Although theoffline power control policies provide an upper bound andheuristic for online algorithms, the knowledge of energy/dataarrivals is required which may not be feasible in practice.In [5], the authors proposed a two-state Markov DecisionProcess (MDP) model for a single energy-harvesting deviceconsidering random rate of energy arrival and different prioritylevels for the data packets. The authors proposed a low-costbalance policy to maximize the system throughput by adaptingthe energy harvesting state, such that, on average, the harvestedand consumed energy remain balanced. Recently, in [6], theoutage performance analysis was conducted for a multi-tiercellular network in which all BSs are powered by the harvestedenergy. A detailed survey on energy harvesting systems canbe found in [7] where the authors summarized the currentresearch trends and potential challenges.

    Compared to the existing literature on energy-harvestingsystems, this paper considers the power control problemfor downlink transmission in two-tier macrocell-small cellnetworks considering stochastic nature of the energy arrivalprocess at the SBSs. Note that the power-control policies andtheir resulting interference levels directly affect the overallsystem performance. The design of efficient power controlpolicies is thus of paramount importance. In this context, weformulate a discounted stochastic game model in which allSBSs form a coalition to compete with the MBS in orderto achieve the target signal-to-interference-plus-noise ratio(SINR) of their users through transmit power control. Usingthe stochastic game model, we capture the randomness ofthe system and the Nash equilibrium power control policy isobtained as the solution of a quadratic programming problem.For the case when the number the SBSs is very large, thestochastic game is approximated by a mean field game (MFG).In general, mean field games are designed to study the strategicdecision making in very large populations of small interactingindividuals. Recently, in [8], the authors modeled a mean fieldgame to determine optimal power control policy for a finitebattery powered small-cell network. In this paper, by solving aset of forward and backward partial differential equations, wederive a distributed power control policy for each SBS usingMFG model.

    The contributions of the paper can be summarized asfollows.

    1) For a macrocell-small cell network, we consider a cen-tralized energy harvesting mechanism for the SBSs in

  • which energy is harvested and then distributed to theSBSs through a centralized energy queue (CEQ). Notethat the concept of CEQ is somewhat similar to the con-cept of dedicated power beacons that are responsible forwireless energy transfer to users in cellular networks [9],[10]. Moreover, in the cloud-RAN architecture [11],where along with data processing resources, a central-ized cloud can also act as an energy farm that distributesenergy to the remote radio heads each of which acts asan SBS. Subsequently, we formulate the power controlproblem for the MBS and SBSs as a discrete single-controller stochastic game with two players.

    2) The existence of the Nash equilibrium and pure station-ary strategies for this single-controller stochastic gameis proven. The power control policy is derived as the so-lution of a quadratic-constrained quadratic programmingproblem.

    3) When the network becomes very dense, a stochasticMFG model is used to obtain the power control policyas a solution of the forward and backward differentialequations.

    4) An algorithm using finite difference method is proposedto solve these forward-backward differential equationsfor the MFG model.

    Numerical results demonstrate that the proposed power controlpolicies offer reduced outage probability for the users of theSBSs when compared to the greedy power control policieswherein each SBS tries to obtain the target SINR of its userswithout considering the strategies of other SBSs.

    The rest of the paper is organized as follows. Section IIdescribes the system model and assumptions used in the paper.The formulation of the single-controller stochastic game modelfor multiple SBSs is presented in Section III. In Section IV,we derive the distributed power control policy using a MFGmodel when the number of SBSs increases asymptotically.Performance evaluation results are presented in Section Vbefore the paper is concluded in Section VI.

    II. SYSTEM MODEL AND ASSUMPTIONS

    A. Energy Harvesting Model

    We consider a single macrocell overlaid with M small cells.The downlink co-channel transmission of the MBS and SBSsis considered and it is assumed that each BS can serve only asingle user on a given transmission channel during a transmis-sion interval (e.g., time slot). The MBS uses a conventionalpower source and its transmit power level is quantized into adiscrete set of power levels P = {pmin0 , ..., pmax0 }, where thesubscript 0 denotes the MBS. This discrete model of transmitpower can also be found in [12]. On the other hand, theSBSs receive energy from a centralized energy queue (CEQ)which harvests renewable energies from the environment. Weassume that only the CEQ can store energy for future use andeach SBS must consume all the energy they receive from theCEQ at every time slot. The energy arrives at the CEQ inthe form of packets (one energy packet corresponds to oneenergy level in CEQ). The number of energy packet arrivals(t) during any time interval t is discrete and follows an

    arbitrary distribution, i.e., Pr((t) = X). We assume thatthe battery at the CEQ has a finite storage S. Therefore,the number of energy packet arrivals is constrained by thislimit and all the exceeding energy packets will be lost, i.e.,Pr((t) = S) = Pr((t) S). The statistics of energyarrival is known a priori at both the MBS and the CEQ. Attime t, given the battery level E(t), the number of energypacket arrivals (t), and the energy packets Q(t) that the CEQdistributes to the M SBSs, the battery level E(t + 1) at thenext time slot can be calculated as follows:

    E(t+ 1) = E(t)Q(t) + (t). (1)Given Q(t) energy packets to distribute, the CEQ will

    choose the best allocation method for the M SBSs ac-cording to their desired objectives. Denote slot durationas T and the volume of one energy packet as K, wehave the energies distributed to the M SBSs at time t as(p1(t)T, p2(t)T, , pM (t)T ) where pi(t) is the trans-mit power of SBS i at time t. Clearly we must have:

    Mi=1

    pi(t) =K

    TQ(t). (2)

    From the causality constraint, E(t) Q(t) 0, i.e., theCEQ cannot send more energy than that it currently possesses.Note that E(t) is the current battery level which is an integerand has its maximum size limited by S. Since the battery levelof CEQ and the number of packet arrivals are integer values,it follows from (1) that Q(t) is also an integer.

    Without a centralized CEQ-based architecture, each SBScan have different harvested energy and in turn battery levelsat each time slot, which will make this problem a multi-agent stochastic game [13]. Although this kind of game can beheuristically solved by using Q-learning [14], the conditionsfor convergence to a Nash equilibrium are often very strictand in many cases impractical. By introducing CEQ, the stateof the game is simplified into the battery size of CEQ, andthe multi-player game is converted into a two-player game.Another benefit of the centralized CEQ-based architectureis that the energy can be distributed based on the channelconditions of the users in SBSs so that the total payoff will behigher than the case where each SBS individually stores andconsumes the energy.

    All the symbols that are used in the system model andsection III are listed in Table I.

    B. Channel Model

    The received SINR at the user served by SBS i at time slott is defined as follows:

    i(t) =pi(t)gi,iIi(t)

    , (3)

    where Ii(t) =Mj 6=i

    pjgi,j + p0gi,0 is the interference caused by

    other BSs. gi,0 is the channel gain between MBS and the userserved by SBS i, gi,i represents the channel gain between SBSi and the user it serves, and gi,j is the channel gain betweenSBS j and the user served by SBS i. Finally, pi(t) represents

    ThucHighlight

    ThucHighlight

  • TABLE I: List of symbols used for the single-controller stochastic game model

    gi Average channel gain between BS i and its associated user gi,j Average channel gain between BS j and user of BS i0 (1) Target SINR for MBS (SBS) E(t) (Discrete) Battery level of CEQ at time t

    T Duration of one time slot in seconds Q(t) Number of quanta distributed by the CEQ at time tI0(t) Average interference at the user served by the MBS at time t Ii(t) Average interference at the user served by SBS i at time tS Maximum battery level of the CEQ P Finite set of transmit power of the MBS

    m, n Concatenated mixed-strategy vector for the MBS and theCEQ, respectively

    m(s), n(s) Probability mass function for actions of the MBS and theCEQ, respectively, when E(t) = s

    m(s, p) Probability that the MBS chooses power p P when E(t) = s n(s, i) Probability that the CEQ sends i quanta when E(t) = s(t) Energy harvested at time t pis Probability that the CEQ starts with battery level s Discount factor of the stochastic game U0, U1 Utility function of the MBS and the CEQ, respectively

    R0, R1 Payoff matrix for the MBS and the CEQ, respectively 0, 1 Discounted sum of the value function of the MBS and theCEQ, respectively

    the transmit power of SBS i at time t. The transmit powerof MBS p0(t) belongs to a discrete set {pmin0 , ..., pmax0 }.We ignore the thermal noise assuming that it is very smallcompared to the cross-tier interference.

    Similarly, the SINR at a macrocell user can be calculatedas follows:

    0(t) =p0(t)g0,0I0(t) +N0

    , (4)

    where I0(t) =Mi=1

    pig0,i is the cross-tier interference from

    M SBSs to the macrocell user, g0,0 denotes the channel gainbetween the MBS and its user, g0,i represents the channel gainbetween SBS i and macrocell user, and N0 is the thermalnoise.

    The channel gain gi,j is calculated based on path-loss andfading gain as follows:

    gi,j = |h|2ri,j , (5)where ri,j is the distance from BS j to user served by BSi, h follows a Rayleigh distribution, and is the path-lossexponent. We assume that the M SBSs are randomly locatedaround the MBS and the users are uniformly distributed withintheir coverage radii r.

    III. FORMULATION AND ANALYSIS OF THESINGLE-CONTROLLER STOCHASTIC GAME

    A stochastic game is a dynamic multiple stage game, whichchanges its form at each stage with some probability. In thisgame, the total payoff to a player is often the discounted sumof the stage payoffs or the limit inferior of the averages ofthe stage payoffs. The transition of the game at each timeinstant follows Markovian property, i.e., the current stage onlydepends on the previous one. Therefore, a stochastic game canbe viewed as a generalization of both repeated game and MDP.The single-controller game is a stochastic game where the stateof the game is decided by one player, namely, the controller.The other player can only influence the state indirectly throughthe controller (see [15] for the details).

    A. Utility Functions of MBS and SBSsFor a given MBS and CEQ, we formulate a two-player non-

    cooperative stochastic power control game, where the MBSand the SBSs try to maintain the average target SINR valuesof their users. As in [16], we can define the utility function ofthe MBS at time t as:

    U0(p0, Q, t) = (p0(t)g0 0(I0(t) +N0))2, (6)

    where I0(t) =Mi=1

    pi(t)g0,i is the average interference at the

    macrocell user at time t, and 0 is the target SINR of macrouser. Similarly, the utility function of the CEQ is defined asfollows:

    U1(p0, Q, t) = 1M

    Mi=1

    (pi(t)gi 1Ii(t))2, (7)

    where Ii(t) =Mj 6=i

    pj(t)gi,j + p0(t)gi,0 is the average interfer-

    ence at the user served by SBS i at time t. The arguments ofboth the utility functions demonstrate that the action at time tfor the MBS is its transmission power p0(t) while the actionof the CEQ is the number of energy packets Q(t) that is usedto transmit data from the SBSs. Later, in Remark 2, we willshow that the interference and transmit power of each SBScan be derived from Q and p0. The conflict in the payoffs ofboth the players arises from their transmit powers that directlyimpact the cross-tier interference.

    Note that the proposed single-controller approach can beextended to consider a variety of utility functions (averagethroughput, total network throughput, energy efficiency, etc.)

    B. Formulation of the Game Model

    Unlike a traditional power control problem the action spaceof the CEQ changes at each time and is limited by its batterysize. Given the distribution of energy arrival and the discountfactor , the power control problem can be modeled by usinga single-controller discounted stochastic game as follows: There are two players: one MBS and one CEQ. The state of the game is the battery level of the CEQ,

    which is {0, ..., S}. At time t and state s, the action p0(t) of the MBS is

    its transmission power and belongs to the finite set P ={pmin0 , ..., pmax0 }. On the other hand, the action of theCEQ is Q(t), which is the number of energy packetsdistributed to M SBSs. Q(t) belongs to the set {0, ..., s}.

    Let m and n denote the concatenated mixed-stationary-strategy vectors of the MBS and the CEQ, respec-tively. The vector m is constructed by concatenat-ing S + 1 sub-vectors into one big vector as m =[m(0),m(1), ...,m(S)] , in which each m(s) is a vectorof probability mass function for the actions of the MBSat state s. For example, if the game is in state s, m(s, p)gives the probability that the MBS transmits with powerp. Therefore, the full form of m will include the state s

  • and power p. However, to make the formulas simple, inthe later parts of the paper, we will use m or m(s) todenote, respectively, the whole vector or a sub-vector atstate s, respectively.

    Similarly, for the CEQ, n(s, i) gives the probability thatthe CEQ distributes i energy packets. Note that theavailable actions of the CEQ dynamically vary at eachstate whereas the available actions for the MBS remainunchanged at every state.

    Pay-offs: At state s, if the MBS transmits with power p0and the CEQ distributes Q energy packets, the payofffunction for the MBS is U0(p0, Q) while the payofffunction for the CEQ is U1(p0, Q). We omit t since tdoes not directly appear in U1 and U0.

    Discounted Pay-offs: Denote by the discount factor( < 1), then the discounted sum of payoffs of the MBSis given as:

    0(s,m,n) = limT

    Tt=1

    tE[U0(m,n, t)], (8)

    where E[U0(m,n, t)] is the average utility of macrouserat time t if the MBS and the CEQ are using strategy mand n, respectively. Similarly, we define the discountedsum of payoffs 1 at the CEQ. In [17, Chapter 2], it wasproven that the limit of 0 and 1 always exist whenT .

    Objective: To find a pair of strategies (m,n)such that 0 and 1 become a Nash equilib-rium, i.e., 0(s,m,n) 0(s,m,n) n N and 1(s,m,n) 1(s,m,n) m Mwhere M and N are the sets of strategies of MBS andCEQ respectively.

    Given the distribution of energy arrival at the CEQ, thetransition probability of the system from state s to state s

    under action Q (0 Q s) of the CEQ is given as follows:

    q(s|s,Q) =

    Pr( = s (sQ)), if s < S1

    SsX=0

    Pr( = X), otherwise.(9)

    The states of the game can be described by a Markovchain for which the transition probabilities are defined by (9).Clearly, the CEQ controls the state of the game while MBS hasno direct influence. Therefore the single-controller stochasticgame can be applied to derive the Nash equilibrium strategiesfor both the MBS and the CEQ. The two main steps to findthe Nash equilibrium strategies are: First, we build the payoff matrices for the MBS and the

    CEQ for every state s, where S s 0. Denote themby R0 and R1, respectively.

    Second, using these matrices, we solve a quadraticprogramming problem to obtain the Nash equilibriumstrategies for both the MBS and the CEQ.

    C. Calculation of the Payoff Matrices

    To build R0 and R1, we calculate U0 and U1 for everypossible pair (p0, Q), where p0 P and 0 s S. In this

    Fig. 1: Graphical illustration of the two BSs A, B, and the user D locatedwithin the disk centered at B.

    regard, we first derive the average channel gain gi,j . Second,from the energy consumed Q and transmission power p0 ofthe CEQ and the MBS, respectively, we decide how the CEQdistributes this energy Q among the SBSs. Then, we calculatethe transmit power at each SBS and obtain U0 and U1. Thenext two remarks provide us with the methods to calculate U0and U1.

    Remark 1. Given two BSs A and B, assume that a user D,who is associated with B, is uniformly located within the circlecentered at B with radius r (Fig. 1). Assume that A does notlie on the circumference of the circle centered at B and = 4.Denote AB = R and AD = d, then the expected value of d4,i.e., E[d4] is 1(R2r2)2 . If A B, then E[d4] = 1r

    2r2 given

    that r BD 1. For other values of , E[rij ] can be easilycomputed numerically using tools such as MATHEMATICA.

    Proof. See Appendix A.

    Recalling that gi,j = |h|2r4ij and that the fading and path-loss are independent. We have gi,j = E[h2]E[r4ij ] whereE[h2] = if h follows Rayleigh distribution with scaleparameter and E[r4ij ] can be calculated using the remarkabove. Next we need to find how the CEQ distribute its energyto each SBS such that U1 is maximized.

    Remark 2. (Optimal energy distribution at the CEQ) If attime t, the CEQ distributes Q energy packets to M SBSsand the MBS transmits with power p0, then the transmitpowers (p1, p2, ..., pM ) at the M SBSs are the solutions ofthe following optimization problem (t is omitted for brevity):

    maxp1,p2,...,pM

    1M

    Mi=1

    pigi 1( Mj 6=i

    pj gij + p0gi,0)

    2 ,s.t.

    Mi=1

    pi =K

    TQ,

    Pmax pi 0, i = 1, 2, ...,M,(10)

    where Pmax is the maximum transmit power of each SBS.Since this problem is strictly concave, the solution (p1, ..., pM )always exists and is unique for each pair (Q, p0). Thus,for each pair (Q, p0), where Q {0, ..., S} and p0 {pmin0 , ..., pmax0 }, we have unique values for U0(p0, Q) andU1(p0, Q).

    Based on the remarks above, for each combination ofQ and p0, we can find the unique payoff U0 and U1 of

    ThucHighlight

    ThucHighlight

    ThucHighlight

  • MBS and CEQ. Since Q and p0 belongs to discrete setswe can find the payoff for all of the possible combinationsbetween them. Thus, we can build the payoff matrix R0for the MBS and R1 for the CEQ. The matrix R0 has theform of a block-diagonal matrix diag(R00, ..., R

    S0 ), where each

    sub-matrix Rs0 = (U0(p0, j))P{0,...,s}, with p0 P and

    j {0, ..., s} is the matrix of all possible payoffs for theMBS at state s. Similarly, we can build R1, which is thepayoff matrix for the CEQ. A detailed explanation on howwe use them will be given in the next subsection.

    D. Derivation of the Nash Equilibrium

    If we know the strategy m0 of the MBS, the discount factor, and the probability pis that the CEQ starts with s energypackets in the battery, then the stochastic game is reducedto a simple MDP problem with only one player, the CEQ.For this case, denote the CEQs best response strategy to m0by n. Then the CEQs value function 1(s,m0,n), wheres = 0, ..., S, is the solution of the following MDP problem[17, Chapter 2]:

    min1

    Ss=0

    pis1(s,m0,n),

    s.t. 1(s,m0,n) r1(s,m0, j) + S

    s=0

    q(s|s, j)1(s,m0,n),

    s, j, 0 j s and 0 s S(11)

    with r1(s,m0, j) =p0P U1(p0, j)m0(s, p0) is the aver-

    age payoff for the CEQ at state s when it consumes j quantaof energy. Using the Dirac function , the dual problem canbe expressed as

    maxx

    Ss=0

    sj=0

    r1(s,m0, j)xs,j ,

    s.t.Ss=0

    sj=0

    [(s s) q(s|s, j)]xs,j = pis , 0 s S,

    xs,j 0 s, j, 0 j s and 0 s S(12)

    where (s) = 1 if s = 0 and (s) = 0 otherwise.By solving the pair of linear programs above, the probability

    that the SBS chooses action j at state s can be found asn(s, j) =

    xs,jsj=0 xs,j

    . Using some algebraic manipulations, wecan convert the optimization problem in (12) into a matrixform as:

    min1

    piT1,

    s.t. H1 RT1m0, (P)and its dual as

    maxx

    mT0R1x,

    s.t. xTH = piT,x 0, (D)

    where R1 is the payoff matrices of the CEQ. Combining theprimal and dual linear programs (i.e., (P) and (D) above) andusing the same notations, we have the following theorems.

    Theorem 1 (Nash equilibrium strategies [15]). If the statespace and the action space are finite and discrete, and thetransition probabilities are controlled only by player 2 (i.e.,the CEQ), then there always exists a Nash equilibrium point(m,n) for this stochastic game. Moreover, a pair (m,n) isa Nash equilibrium point of a general-sum single-controllerdiscounted stochastic game if and only if it is an optimalsolution of a (bilinear) quadratic program given by

    maxm,x,1,

    [m(R0 +R1)x piT1 1T],s. t. H1 RT1m,

    xTH = piT,

    Rs0x(s) s1, s = 0, ..., S,m(s)T111 = 1, s = 0, ..., S,m,x 0,

    (13)

    where s is the maximum average payoff of the MBS at state s.The sub-vector strategy n(s) of the CEQ at state s is calculatedfrom x as:

    n(s) =x(s)

    x(s)T1. (14)

    We can define different utility functions for the MBS andthe SBS and apply the same method to achieve the Nashequilibrium. As long as the number of states is finite and thetransition and the payoff matrices remain unchanged over time,a Nash equilibrium point always exists.

    Theorem 2 (Best response strategy for the MBS). Given astationary strategy n of the CEQ, there exists a pure stationarystrategy m as the best response for the MBS. Similarly, forany stationary strategy m of the MBS, there exists a purestationary best response n of the CEQ.

    Proof. See Appendix B.

    Because for any mixed strategy of CEQ, MBS can find apure stationary strategy as its best response, we only needto find Nash equilibrium where the strategy of MBS isdeterministic. This problem can be converted to a mixed-integer program with m as a vector of 0 and 1. We can usea brute-force search to obtain an equilibrium point. For eachfeasible integer value of m we insert it into (13) to obtain n.If the objective is zero, then (m,n) is the equilibrium point.This theorem implies that the optimization problem in (13)can be solved in a finite amount of time.

    Notice that there can be multiple Nash equilibrium. There-fore, to make the chosen Nash equilibrium point more mean-ingful, we use the following lemma from [15].

    Lemma 1. (Necessary and sufficient conditions for the Nashequilibrium) m and n constitute a pair of Nash equilibriumpolicies for the MBS and the CEQ if and only if

    m(R0 +R1)x piT1 1T = 0. (15)

  • Since pis is the probability that the CEQ starts with senergy level in the battery at starting time, from (11), piT1 isthe average payoff of the CEQ with respect to the strategiesm,n. Using the lemma above, we change the problem in (13)to a quadratic-constrained quadratic programming (QCQP) asstated below.

    Proposition 1. (Nash equilibriums that favor the SBSs) TheNash equilibrium (m,n) that has the best payoff for the CEQis a solution of the following QCQP problem:

    maxm,x,1,

    piT1,

    s.t. m(R0 +R1)x piT1 1T = 0,all constraints from (13).

    (16)

    By solving this QCQP, we obtain a Nash equilibrium thatreturns the best average payoff for the CEQ. This bias towardthe SBSs is crucial as the available energy of the CEQ islimited by the randomness of the environment and thus theSBSs are more likely to suffer when compared to the MBS.Again, there can be more than one Nash equilibrium, but allof them must return the same payoff for CEQ. Because theremay be multiple solutions, CEQ and MBS need to exchangeinformation so that they agree on the same Nash equilibrium.

    Algorithm 1 Nash equilibrium for the stochastic game1: The MBS and the CEQ build their reward matrices R0

    and R1. For each possible pair of energy level and transmitpower (Q, p0), the CEQ solves (2) to obtain a unique tuple(p1, p2, ..., pM ) and record these results.

    2: The MBS and the CEQ calculate their strategy m and n,respectively, by solving (16).

    3: At time t, the CEQ sends its current battery level s tothe MBS. It also randomly chooses an action Q using theprobability vector n(s).

    4: The MBS then randomly picks power p0 using distribu-tion m(s) and sends it back to the CEQ. Based on p0and Q, the CEQ searches its records and retrieves thecorresponding tuple (p1, ..., pM ).

    5: The CEQ distributes energy (p1T, p2T, ..., pMT ),respectively, to the M SBSs.

    From Theorem 2, we know that there exist an equilibriumwith pure stationary strategies for both MBS and CEQ. Recallthat with pure strategy, the action of each player is a functionof the state. Thus, if we can obtain this equilibrium, CEQ canpredict which transmit power p0, the MBS will use based onthe current state without exchanging information with MBSand vice versa.

    E. Implementation of the Discrete Stochastic Game

    For discrete stochastic control game with CEQ, each SBSfirst needs to send its location and average fading channelinformation E[h2] of its user to the CEQ and the MBS so thatcomplete channele state information is known at both the MBSand CEQ. Since we only use average channel gain, the CEQand the MBS only need to re-calculate the Nash equilibriumstrategies when either the locations of SBSs change, e.g., some

    SBSs go off and some are turned on, or when the averageof channel fading gain h changes, or when distribution ofenergy arrival (t) at the CEQ is changes. Thanks to thecentral design, SBSs and MBS only need to send informationabout their channel gains to the CEQ to calculate the Nashequilibrium at the beginning. Later, for each time slot, onlyCEQ and MBS need to exchange information, therefore theoverhead for communications is small.

    IV. MEAN FIELD GAME (MFG) FOR LARGE NUMBER OFSMALL CELLS

    The main problem of the two-player single-controllerstochastic game is the curse of dimensions. The timecomplexity of Algorithm 1 increases exponentially with thenumber of states S or the maximum battery size. Note thatR0 and R1 have dimensions of |P| S(S + 1)/2, so thecomplexity increases proportionally to S. Moreover, unlikeother optimization problems, we are unable to relax the QCQPin (16), because Theorem 1 states that the Nash equilibriummust be the global solution of the quadratic programming (13).To tackle these problems, we extend the stochastic game modelto an MFG model for very large number of players.

    The main idea of an MFG is the assumption of similarity,i.e., all players are identical and follow the same strategy.They can only be differentiated by their state vectors. Ifthe number of players is very large, we can assume that theeffect of a specific player to other players is nearly negligible.Therefore, in an MFG, a player does not care about othersstates but only act according to a mean field m(t, s), whichusually is the probability distribution of state s at time instantt [18]. In our energy harvesting game, the state is the batteryE and the mean field m(t, E) is the probability distributionof energy at time t in the area we are considering. Whenthe number of players M is very large, we can assume thatm(t, E) is a smooth continuous distribution function. We willshow that the average interference at a SBS as a function ofthe mean field m.

    All the symbols that are used in this section are listed inTable II.

    A. Formulation of the MFG

    Denote by E(v) the available energy in the battery of anSBS at time v. Given the transmission strategies of other SBSs,each SBS will try to maximize its long run generic utility valuefunction by solving the following optimal control problem:

    minp

    U(0, E(0)) = E

    [ T0

    (p(v,E(v))g (I(v) +N0))2dv],

    (17)s.t. dE(v) = p(v,E(v))dv + dWv, (18)

    E(v) 0, p(v) 0, (19)where I(v) is the generic interference at a user served by anSBS at time v and g is the channel gain between a genericSBS and its user. The mean field m(v,E) is the probabilitydistribution of energy E in the area at time v. Using M as

    ThucHighlight

    ThucHighlight

    ThucHighlight

  • TABLE II: List of symbols used for the MFG game model

    g Average channel gain from a generic SBS to another user p(t, R) Transmit power at a generic SBS as a function of RE (Continuous) Battery level of an SBS m(t, E) Probability distribution of energy E at time tR Energy coefficient eR = E m(t, R) Probability distribution of energy coefficient R at time tWt Wiener process at time t p(t) Average transmit power of a SBS at time t

    p(t, E) Transmit power at a generic SBS as a function of E (or R) Intensity of energy arrival or loss

    the number of SBSs in a macrocell and assuming that theother SBSs have the same average channel gain g to the userof the current generic SBS, then, the average interferenceI(v) at the user served by a generic SBS can be expressedas I(v) = Mgp(v), where p(v) =

    0p(v,E)m(v,E)dE

    can be understood as the average transmit power of anothergeneric SBS. Since the MFG assumes similarity, p(v) can beconsidered as the average transmit power of a generic SBS attime v. To make the notation simpler, we denote = gM .

    Thanks to similarity, all the SBSs have the same set ofequations and constraints, so the optimal control problem forthe M SBSs reduces to finding the optimal policy for only onegeneric SBS. Mathematically, if an SBS has infinite availableenergy, i.e., E(0) = , it will act as an MBS. However, forsimplicity, we will assume that only the SBSs are involvedin the game and the interference from the MBS is constant,which is included in the noise N0 as in [19]. Except that, thesystem model and the optimization problem here are similarto the case with the discrete stochastic game model.

    Assuming that the SBSs are uniformly distributed withinthe macrocell with radius r centered at the MBS, the averageinterference from the MBS to a generic user served by anSBS can be easily derived by using a method similar to thatdescribed in Remark 1. For the MFG model, the energy levelE is a continuous non-negative variable. The equality in (18)shows the evolution of the battery, where is a constantwhich is proportional to the maximum energy arrival duringa time interval. Wv is a Wiener process, thus dWv = vdv,where v is a Gaussian random variable with mean zero andvariance 1. This model of evolution for battery energy wasmentioned in [20]. The inflexibility of the energy arrival isthe main disadvantage of using the MFG model comparedto the discrete stochastic game model. The random arrival ofenergy is configured as noise, so this can be either positiveor negative. We can consider the negative part as the batteryleakage and internal energy consumption. The final inequali-ties are the causality constraints: The battery state E(v) andtransmit power must always be non-negative. To guarantee thispositivity we follow [21] and change the energy variable E(v)to E(v) = eR(v). This conversion is a bijection map fromE(v) to R(v), thus we can write m(v,E) = m(v,R) andp(v,E) = p(v,R), where > R > . The new optimalcontrol problem can be rewritten as

    minp(.)

    U(0, R(0)) = E

    [ T0

    (p(v,R(v))g p(v) N0)2dv],

    (20)

    s.t. dR(v) = p(v,R(v))eR(v)dv + eR(v)dWv, (21)p(v) 0. (22)

    To obtain the power control policy p, first, we derive the

    Forward-Backward differential equations from the above prob-lem. Then, we apply Finite Difference method to numericallysolve these equations,

    B. Forward-Backward Equations of MFGLets assume the optimal control above starts at time t with

    T t 0, we obtain the Bellman function U(t, R) as

    U(t, R(t)) = E

    [ Tt

    (p(v,R(v))g p(v) N0)2dv].

    (23)From this function, at time t, we obtain the followingHamilton-Jacobi-Bellman (HJB) [21] equation:

    minp0

    {(p(t, R)g p(t) N0

    )2 p(t, R)eRRU(t, R)}+tU +

    2

    2e2R2RRU = 0,

    (24)

    where p(t) = e

    Rp(t, R)m(t, R)dR is the averagetransmit power at a generic SBS. The Hamiltonianminp0

    {(p(t, R)g p(t) N0

    )2 p(t, R)eRRU(t, R)}is given by the Bellmans principle of optimality. By applyingthe first order necessary condition, we obtain the optimalpower control as follows:

    p(t, R) =[p(t) + N0

    g+eRRU

    2g2

    ]+. (25)

    Remark 3. The Bellman U , if exists, is a non-increasingfunction of time and energy. Therefore, we have RU 0and tU 0.

    From equation (25), given the current interference p(t) +N0 at a user, the corresponding SBS will transmit less powerbased on the future prospect e

    RRU2g2 . If the future prospect is

    too small, i.e., eRRU

    2g2 < p(t)+N0g , it stops transmissionto save energy.

    Replacing p back to the HJB equation we have

    tU +2

    2e2R2RRU + (p(t) + N)

    2([p(t) + N +

    eRRU2g

    ]+)2= 0,

    (26)

    which has a simpler form as follows:

    tU +2

    2e2R2RRU = (pg)

    2 (p(t) + N)2. (27)Also, from (21), at time t, we have the Fokker-Planck equation[18] as:

    tm(t, R) = R(peRm) +

    2

    2RR(me

    2R), (28)

  • where m(t, R) is the probability density function of R attime t. Combining all these information we have the followingproposition.

    Proposition 2. The value function and the mean field (U,m)of the MFG defined in (20) is the solution of the followingpartial differential equations:

    tU +2

    2e2R2RRU = (pg)

    2 (p(t) + N0)2, (29)

    p(t, R) =

    [p(t) + N0

    g+eRRU

    2g2

    ]+,

    (30)

    tm(t, R) =R(peRm) +

    2

    22RR(e

    2Rm),

    (31)

    p(t) =

    eRp(t, R)m(t, R)dR, (32)

    m(t, R)dR =1, where m(t, R) 0. (33)

    Lemma 2. The average transmit power p(t) of a generic SBSis a derivative with respect to time of the average availableenergy in the battery and can be calculated as

    p(t) = ddt

    e2Rm(t, R)dR. (34)

    Proof. See Appendix C.

    Since p is always non-negative, the average energy in anSBSs battery is a decreasing function of time. That meansthe distribution m should shift to the left when t increases.This is because we use the Wiener process in (18). Since dWthas a normal distribution with mean zero, the energy harvestedwill be equal to the energy leakage. Therefore, for the entiresystem, the total energy will decrease over time.

    Lemma 3. If (U1,m1) and (U2,m2) are two solutions ofProposition 2 and m1 = m2, then we have U1 = U2.

    Proof. First, from (21) we derive Fokker-Planck equation:

    tm1(t, R) = R(p1eRm1) +

    2

    22RR(e

    2Rm1),

    tm2(t, R) = R(p2eRm2) +

    2

    22RR(e

    2Rm2).

    Since m1 = m2 = m, we subtract the first equation fromthe second one to obtain R((p1 p2)eRm) = 0. Thismeans (p1 p2)eRm is a function of t. Let us denotef(t) = (p1 p2)eRm, then we have (p1 p2)m = f(t)eR.From Lemma 2, p is a function of m, thus p1(t) = p2(t).Since p(t) =

    eRpmdR, we have

    eRp1mdR =

    eRp2mdR

    eR(p1 p2)mdR = 0, t. (35)

    Now we substitute (p1 p2)m = f(t)eR that results in f(t)dR = 0 t. This means f(t) = 0 or p1 = p2.

    Note that U is a function of p and p. Since p1 = p2 andp1 = p2, it follows that U1 = U2. This lemma confirms thatan SBS will act only against the mean field m. Thus m is theone that determines the evolution of the system. Two systemswith the same mean field will behave similarly.

    C. Solving MFG Using Finite Difference Method (FDM)

    To obtain U and m, we use the finite difference method(FDM) as in [21] and [27]. We discretize time and en-ergy coefficient R into large intervals as [0, ..., Tmaxt] and[RmaxR, ..., RmaxR] with t and R as the stepsizes, respectively. Then U,m, p become matrices with sizeTmax (2Rmax + 1). To keep the notations simple, we uset and R as the index for time and energy coefficient in thesematrices with t {0, ..., Tmax} and R {Rmax, ..., Rmax}.For example, m(t, R) is the probability distribution of energyeRR at time tt. Using the FDM, we replace RU , tU ,and 2RRU with the discrete formula as follows [28]:

    tU(t, R) =U(t+ 1, R) U(t, R)

    t, (36)

    RU(t, R) =U(t, R+ 1) U(t, R 1)

    2R, (37)

    2RRU(t, R) =U(t, R+ 1) 2U(t, R) + U(t, R 1)

    (R)2. (38)

    By using them in (29) and after some simple algebraic steps,we have

    U(t 1, R) = U(t, R) + e2R 2t

    2(R)2A1 tB1, (39)

    where

    A1 = U(t, R+ 1) 2U(t, R) + U(t, R 1),B1 = (p(t, R)g)

    2 (p(t) + N)2 .Similarly, discretizing (32), we have

    m(t, R) =t

    2RA2 +

    2t

    2(R)2B2 +m(t 1, R), (40)

    where

    A2 =e(R+1)Rp(t 1, R+ 1)m(t 1, R+ 1) e(R1)Rp(t 1, R 1)m(t 1, R 1), (41)

    B2 =e2(R+1)Rm(t 1, R+ 1) 2e2RRm(t 1, R)+ e2(R1)Rm(t 1, R 1). (42)

    To obtain U,m, p, and p using Proposition 2, we need tohave some boundary conditions. First, to find m, we assumethat there is no SBS that has the battery level equal to or largerthan eRmaxR so that m(t, Rmax) = 0, t. This is true ifwe assume that e(Rmax1)R is the largest battery size of anSBS. Also, when R = Rmax, from the basic property ofprobability distribution

    RmaxR=Rmax

    m(t, R)R = 1

    m(t,Rmax) = 1R

    RmaxR=Rmax+1

    m(t, R).

    (43)

  • Next, to find U , again we need to set some boundaryconditions. Notice that U(Tmax, R) = 0 for all R. We furtherassume the following: Intutitively, if the battery level of a SBS is full, i.e., whenR = Rmax, this SBS should transmit something (becausethe thermal noise N > 0), or equivalently, p(t, Rmax) >0. That means

    p(t, Rmax) =p(t) + N

    g+eRmaxRRU(t, Rmax)

    2g2.

    (44)Therefore, if we know U(t, Rmax 1) and p, we cancalculate U(t, Rmax).

    Similarly, it must be true that when available energyis 0, i.e., R = Rmax, a SBS will stop transmission.Therefore, we can assume

    p(t) + N

    g+eRmaxRRU(t,Rmax)

    2g2= 0. (45)

    Again, if we know U(t,Rmax + 1), we can calculatethe boundary U(t,Rmax).

    During simulations, in some cases when the densityis very high, we obtain very large (unrealistic) valuesof transmit power. Therefore, we must put an extraconstraint for the upper limit. In this paper, we useE(t) > p(t, E(t))T , or eR(t)R p(t, R(t))T ,where T is the duration of one time slot. This meanswe have to limit the transmit power during one time stept to be smaller than the maximum power that can betransmitted during one time interval T .

    Based on the above considerations, we develop an iterativealgorithm (Algorithm 2) detailed as follows.

    D. Implementation of MFG

    For the MFG, we do not need the location information foreach SBS. However, we need information about the averagechannel gain g, g, the number of SBSs M in one macrocell,and the initial distribution m0 of the energy of SBSs in themacrocell. Therefore, some central system should measurethese information, solve the differential equations, and thenbroadcast the power policy p to all the SBSs. It is moreefficient than broadcasting all the information to all SBSs andlet them solve the differential equations by themselves. Again,the central system only needs to re-calculate and broadcast toall SBSs a new power policy if there are changes in g, g, orM .

    V. SIMULATION RESULTS AND DISCUSSIONS

    A. Single-Controller Stochastic Game

    In this section, we quantify the efficacy of the developedstochastic policy in comparison to the greedy power controlpolicy. The stochastic policy is obtained from the QCQPproblem. On the other hand, in the greedy policy, we followa hierarchical method. First, the MBS chooses its transmitpower, then each SBS tries to transmit with the power suchthat the SINR at its user is nearest to its target value, ignoringthe co-interference from other SBSs. Next, the MBS records

    Algorithm 2 Iterative algorithm for FDMInitialize input

    Set up Tmax (2Rmax + 1) matrices U , m, p, andT 1 vector p.

    Guess arbitrarily initial values for power p, i.e.,p(t, R) = eRR.

    Initialize i = 1, U(Tmax, .) = 0, m(0, .) = m0(.),m(t, Rmax) = 0, and p(t, 0) = 0.

    Initialize R and t as the step size of energy andtime with (R)2 > t.

    Set MAX as the number of iterationSolve PDEs with FDMwhile i < MAX do:

    Solve the Fokker-Planck equation to get m using(40) and (43) with given p, m0.

    Update p(t) for Tmax t 0 using discrete formof equation (32).

    Calculate U for all t < Tmax by using (39), (44)and (45) with p, p.

    Calculate new transmission power pnew using (30)Regressively update p = ap+bpnew with a+b = 1.for R {Rmax, ..., Rmax}

    if p(t, R) > eRR

    T then p(t, R) =eRR

    T .endi i+ 1.

    endLoop in Tmax time slots

    At time slot t, SBS with energy battery eRR transmitswith power p(t, R).

    its current interference and then chooses a new transmit powerto achieve the target SINR at its user and so on.

    To solve the QCQP in (16), we use the fmincon functionfrom Matlab. In the simulations, the CEQ has a maximumbattery size of S = 21 and the volume of one energy packetis K = 25J. The duration of one time interval is T = 5 msand the thermal noise is N0 = 108 W. The MBS has twolevels of transmission power [10; 20] W and the SINR outagethreshold is set to 5. The energy arrival at each SBS follows aPoisson distribution with unit rate. The volume of each energypacket arriving at the CEQ is C times larger than the energypacket collected by each SBS. Thus, the amount of energy ineach packet at the CEQ will be CK J. This shows that theCEQ should have a more efficient method to harvest energythan each SBS (in the case of greedy method). However, asthe maximum battery size of the CEQ is S, its total availableenergy is always limited by the product SCK J regardlessof M . For both the cases, each SBS can receive up to 150J of energy from either the CEQ or the environment. In thebeginning, the CEQ is assumed to have full battery.

    From Fig. 2, it can be seen that when the number of SBSs issmaller than some value, the stochastic method achieves betterresults. That is because, the CEQ can share energy among theSBSs, and also the QCQP in (16) gives a Nash equilibrium thatfavors the CEQ. However, at some point, the greedy methodwill provide better results. This is not surprising since the CEQ

  • 20 30 40 50 60 70 80 90 100

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Number of SBSs

    Out

    age

    prob

    abilit

    y

    Greedy MethodStochastic Method

    Fig. 2: Outage probability of a small cell user with different numberof SBSs when S = 21 states, C = 40, 1 = 0.002, 0 = 10.

    1 1.5 2 2.5 3 3.5 4 4.5 5x 103

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Target SINR of SBSs

    Out

    age

    prob

    abilit

    y

    Stochastic MethodGreedy Method

    Fig. 3: Outage probability of a small cell user with different targetSINR when S = 21 states, C = 40, M = 30 SBSs, 0 = 10.

    can only store at most S K C J of energy. Therefore,when the number of SBSs increases, the allocated power perSBS by the CEQ reduces to zero while the greedy methodallows each SBS to harvest up to 150 J no matter how largeM is. This means the greedy method will provide a betterperformance compared to using the CEQ when M is large.

    Following Fig. 3, by increasing the threshold SINR target1 we can reduce the outage probability of a user served byan SBS. This is understandable since the average SINR willincrease to approach the higher target and thus reduce theoutage probability. However, for both the greedy and stochasticmethods, the slope is nearly flat when the target SINR islarger than some value. This is because, to increase the averageSINR, the SBSs need to transmit with higher power to mitigatecross-tier interference. However, since the battery capacity islimited for the CEQ and each of the SBSs, a higher transmitpower means a higher consumption of harvested energy, whichcan cause a shortage of energy later. Thus at some pointincreasing the target SINR does not bring any benefits.

    Fig. 4 shows the outage probability when increasing thequanta volume by choosing a higher multiplier C for the CEQ.It is easy to see that, with a higher C, i.e., choosing a moreeffective method to harvest energy at the CEQ, we can achievea better performance. The greedy method does not use theCEQ, so the outage probability remains unchanged. Note that,since the battery size of each SBS is limited to 150 J , at some

    point, a higher C does not improve the outage probability.

    40 50 60 70 80 90 100

    0.3

    0.4

    0.5

    0.6

    0.7

    Volume of one energy packet

    Out

    age

    prob

    abilit

    y

    Greedy MethodStochastic Method

    Fig. 4: Outage probability of a small cell user with different quantavolume when S = 21 states, M = 30 SBSs, 1 = 0.002, 0 = 10.

    1 2 3 4 5x 103

    0.1

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    Target SINR of SBSs

    Out

    age

    prob

    abilit

    y

    Stochastic methodGreedy method

    Fig. 5: Outage probability of a macrocell user when S = 21 states,M = 80 SBSs, C = 50, 0 = 10.

    Fig. 5 shows the outage probability of the macrocell userwhen M = 80 and C = 50. The stochastic method givesbetter results in this case since the SBSs are more rationalin choosing their transmit powers. Also, unlike the greedymethod, the CEQ has a fixed-energy battery, so when M islarge, the average amount of energy distributed to an SBSwill be small, which in turn limits the cross-interference tothe MBS. With the greedy method, the SBSs use highertransmit power to compete against the MBS; therefore, itcreates a larger cross-interference and in turn increases theoutage probability of MBS.

    In summary, we see that the centralized method using aCEQ can provide a better performance in terms of outageprobability for both the MBS and the SBSs. However, since theCEQ has a fixed battery size, the centralized method performspoorer when it needs to support a large number of SBSs. Toimprove this inflexibility, we can adjust other parameters asfollows: change target SINR, increase the multiplier C, orincrease the battery size of each SBS.

    B. Mean Field Game

    We assume that the transmit power at the MBS is fixedat 10W and it results in a constant noise at the user servedby a generic SBS. The radius of the macrocell is r = 1000meter, so we have constant cross-interference N0 = 105 W.

  • The target SINR is = 0.002 and assume that g = g =0.001. We discretize the energy coefficient R into 80 intervals,i.e., Rmax = 40 and Tmax = 1000 intervals. Similar to thediscrete stochastic case, each SBS can hold up to 150 J inthe battery, so the maximum transmit power is 30 mW. Weimpose the threshold such that an SBS will not transmit atR = Rmax = 40 or E = 0.6 J. The intensity of energyloss/energy harvesting, is 1.

    Fig. 6: Energy distribution over time when M = 400 SBSs/cell.

    1 1.2 1.4 1.6 1.8 2 2.2 2.4x 104

    0

    0.5

    1

    1.5

    2

    2.5

    Energy (Joules)

    Prob

    abilit

    y di

    strib

    utio

    n

    t=0t=1st=2.5st=5s

    Fig. 7: Energy distribution over time when M = 400 SBSs/cell.

    For M = 400 SBSs/cell, we have g = 0.001 > =gM = 0.0008, so a generic SBS does not need to use a largeamount of power in order to obtain the target SINR. Notice thatp is the average transmit power of a generic SBS. Therefore,if a generic SBS reduces p, the cost term p also reduces.Thus the difference between the cost and the received powerpg will be smaller, which is desirable. It makes sense that ageneric SBS will try to reduce its power as much as possible inthis case. The power cannot be zero though, because N0 > 0.Moreover, from Fig. 8 and Fig. 9, we see that, at the beginning,the SBS with higher energy (i.e., 100 J) will transmit witha high power and will gradually reduce to some value. TheSBSs with smaller battery will increase their power gradually.Since the transmit power is small, we see that in Fig. 6 andFig. 7, the energy distribution shifts to the left slowly.

    On the other hand, when M = 500 SBSs/cell, we haveg = = 0.001. In this case, the effect is more complicatedbecause reducing the transmit power may not reduce the gapbetween the received power pg and the cost term p + N .

    Fig. 8: Transmit power to serve a generic user using MFG when M = 400SBSs/cell.

    0 1 2 3 4 52

    4

    6

    8

    10

    12

    14

    16x 105

    Time (sec)

    Tran

    smitt

    ed p

    ower

    (Watt

    s)

    E=70 microJE=75 microJE=80 microJE=100 microJ

    Fig. 9: Transmit power for different energy levels when M = 400SBSs/cell.

    Again, as can be seen from Fig. 11, the SBSs with largeravailable energy will transmit with large power first and aftersometime when there is less energy available in the system,all of them start to use less power. Therefore, as can be seenin Fig. 10, the energy distribution shifts toward the left witha faster speed than the previous case.

    For M = 600 SBSs/cell, we have g < . This means eachSBS needs to transmit with a power larger than the average pto achieve the target SINR. In Fig. 12, we see that the behaviorof each SBS is the same as in the previous case. That is, theSBSs with higher energy transmit with larger power first andthen reduce it, while the poorer SBSs increase their transmitpower over time.

    We compare the MFG model against the stochastic discretemodel for different values of M . For simplicity, we assumethat each SBS has the same link gain to its user as g = 0.001.Also, we assume that the channel gain from each SBS tothe user of another SBS is g = 0.001. Using Remark 2,it can easily be proven that in this case, each SBS willtransmit with the same power, i.e., if the CEQ sends QCKJ of energy to M SBSs, then each SBS receives QCK/MJ. Then, the interference at each SBS will be calculatedas g(M1)g+N0MT/(QCK) with multiplier C = 20 and the

  • 1 1.2 1.4 1.6 1.8 2 2.2 2.4x 104

    0

    0.5

    1

    1.5

    2

    2.5

    Energy (Joules)

    Prob

    abilit

    y di

    strib

    utio

    n

    t=0t=1st=2.5st=5s

    Fig. 10: Energy distribution over time when M = 500 SBSs/cell.

    0 1 2 3 4 50

    0.002

    0.004

    0.006

    0.008

    0.01

    0.012

    0.014

    0.016

    0.018

    0.02

    Time (sec)

    Tran

    smitt

    ed p

    ower

    (Watt

    s)

    E=70 microJE=75 microJE=80 microJE=100 microJ

    Fig. 11: Transmit power for different energy levels when M = 500SBSs/cell.

    maximum battery size of the CEQ as S = 101. Because theMBS is not a player of the game, the simulation step becomessimpler, and we only need to solve a linear program for theMDP problem instead of a QCQP. Therefore, we can call itas MDP method to accurately reflect the difference.

    For the discrete stochastic case, we discretize the Gaussiandistribution to model the energy arrivals at the CEQ. Thebattery size of each SBS is still 150 J . The average SINR ofa generic small cell user using both MFG and MDP modelswith different density is plotted in Fig. 13. We see that usingthe MFG model, the average SINR increases at the beginningand then it starts falling at some point. This is because, whenthe density is low, the interference from the MBS is noticeable(i.e., 105 W in our simulation). From the previous figures,it can be seen that an SBS will increase its power whenthe density is higher. Therefore, after some point the co-tierinterference becomes dominant and the average SINR willbegin to drop. It means at some value of the density, e.g.,M = 400 SBSs/cell in Fig. 13, we obtain the optimal averageSINR. We notice that the MFG model performs better thanthe MDP model with the CEQ. This is due to limited batterysize of CEQ and nearly same channel gains of all SBS usersin a dense network scenario.

    In summary, we have two important remarks for the MFGmodel. First, if the density of small cells is high, the SBSswill transmit with higher power. Second, from Fig. 13, we see

    0 1 2 3 4 50.005

    0.01

    0.015

    0.02

    0.025

    0.03

    Time (sec)

    Tran

    smitt

    ed p

    ower

    (Watt

    s)

    E=70 microJE=90 microJE=110 microJE=150 microJ

    Fig. 12: Transmission power over time when M = 600SBSs/cell.

    100 200 300 400 500 600 700 800 900 10000

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5x 103

    Number of SBSs per macro cell

    Aver

    age

    SINR

    val

    ue

    MFG methodMDP methodTarget SINR

    Fig. 13: Average SINR at a generic SBS.

    that by choosing a suitable density of SBSs we can obtain thehighest average SINR. Notice that from (30), it can be easilyproven that the average SINR at a user served by an SBS willalways be smaller than the target SINR (because RU < 0).Therefore, the highest average SINR is also the closest to thetarget SINR, which is our objective in the first place.

    VI. CONCLUSION

    We have proposed a discrete single-controller discountedtwo-player stochastic game to address the problem of powercontrol in a two-tier macrocell-small cell network under co-channel deployment where the SBSs use stochastic renewableenergy source. For the discrete case, the strategies for boththe MBS and SBSs have been derived by solving a quadraticoptimization problem. The numerical results have shown thatthese strategies can perform well in terms of outage probabilityexperienced by the users. We have also applied a mean fieldgame model to obtain the optimal power for the case whenthe number of SBSs is very large. We have also discussed theimplementation aspects of these models in a practical network.

    APPENDIX A

    Denote the distance between the SBS B to its user D asBD = a. If D is uniformly located inside the disk centred at

  • B, the PDF of BD is fD(BD = a) = 2ar2 . Denote by thevalue of the angle ABD, is uniformly distributed between(0, 2pi). Using the cosine law d2 = R2 +a2 2aR cos(), weobtain

    E[d4] = 2pi

    0

    r0

    (R2 + a2 2aR cos )2 12pi

    2a

    r2da d.

    (A-1)

    First, we solve the indefinite integral over as

    (R2 + a2 2aR cos )2d = f1(a, ) + f2(a, ) + L, where L is aconstant and

    f1(a, ) =2(R2 + a2)

    (R2 a2)3 arctan(R+ a) tan 2

    R a , and

    f2(a, ) =2aR sin (R2 + a2 2aR cos )

    (R2 a2)2 .

    Since sin 0 = sin 2pi = 0, after integrating f2 over [0, 2pi], wecan ignore it. Thus 2pi

    0

    (R2 + a2 2aR cos )2d = pi 2(R2 + a2)

    (R2 a2)3 .

    Next, we integrate the above result over a to obtain theindefinite integral as:

    1

    r2

    a

    2(R2 + a2)

    (R2 a2)3 da =1

    r2a2

    (R2 a2)2 + L. (A-2)

    Applying the upper and lower limits of a, we complete theproof.

    APPENDIX BPROOF OF THEOREM 2

    First we prove that given n, there exists a pure stationarystrategy m which is the best response of the MBS against n.Since the action set of the MBS is fixed, at each state s, givenstrategy n(s) of the CEQ, the MBS just needs to choose amixed stationary strategy m(s) such that its average payoffis maximized. At state s, the average utility function of theMBS is

    E[U1] =ps0P

    sj=0

    (ps0g0 0I(ps0, j))2m(s, ps0)n(s, j),

    (B-1)where ps0 and j {0, ..., s} are the transmit power of theMBS and the number of energy packets distributed at the CEQat state s, respectively. I(ps0, j) is the average interferencefrom other SBSs to the MBS if the CEQ distributes j energypackets and the MBS transmits with power ps0. We have

    I(ps0, j) =Mi=1

    pig0,i+N,where (p1, p2, ..., pM ) is the solution

    of Remark 2, with Q and p0 replaced by j and ps0,respectively. Since

    ps0Pm(s, p

    s0) = 1 and ms is a non-

    negative vector, we have

    E[U1] maxps0P

    sj=0

    (ps0g0 0I(ps0, j))2n(s, j) . (B-2)

    Since the set P is fixed and finite, there always exists atleast one value of ps0 that achieves the maximum for the right

    hand side. That means when the game in state s, the MBScan choose this power level with probability of 1. However,obtaining a closed-form ps0 is difficult because first we needto find (p1, ..., pM ) in closed-form by solving (10).

    Nevertheless, if the average channel gains from each SBSto the macrocell user (say g0,SBS ) are same, we can obtain

    ps0 in closed form by defining I(ps0, j) =

    Mi=1

    pig0,SBS+N0 =

    g0,SBSMi=1 pi +N0 =

    KT jg0,SBS +N0. The final equality

    is from (2). Replacing this result back into (B-2), we have

    E[U1] maxps0P

    sj=0

    (ps0g0 0(K

    Tjg0,SBS +N0))

    2n(s, j).

    (B-3)

    The right hand side of this inequality is a strictly concavefunction (downward parabola) with respect to ps0. Note thatsj=0 n(s, j) = 1. The parabola will achieve the maximum

    value at its vertex given by

    ps0 =0sj=0

    (K

    T g0,SBSj +N0)n(s, j)

    g0. (B-4)

    If ps0 is not available in P , since the right hand side of theinequality above is a parabola w.r.t. ps0, the best response p

    s0

    to n(s) is the one nearest to the vertex ps0 .On the other hand, given strategy m of the MBS, the

    problem of finding the best response strategy n for the CEQ issimplified into a simple MDP in (11). Then, there always existsa pure stationary strategy n [17, Chapter 2]. This completesthe proof.

    APPENDIX CPROOF OF LEMMA 2

    Using from the stochastic differential equation in (18) attime t, dE(t) = p(t, E(t))dt + dWt, we get the integralform as follows:

    E(t+ t) E(t) = t+tt

    p(v,E(v))dv +

    t+tt

    dWv

    = p(t, E(t))t + (Wt+t Wt),(C-1)

    where t (t, t + t). We obtain the second equality usingthe mean value theorem for integrals: If G(x) is a continuousfunction and f(x) is integrable function that does not changesign on the interval [a, b], then there exists x [a, b] such that baG(t)f(t)dt = G(x)

    baf(t)dt. Since equation (C-1) is true

    for all SBSs, taking expectation of this equality above for allSBSs (or all possible values of E), we have

    E[E(t+t)]E[E(t)] = E[p(t, E(t))]t+E[Wt+tWt]

    0

    Em(t+ t, E)dE

    0

    Em(t, E)dE = tE[p(t, E(t))],

    (C-2)

    where m(t, E) is the distribution of E in the system at timeinstant t. Using the fact that W is a Wiener process, Wt+t

  • Wt follows a normal distribution with mean zero, we haveE[Wt+t Wt] = 0.

    By dividing both sides by t and letting t to be very small(or t dt), we have t t andd0

    Em(t, E)dE

    dt=

    0

    p(t, E)m(t, E)dE

    = p(t).(C-3)

    Using m(t, R) = m(t, E), dE = eRdR, and changing thevariable E to R we complete the proof.

    REFERENCES[1] K. T. Tran, H. Tabassum, and E. Hossain, A stochastic power control

    game for two-tier cellular networks with energy harvesting small cells,Proc. IEEE Globecom14, Austin, TX, USA, 8-12 December, 2014.

    [2] M. Deruyck, D. D. Vulter, W. Joseph, and L. Martens, Modellingthe power consumption in femtocell networks, IEEE WCNC 2012Workshop on Future Green Communications.

    [3] B. Gurakan, O. Ozel, J. Yang, and S. Ukulus, Energy cooperation inenergy harvesting communication, IEEE Transactions on Communica-tions, vol. 61, no. 12, Dec. 2013, pp. 48844898.

    [4] Z. Ding, S. Perlaza, I. Esnaola, and H. Poor, Power allocation strategiesin energy harvesting wireless cooperative networks, IEEE Transactionson Wireless Communications, Jan. 2014, pp. 846860.

    [5] N. Michelusi, K. Stamatiou, and M. Zorzi, Transmission policies forenergy harvesting sensors with time-correlated energy supply, IEEETransactions on Communications, vol. 61, no. 7, Jul. 2013, pp. 29883001.

    [6] H. Dhillon, Y. Li, P. Nuggehalli, Z. Pi, and J. Andrews, Funda-mentals of heterogeneous cellular networks with energy harvesting,www.arxiv.org/pdf/1307.1524, Jan. 2014.

    [7] D. Gunduz, K. Stamatiou, M. Zorzi, Designing intelligent energyharvesting communication systems, IEEE Communications Magazine,vol. 52, no. 1, Jan. 2014, pp. 210216

    [8] P. Semasinghe and E. Hossain, Downlink power control in self-organizing dense small cells underlaying macrocells: A mean fieldgame, IEEE Transactions on Mobile Computing, to appear.

    [9] K. Huang and V. K. Lau, Enabling wireless power transfer in cellularnetworks: Architecture, modeling and deployment, IEEE Transactionson Wireless Communications, vol. 13, no. 2, Feb. 2014, pp. 902912.

    [10] H. Tabassum, E. Hossain, A. Ogundipe, and D.I. Kim, Wireless-PoweredCellular Networks: Key Challenges and Solution Techniques, IEEECommunications Magazine, to appear.

    [11] C-RAN the road towards green RAN, White Paper, China MobileResearch Institure, October 2011.

    [12] P. Semasinghe, E. Hossain, and K. Zhu, An evolutionary game fordistributed resource allocation in self-organizing small cells, IEEETransactions on Mobile Computing, DOI 10.1109/TMC.2014.2318700.

    [13] M. Bowling and M. Veloso, An analysis of stochastic game theory formultiagent reinforcement learning, Technical Report, Carnegie MellonUniversity.

    [14] J. Hu and M. P. Wellman, Nash Q-learning for general-sum stochasticgames, Journal of Machine Learning Research, vol. 4, Nov. 2003, pp.10391069.

    [15] J. A. Filar, Quadratic programming and the single-controller stochasticgames, Journal of Mathematical Analysis and Applications, vol. 113,no. 1, Jan. 1986, pp. 136147.

    [16] V. Chandrasekhar and J. Andrews, Power control in two-tier femtocellnetworks, IEEE Transactions on Wireless Communications, vol. 8, no.8, Aug. 2009, pp. 43164328.

    [17] J. Filar and K. Vrieze, Competitive Markov Decision Process. Springer1997.

    [18] O. Gueant, J. M. Lasry, and P. L. Lions, Mean field games and appli-cations, Paris-Princeton Lectures on Mathematical Finance, Springer,2011, pp. 205266.

    [19] A. Y. Al-Zahrani, R. Yu, and M. Huang, A joint cross-layer and co-layer interference management scheme in hyper-dense heterogeneousnetworks using mean-field game theory, IEEE Transaction of VehicularTechnology, to appear, Oct. 2014.

    [20] H. Tembine, R. Tempone, and P. Vilanova, Mean field games forcognitive radio networks, in Proc. of American Control Conference,June 2012.

    [21] MFG Labs,A mean field game approach to oil production,http://mfglabs.com/wp-content/uploads/2012/12/cfe.pdf accessed in 14-Nov-2014.

    [22] S. Guruacharya, D. Niyato, D. I. Kim, and E. Hossain, Hierarchicalcompetition for downlink power allocation in OFDMA femtocell net-works, IEEE Transactions on Wireless Communications, vol. 12, no. 4,Apr. 2013, pp. 15431553.

    [23] E. Altman et al., Constrained stochastic games in wireless networks,IEEE Globecom General Symposium, Washington D.C., 2007.

    [24] E. Altman et al., Dynamic discrete power control in cellular networks,IEEE Transactions on Automatic Control, vol. 54, no. 10, Oct. 2009, pp.23282340.

    [25] M. Huang, Member, P. E. Caines, and R. P. Malham, Uplink poweradjustment in wireless communication systems: A stochastic controlanalysis, IEEE Transactions on Automatic Control, vol. 49, no. 10,Oct. 2004, pp. 16931708.

    [26] T. Tao, Mean field games, http://terrytao.wordpress.com/2010/01/07/mean-field-equations/ accessed in 14-Nov-2014.

    [27] D. Bauso, H. Tembine, and T. Basar Robust mean field games withapplication to production of an exhaustible resource, Proc. of the 7thIFAC Symposium on Robust Control Design, Aalborg, 2012.

    [28] J. Li and Yi-Tung Chen, Computational Partial Differential EquationsUsing MATLAB, Chapman and Hall/CRC, 2008.

    [29] Y. Achdou, I. C. Dolcetta, Mean field games: Numerical methodsSIAM Journal on Numerical Analysis, vo. 48, no. 3, 2010, pp. 11361162.

    [30] W. A. Strauss, Partial differential equations : an introduction, 2ndedition, Wiley, 2007.