[ieee 2009 2nd ifip wireless days (wd) - paris, france (2009.12.15-2009.12.17)] 2009 2nd ifip...



Dynamic Bargaining Solutions for Opportunistic Spectrum Access

Hamidou Tembine
LIA/CERI, University of Avignon, France.

Abstract: This paper studies dynamic bargaining solutions for opportunistic spectrum access in cognitive radio networks. We compare the bargaining solutions with the global optimum and the non-cooperative solution in a strategic setting. We examine the cost of bargaining and the benefit of bargaining in the stochastic bargaining opportunistic access game, in which each user has its own state whose transitions are described as Markov decision processes with local resource states. The states and actions of the users that sense the same channel determine the instantaneous payoffs. They also determine the transition probabilities to the next states. We characterize the dynamic bargaining outcomes in the short term and in the long term.

I. INTRODUCTION

Dynamic opportunistic spectrum access has become a promising approach to fully utilize scarce spectrum resources. In a dynamically changing spectrum environment, it is very important to consider the statistics and experiences of different users' spectrum access so as to achieve more efficient spectrum allocation. Cognitive radio offers a way of solving spectrum underutilization problems using opportunistic schemes. It does so by sensing the radio environment. Emerging cognitive radio networks assume that certain portions of the spectrum will be opened up for secondary users (followers, unlicensed users, or users licensed by the primary licensee), which can autonomously and opportunistically share the spectrum once primary users (leaders) are not active. Heterogeneous wireless users with different constraints, states, information sets, utility functions, delay tolerances, traffic characteristics, interference avoidance, knowledge and ability to adapt will need to coexist in the same band. Current solutions do not provide resource-state-dependent or coordination-based mechanisms for resource management in cognitive radio networks. Thus, to enable the proliferation of new applications such as voice over wireless IP and streaming multimedia over cognitive radio networks, wireless solutions for dynamic spectrum access will need to consider the system evolution as well as the heterogeneity of wireless users [8], [7]. Moreover, followers will need to possess adaptive and learning abilities to be able to strategically influence and adapt to the dynamic spectrum division. Using their knowledge about the past, their information set, and the state of the resource (quality-of-service degree, availability, etc.), followers can compete based on their quality-of-service requirements as well as optimally adapt their cross-layer transmission strategies to the environment dynamics and time-varying gathered resources in three phases: listening-sensing, reception of messages, and decision making based on the current information set. The first two phases consist of sensing, which identifies those subbands of the radio spectrum that are left unoccupied (spectrum holes) by the leaders (e.g. legally licensed users), and of providing the means for making those subbands available to unserviced followers by sending messages. Such dynamic and competitive solutions for spectrum access and protocol design lead to more efficient and fair wireless networks than current solutions, which require followers to blindly follow predetermined or non-adaptive protocol rules.

In this paper, we study an opportunistic spectrum access problem using dynamic bargaining game theory, which covers time-dependent bargaining problems, long-term negotiations, and stochastic bargaining games. In contrast to non-cooperative solutions (such as Stackelberg solutions, Nash equilibria, Wardrop equilibria, etc.), bargaining solutions are known to be Pareto optimal (there is no other feasible allocation that makes every user at least as well off and at least one user strictly better off). Strategic bargaining games deal with a situation of several decision makers (often called agents, users or players) where the objective of each player may be a function not only of its own preference and decision but also of the decisions of other players. Each player chooses its decision so as to optimize its own individual payoff. Most studies in the literature consider a static one-shot game framework to study dynamic spectrum access. The problem is that the static one-shot game formulation does not reflect the network evolution, and players do not have opportunities to readapt their strategies. Dynamic games are better suited to model the variability of the network (players, resources, environment, etc.). In this paper, we analyze dynamic Nash bargaining solutions for opportunistic spectrum access. Dynamic bargaining games make it possible to model sequential decision making by different players in a long-term interaction. They also make it possible to model situations in which the parameters defining the games vary in time and players can adapt their strategies according to the evolution of the environment. At any given time, each player takes a decision (also called an action) according to some strategy or policy which may depend on its past experiences, states and observations. A strategy of a player is a collection of history-dependent maps that specify at each time the choice (which can be probabilistic) of that player. The vector of actions chosen by players at a given time (also called the action profile) may determine not only the payoff for each player at that time; it can also determine the transition to the next state. Each player is interested in optimizing some function of all the rewards and costs at different time instants, subject to a set of (possibly coupled) constraints, by using bargaining schemes.

978-1-4244-5661-1/09/$26.00 ©2009 IEEE

A. Contribution of this paper

We model dynamic opportunistic spectrum access as a stochastic bargaining game. Our model includes the non-saturated case, which is represented by a specific state of the individual chain, together with the availability and the channel characteristics of each player and the state of the local resources. We examine stochastic bargaining games with (i) local resource states, (ii) finitely many individual states for each player, which represent its buffer size and its channel characteristics, (iii) expected energy constraints with different discount factors, and (iv) a finite individual signal space (messages). We show the existence of equilibria and bargaining solutions in such stochastic games. We apply them to dynamic opportunistic spectrum access (DOSA) in cognitive radio networks.

B. Related work

In recent years, there has been an extensive literature on game theory for networks. Most of the studies focus on static one-shot games. To capture the network evolution [10], [9], [7], and to allow adaptive or learning-based strategies, dynamic game models are more appropriate. An atomic unconstrained stochastic game formulation with individual states has been proposed by Haykin (2005) [4] to analyze dynamic spectrum sharing in cognitive radio networks, in which the state of the resource and messages are not taken into consideration. Atomic stochastic games with individual states have been studied in [1], in which the state of the resource is not considered (there is a single resource state) and there are no signals or messages. The authors in [1] assume that the individual state processes are independent, and establish the existence of equilibria in stationary strategies for the time-average (Cesaro-limit) payoff under a strong Slater condition and imperfect observation of the individual states. The authors in [3] consider a dynamic random access game with a finite number of opportunities for transmission and with energy constraints. They show that finding Nash-Pareto policies of the dynamic random access game is equivalent to partitioning the set of time slot opportunities with constraints into a set of mobile terminals. In the case of non-integer energy constraints, they prove that Time Division Multiplexing (TDM) policies can be suboptimal and that the dynamic random access game has several strong equilibria (resilient to any coalition of any size), which are computed explicitly. In this case of finite-horizon energy-constrained dynamic random access, it is known that the strong price of anarchy, which measures the gap between the payoff under strong equilibria and the social optimum, is one.

Flesch, Schoenmakers, and Vrieze (2007, 2008) have studied in [5] a class of unconstrained atomic stochastic games with time-average payoff and independent states called product games (every player j can play on the j-th coordinate of the product game without interference from the other players). The authors establish the existence of equilibria in time-dependent strategies (in both the periodic and the aperiodic case). In addition, for the special case of two-player zero-sum games of this type, they show that both players have stationary optimal strategies. In particular, the existence of the value in two-player zero-sum stochastic games where each player controls its own transition probabilities is shown. Note that in our model, the players do not necessarily control the state of the resource, i.e. the state processes are not independent, and the actions available to each player are (resource, individual)-state-dependent and correlated by signals. Our model includes, in some sense, discounted stochastic games with product state space, stochastic games with a single controller, and repeated games with private signals. In order to further improve the performance of (Nash) equilibria, it is natural to extend the framework of non-cooperative games to cooperative games. In this paper, we take into consideration the concept of the Nash bargaining solution in dynamic games, because it provides a fair operating point for decentralized communication systems. The Nash bargaining solution is a standard tool in cooperative game theory and is widely applied in network resource allocation. The Nash bargaining solution concept generalizes proportional fairness and increases the efficiency of the system.

C. Structure

The remainder of the paper is organized as follows. In the next section we formulate the opportunistic spectrum access problem. In Section III we present the static bargaining formulation and compare it with different solution concepts. We then focus on stochastic bargaining games with individual and local resource states. Section V concludes the paper.

II. THE SETTING

Consider a spectrum consisting of m wireless channels (called local resources). These channels are licensed to primary users, which have access whenever they need it. There is a secondary network consisting of n secondary users (followers), which can opportunistically share the local resources when primary users are not using them. When a local resource is available for secondary users, the secondary users that sense the same resource negotiate among themselves for access to the resource (simultaneous transmissions in the same time slot lead to a collision). Each user has an energy consumption cost. We describe the problem as a dynamic bargaining game in a cognitive radio network with n followers that compete dynamically for the available resources. The competition for the dynamic resource with several states is assisted by a virtual coordinator that plays the role of a virtual arbitrator (the central spectrum moderator can play this role, similar to that in existing wireless LAN standards such as the 802.11e 2007 Hybrid Coordination Function, HCF). The role of the virtual arbitrator is to allocate resources and send signals (messages) to the followers based on the current states of the resource, the past actions and a utility maximization rule. Here the role of the virtual arbitrator is played by the receiver. In order to capture the network evolution, we allow the virtual arbitrator to repeatedly send signals and the available spectrum opportunities based on the leaders' behaviors. The followers can opportunistically utilize the network resources that are vacated by the leaders. Given the signals (messages) and the state of the resource, each follower is allowed to strategically adapt its strategy based on information about the available spectrum opportunities, its source and channel characteristics, and the impact of the other followers' actions. In addition, each follower has a set of long-term constraints on its energy consumption.

Let $S = \{0, 1, 2, \ldots, |S|\}$ be the set of states of the resource. The state "0" means that the state of the resource is "bad" or that the resource is completely unavailable for followers (it is used by leaders or it is off). The other states mean that the resource is available for followers with some quality "s". The higher "s" is, the better the quality. Each follower has its buffer and channel characteristics modeled as a controlled stochastic process. The state of follower j is then of the form $x^j = (x^j_1, x^j_2) \in X^j$, where $x^j_1$ encapsulates the current buffer state and $x^j_2$ the state of its channel. Let $p_0 = 0 < p_1 < p_2 < \ldots < p_l$ be the power levels and denote $P := \{p_0, p_1, \ldots, p_l\}$. If a follower j has no packet to send at a given slot or its channel is bad, then its state corresponds to 0 (the analysis also covers non-saturated transmission scenarios):

$$A^j(s, (x^j_1, 0)) = \{p_0\}.$$

When the resource state is 0 there is no action or, equivalently, the action set is reduced to $A^j(0, x^j) = \{p_0\}$. The other action sets are given by

$$s \neq 0, \quad A^j(s, x^j) = \{p_0, p_1, \ldots, p_l\}.$$

The set of messages is

$$M^j = \{\text{"good"}, \text{"medium"}, \text{"bad"}, \ldots\}.$$

We define the instantaneous reward $r(s, m, x, a)$ as the probability of a successful transmission minus the cost of energy consumption. We consider that a transmission is successful for follower j if the state of the resource is $s \neq 0$, the second component of its individual state is non-zero, and its transmit power $a^j$ is strictly greater than the powers used by the other followers that sense the same channel as j. The cost of energy consumption of follower j is a non-decreasing function $c^j : P \to \mathbb{R}_+$. The instantaneous payoff is then

$$u^j(s, m, x, a) = -c^j(a^j) + \begin{cases} 1 & \text{if } s \neq 0,\ x^j \neq 0,\ a^j > \max_{i \neq j} a^i, \\ 0 & \text{otherwise.} \end{cases}$$

If $1 \le c^j(a^j)$ for some $a^j \neq p_0$, then it is better for follower j to stay quiet than to transmit (the action $a^j$ is weakly dominated by $p_0$). We therefore consider the case where $1 > \max_j \max_{a^j} c^j(a^j)$.
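The payoff rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the individual state $x^j$ is encoded as a single integer that is 0 when the follower has no packet or a bad channel, `linear_cost` is a hypothetical cost function, and a follower without any rival is treated as facing the power level $p_0 = 0$.

```python
from typing import Callable, Sequence


def instantaneous_payoff(
    j: int,
    s: int,
    x: Sequence[int],
    a: Sequence[float],
    cost: Callable[[float], float],
) -> float:
    """u^j = -c^j(a^j) + 1{s != 0, x^j != 0, a^j > max_{i != j} a^i}."""
    others = [a[i] for i in range(len(a)) if i != j]
    # Assumption: with no competitor, the strongest rival power is p_0 = 0.
    strongest_other = max(others, default=0.0)
    success = s != 0 and x[j] != 0 and a[j] > strongest_other
    return (1.0 if success else 0.0) - cost(a[j])


def linear_cost(p: float) -> float:
    """Hypothetical non-decreasing energy cost, chosen so that c(p) < 1."""
    return 0.1 * p


# Follower 0 uses the strictly highest power and succeeds; follower 1 only pays.
print(instantaneous_payoff(0, 1, [1, 1], [2.0, 1.0], linear_cost))
print(instantaneous_payoff(1, 1, [1, 1], [2.0, 1.0], linear_cost))
```

Equal powers count as a collision in this sketch, because the success condition requires a strictly greater power.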

III. STATIC NASH BARGAINING SOLUTIONS

Consider n secondary users. An action of secondary user j is a pair $(r^j, x^j)$. User j senses a certain channel $r^j$ among the m channels. If the channel r is idle, all the secondary users which sense the same channel $r^j = r$ can bargain for the access. User j can use a mixed strategy $x^j$ over the power levels. Its payoff $u^j$ is then its probability of success minus the cost $\sum_k c^j(a_k)\, x^j(a_k)$, plus a fixed term R. If k users sense the same channel r, the asymmetric Nash bargaining problem reads

$$\max_{(a^1, \ldots, a^k) \in P^k} \ \prod_{j=1,\ldots,k} \left( g^j(u^j(a)) - g^j_* \right)^{\theta_j},$$

where g is an injective function. The coefficients $\theta_j$ are called "bargaining powers". We normalize the $\{\theta_j\}_j$ such that

$$\theta_j \ge 0, \quad \sum_j \theta_j = 1.$$

The payoffs are

$$u^j(a^j, x^{-j}) = R + \sum_{a^{-j} \in P^{k-1}} \mathbb{1}_{\{a^j > \max_{l \neq j} a^l\}} \prod_{l \neq j} x^l(a^l) - c^j(a^j),$$

$$u^j(x^j, x^{-j}) = \sum_{a^j \in P} u^j(a^j, x^{-j})\, x^j(a^j).$$

We define different efficiency metrics for bargaining solutions:

• Benefit of Bargaining (BoB) measures the gap between bargaining solutions and non-cooperative solutions.
• Efficiency of Bargaining measures the ratio between the bargaining solution payoff and the global optimum (optimum social welfare).
• Cost of Bargaining (CoB) measures the cost of communications, i.e. the cost of the messages exchanged to obtain a bargaining outcome.

A. Example: two strategies

Consider two users. Without pricing and without a fixed payoff R, the Nash bargaining problem reads

$$\max_{0 \le x, y \le 1} \ [x(1-y)]^{\theta}\, [y(1-x)]^{1-\theta}.$$

The set of outcomes is given by

$$\{(x(1-y),\ y(1-x)) \mid x \in [0, 1],\ y \in [0, 1]\}.$$

The main difficulty in solving the bargaining problem is the non-convexity of this set. To see the non-convexity, consider the cases: (i) user 1 transmits and user 2 stays quiet, (ii) user 2 transmits and user 1 stays quiet. In the first configuration the outcome is (1, 0) and in the second configuration the outcome is (0, 1). The question is then whether there is any strategy profile $(x, y) \in [0, 1]^2$ that reaches the midpoint of these two outcomes, i.e. such that $x(1-y) = \frac{1}{2}$ and $y(1-x) = \frac{1}{2}$. The answer is negative: if the two equalities are satisfied, then by taking the difference one gets $x = y$ and

$$x(1-y) = y(1-x) = x(1-x) \le \frac{1}{4} < \frac{1}{2},$$

a contradiction. The solution of our Nash bargaining problem is given by $(\theta, 1-\theta)$, $(1-\theta, \theta)$. The probability of success at the Nash bargaining solution is $u^1_* = \theta^2$ for user 1 and $u^2_* = (1-\theta)^2$ for user 2. Notice that $u^1_*$ goes to 1 if the bargaining power $\theta$ of user 1 goes to 1, $u^1_* \to 0$ if $\theta \to 0$, and $u^1_* \to \frac{1}{4}$ in the case of equal bargaining power between the two users. Notice that the total sum at the bargaining solution is $\theta^2 + (1-\theta)^2$, which goes to the global optimum 1 when $\theta$ goes to 1 or 0. This says that, in these limiting cases, the efficiency of bargaining is one if there is no cost of communications, i.e. no cost for the messages exchanged to obtain a bargaining outcome.
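As a numerical sanity check of the two-user example, the following Python sketch maximizes the Nash product by brute force on a grid and also evaluates the best outcome in which both users get the same success probability. The grid resolution, the value θ = 0.3 and all function names are illustrative assumptions, not part of the original analysis.

```python
def nash_product(x: float, y: float, theta: float) -> float:
    """[x(1-y)]^theta * [y(1-x)]^(1-theta) for transmit probabilities x, y."""
    return (x * (1 - y)) ** theta * (y * (1 - x)) ** (1 - theta)


def grid(steps: int = 100) -> list:
    """Evenly spaced points 0, 1/steps, ..., 1."""
    return [i / steps for i in range(steps + 1)]


def grid_bargaining_solution(theta: float, steps: int = 100) -> tuple:
    """Grid point (x, y) that maximizes the Nash product."""
    pts = grid(steps)
    return max(
        ((x, y) for x in pts for y in pts),
        key=lambda xy: nash_product(xy[0], xy[1], theta),
    )


def best_guaranteed_to_both(steps: int = 100) -> float:
    """Largest value of min(u1, u2) over the grid of strategy profiles."""
    pts = grid(steps)
    return max(min(x * (1 - y), y * (1 - x)) for x in pts for y in pts)


x_star, y_star = grid_bargaining_solution(theta=0.3)
print(x_star, y_star)                                # grid maximizer of the Nash product
print(x_star * (1 - y_star), y_star * (1 - x_star))  # success probabilities of users 1 and 2
print(best_guaranteed_to_both())                     # stays below the midpoint value 1/2
```

For θ = 0.3 the grid maximizer coincides with (θ, 1 − θ), and no profile gives both users more than 1/4, which is the non-convexity argument in numerical form.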


Fig. 1. Total payoff at the Nash bargaining solution vs. bargaining power θ (x-axis: bargaining power of player one; y-axis: total Nash bargaining payoff). The minimum payoff is at θ = 1/2.

Since the worst equilibrium payoff is zero (collision case), the BoB is then $\mathrm{BoB} = \theta^2 + (1-\theta)^2 - 0$ and the inefficiency of bargaining is $1 - (\theta^2 + (1-\theta)^2)$. Thus, if the cost of bargaining (CoB) is less than $\theta^2 + (1-\theta)^2$, one can significantly improve the probability of success by adopting bargaining solutions. The worst case of the bargaining solution (see Fig. 1) is a total payoff of 0.5. Moreover, Nash bargaining is known to be a generalization of proportional fairness.
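The metrics of the two-user example are simple enough to tabulate. The Python sketch below evaluates the total payoff, the BoB and the inefficiency over a grid of bargaining powers and locates the worst case; the function names and the grid are illustrative assumptions.

```python
def total_payoff(theta: float) -> float:
    """Sum of success probabilities at the two-user bargaining solution."""
    return theta ** 2 + (1 - theta) ** 2


def benefit_of_bargaining(theta: float, worst_equilibrium_payoff: float = 0.0) -> float:
    """Gap between the bargaining total and the worst non-cooperative payoff."""
    return total_payoff(theta) - worst_equilibrium_payoff


def inefficiency(theta: float, global_optimum: float = 1.0) -> float:
    """Gap between the global optimum and the bargaining total."""
    return global_optimum - total_payoff(theta)


thetas = [i / 100 for i in range(101)]
worst_theta = min(thetas, key=total_payoff)
print(worst_theta, total_payoff(worst_theta))   # worst case of the bargaining solution
print(benefit_of_bargaining(0.8), inefficiency(0.8))
```

In this setting bargaining pays off whenever the communication cost stays below `benefit_of_bargaining(theta)`, which is never smaller than 0.5.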

With pricing, the strategic Nash bargaining problem (see Fig. 2) is given, for $R \ge c_1 = c(p_1)$, by

$$\max_{0 \le x, y \le 1} \ [R + x(1-y) - c_1 x]^{\theta}\, [R + y(1-x) - c_1 y]^{1-\theta}.$$

The solution is given by $(x^*, y^*)$ where $y^* = 1 - c_1 - x^*$ and $x^*$ is a solution of

$$\frac{R + (1 - c_1 - x)^2}{R + x^2} = \frac{(1-\theta)(1 - c_1 - x)}{\theta x}.$$

By expanding this equation and collecting terms, one gets a polynomial of degree three which has a unique real root and two complex roots. Note that if $\theta = \frac{1}{2}$, the solution is $\left(\frac{1-c_1}{2}, \frac{1-c_1}{2}\right)$.
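The condition on $x^*$ can be solved numerically once it is cross-multiplied into $\theta x (R + (1-c_1-x)^2) = (1-\theta)(1-c_1-x)(R + x^2)$. The Python sketch below finds a root in $(0, 1-c_1)$ by bisection, assuming $R > 0$ and $0 < \theta < 1$ so that the two sides change order between the endpoints. It only locates a root of the stated condition; it does not check second-order conditions or compare against boundary profiles. The function name and parameter values are illustrative.

```python
def pricing_stationary_point(theta: float, c1: float, R: float, iterations: int = 200) -> tuple:
    """Root of the cross-multiplied condition on x, returned with y = 1 - c1 - x."""
    a = 1.0 - c1

    def h(x: float) -> float:
        # h(x) = theta*x*(R + (a-x)^2) - (1-theta)*(a-x)*(R + x^2)
        return theta * x * (R + (a - x) ** 2) - (1 - theta) * (a - x) * (R + x ** 2)

    # h(0) = -(1-theta)*a*R <= 0 and h(a) = theta*a*R >= 0, so a root lies in [0, a].
    lo, hi = 0.0, a
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, a - x


# Equal bargaining powers with the parameter values used in Fig. 2.
print(pricing_stationary_point(theta=0.5, c1=0.2, R=0.2))
```

For θ = 1/2 the bisection recovers the symmetric point $\left(\frac{1-c_1}{2}, \frac{1-c_1}{2}\right)$ quoted in the text.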

Fig. 2. Contour: payoffs for $c_1 = 0.2$, $R = 0.2$ (axes: strategy of player one, strategy of player two, payoff).

B. Bargaining between k users

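More generally, with k secondary users, the Nash bargaining problem without pricing

$$\max_{(x^1, \ldots, x^k) \in \Delta(P)^k} \ \prod_{j=1}^{k} \left( x^j \prod_{l \neq j} (1 - x^l) \right)^{\theta_j}$$

can be rewritten as the decomposable maximization problem

$$\max_{(x^1, \ldots, x^k) \in \Delta(P)^k} \ \prod_{j=1}^{k} \left( (x^j)^{\theta_j} (1 - x^j)^{1 - \theta_j} \right).$$

Again the solution is given by $x^j = \theta_j$. For the non-normalized case with weights $w_j > 0$, $j = 1, \ldots, n$, the solution is

$$\left( \frac{w_1}{\sum_j w_j},\ \frac{w_2}{\sum_j w_j},\ \ldots,\ \frac{w_n}{\sum_j w_j} \right),$$

and the probability of success of user j is

$$\frac{w_j}{\sum_k w_k} \prod_{i \neq j} \left( 1 - \frac{w_i}{\sum_k w_k} \right) = \theta_j \prod_{i \neq j} (1 - \theta_i).$$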
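The closed form for k users is easy to evaluate directly. The following Python sketch normalizes arbitrary positive weights into bargaining powers and computes each user's probability of success $\theta_j \prod_{i \neq j}(1 - \theta_i)$; the function names and the example weights are illustrative.

```python
from typing import List, Sequence


def bargaining_powers(weights: Sequence[float]) -> List[float]:
    """Normalize positive weights w_j into theta_j = w_j / sum_k w_k."""
    total = sum(weights)
    return [w / total for w in weights]


def success_probabilities(theta: Sequence[float]) -> List[float]:
    """theta_j * prod_{i != j} (1 - theta_i) for every user j."""
    probs = []
    for j, theta_j in enumerate(theta):
        others_silent = 1.0
        for i, theta_i in enumerate(theta):
            if i != j:
                others_silent *= 1.0 - theta_i
        probs.append(theta_j * others_silent)
    return probs


theta = bargaining_powers([1.0, 1.0, 2.0])
print(theta)
print(success_probabilities(theta))
```

With two users and θ = (0.3, 0.7) the same function returns (θ², (1 − θ)²) = (0.09, 0.49), matching the two-strategy example.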

Figure 3 represents the outcome for three users.

Fig. 3. Payoffs for $c_1 = 1/4$, $R = 1/4$.

C. Random number of secondary users

We examine the case of symmetric strategies. The expected payoff of a secondary user j with the power level $p_i$ when facing $k - 1$ other mobiles is given by

$$f^j_{k, p_i}(x) = u^j_k(x, \ldots, x, p_i, x, \ldots, x). \quad (1)$$

We can easily compute the expected payoff for k interacting users: $f^j_{k, p_0}(x) = 0$, and for $i \ge 1$,

$$f^j_{k, p_i}(x) = R - c^j(p_i) + (x_{p_0} + x_{p_1} + x_{p_2} + \ldots + x_{p_{i-1}})^{k-1}.$$

Denote by M the random variable representing the number of mobiles which sense the same channel as some anonymous mobile randomly selected in the secondary network, and by $G_M(s) = E_M(s^M)$ the generating function of the random variable M. The expected payoff of a secondary user using the power level $p_i$ can be expressed as

$$F^j_{p_i}(x) = R + G_M(x_{p_0} + x_{p_1} + x_{p_2} + \ldots + x_{p_{i-1}}) - c^j(p_i)$$

for all $i > 0$, and $F^j_{p_0}(x) = 0$. In the non-cooperative setting, the interior Nash equilibrium which gives the worst equilibrium payoff is given by the following result:

Assume that $c^j(\cdot) \equiv c(\cdot)$. The spectrum access game with a random number of interacting users around each receiver has a unique strictly mixed Nash equilibrium given by

$$x_{p_0} = G_M^{-1}(c(p_1)),$$

for $1 \le j < l$,

$$x_{p_j} = G_M^{-1}(c(p_{j+1})) - G_M^{-1}(c(p_j)),$$

$$x_{p_l} = 1 - \sum_{j=1}^{l-1} x_{p_j} - x_{p_0} = 1 - G_M^{-1}(c(p_l)),$$

under the condition that $P(M = 0) < c(p_1)$. In addition, the payoff at this equilibrium is zero.
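The equilibrium formulas can be evaluated for any generating function by inverting $G_M$ numerically. The Python sketch below does this by bisection on [0, 1], where a probability generating function is non-decreasing. It assumes $R = 0$ (so that indifference means $G_M(\cdot) = c(p_i)$), increasing costs with $c(p_l) < 1$, and, purely as an example, a Poisson-distributed M; these choices and all names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def invert_pgf(G: Callable[[float], float], target: float, iterations: int = 200) -> float:
    """Solve G(s) = target on [0, 1] by bisection (G is non-decreasing there)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if G(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mixed_equilibrium(G: Callable[[float], float], costs: Sequence[float]) -> List[float]:
    """Return [x_{p_0}, ..., x_{p_l}] given costs = [c(p_1), ..., c(p_l)]."""
    thresholds = [invert_pgf(G, c) for c in costs]   # G^{-1}(c(p_1)), ..., G^{-1}(c(p_l))
    x = [thresholds[0]]                              # x_{p_0}
    x += [thresholds[j] - thresholds[j - 1] for j in range(1, len(thresholds))]
    x.append(1.0 - thresholds[-1])                   # x_{p_l}
    return x


def poisson_pgf(lam: float) -> Callable[[float], float]:
    """Generating function E[s^M] = exp(lam * (s - 1)) of a Poisson(lam) variable."""
    return lambda s: math.exp(lam * (s - 1.0))


G = poisson_pgf(2.0)                 # P(M = 0) = exp(-2) < c(p_1) = 0.3
x = mixed_equilibrium(G, [0.3, 0.6])
print(x, sum(x))
```

At the returned distribution every power level $p_i$ with $i \ge 1$ satisfies $G_M(x_{p_0} + \ldots + x_{p_{i-1}}) = c(p_i)$, so all actions yield the same (zero) payoff under the $R = 0$ assumption.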

D. Long-term Nash bargaining

1) Bargaining with the long-term payoff: We assume that each player is able to know the state of one of the channels by sensing it when its individual state is non-zero. Consider n secondary users that interact many times to opportunistically access the spectrum. Time is slotted and indexed by $\mathbb{N}$. At each time t, each user j bargains with the other active users by selecting a power level $a^j$ with some probability $x^j(a^j)$. If an agreement occurs, then user j receives $u^j(a^j, a^{-j})$. If a disagreement occurs, each player j receives 0 (the non-cooperative outcome). There is a discount factor $\delta_j$ between periods, and the game then moves to the next slot $t + 1$. The average payoff is

$$u^j_\delta(\sigma) = (1 - \delta_j)\, E_\sigma \sum_{t=1}^{\infty} \delta_j^{t-1}\, u^j(a^j_t, a^{-j}_t).$$

The users bargain for the expected discounted payoff. The dynamic Nash bargaining problem reads

$$(*) \quad \max_{\sigma:\ \forall j,\ u^j_\delta(\sigma) \ge u^j_{\delta,*}} \ \prod_j \left( u^j_\delta(\sigma) - u^j_{\delta,*} \right)^{\theta_j},$$

where $u^j_{\delta,*}$ is the worst discounted payoff at a non-cooperative solution.

2) Bargaining at each time slot: If the secondary users bargain at each time slot, the payoff

$$(1 - \delta_j) \sum_{t \ge t_0} \delta_j^{t - t_0}\, \theta_j \prod_{i \neq j} (1 - \theta_i) = \theta_j \prod_{i \neq j} (1 - \theta_i)$$

is guaranteed for user j in the long term. Note that the payoff obtained at a solution of (*) is greater than $\theta_j \prod_{i \neq j} (1 - \theta_i)$.
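The identity above is a geometric series: a constant stage payoff v, normalized by $(1 - \delta_j)$, sums back to v. The short Python sketch below checks this numerically on a truncated horizon; the horizon length, the discount factor and the stage payoff are illustrative values.

```python
from typing import Sequence


def normalized_discounted_payoff(stage_payoffs: Sequence[float], delta: float) -> float:
    """(1 - delta) * sum_t delta^t * u_t, with t counted from the first slot."""
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(stage_payoffs))


# Per-slot bargaining payoff theta_j * prod_{i != j} (1 - theta_i) for theta = (0.25, 0.25, 0.5), j = 1.
stage_value = 0.25 * (1 - 0.25) * (1 - 0.5)
long_run = normalized_discounted_payoff([stage_value] * 500, delta=0.9)
print(stage_value, long_run)
```

The truncated sum equals $v(1 - \delta^T)$, so the gap to the stage value vanishes geometrically in the horizon T.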

In the next section, we present a stochastic bargaining game formulation which includes the previous problems.

IV. STOCHASTIC BARGAINING GAMES WITH INDIVIDUAL AND LOCAL RESOURCE STATES

In this section we formulate the stochastic bargaining game for opportunistic spectrum access. The stochastic game is described by

$$\Gamma_n = \left( \mathcal{N}, S, X^1, M^1, \ldots, X^n, M^n, A^1(s, x^1), \ldots, A^n(s, x^n), (u^j)_{j \in \mathcal{N}} \right),$$

where

• There are n players. Denote by $\mathcal{N} = \{1, 2, \ldots, n\}$ the set of players,
• S is a finite set of states of the resource,
• For each player j,
  – Signals: there is a finite set $M^j$ of messages,
  – Individual states: $X^j$ is the set of states of player j,
  – Actions: $A^j(s, x^j)$ is the set of actions available to player j if its state is $x^j$ and the state of the resource is s,
  – Instant payoff: $u^j : SAM \to \mathbb{R}$, where

    $$SAM = \{(s, x^1, m^1, a^1, \ldots, x^n, m^n, a^n) \mid s \in S,\ x^j \in X^j,\ m^j \in M^j,\ a^j \in A^j(s, x^j)\},$$

    $$SM := \{(s, x^1, m^1, \ldots, x^n, m^n) \mid s \in S,\ x^j \in X^j,\ m^j \in M^j\}.$$

  – Discount factor: there is a discount factor $\delta_j \in (0, 1)$ for each player j.
• An initial state profile $\eta_{t_0} \in S \times \prod_j X^j$ is given.
• Transition between state profiles: at every stage t, the system moves to a new state according to a transition function

  $$q_t : \left( S \times \prod_j X^j \right) \times SAM^{t - t_0} \to \Delta(SM),$$

  where the joint strategy is chosen according to the bargaining outcome. That is, given the history, nature chooses the next states and the message profiles: from the current profile $(s, x^1, m^1, a^1, \ldots, x^n, m^n, a^n)$ to $(s', x'^1, m'^1, \ldots, x'^n, m'^n)$.

A. Description of the stochastic bargaining game

The stochastic bargaining game is played as follows. The system starts at the initial state $\eta_{t_0}$. At each time slot t, each player j knows its own state and chooses an action $a^j_t$. We denote $a_t = (a^1_t, \ldots, a^n_t)$. Nature chooses

$$(\eta_{t+1}, m_t) := (s_{t+1}, x^1_{t+1}, m^1_t, \ldots, x^n_{t+1}, m^n_t) \in SM$$

according to the transition $q(\cdot \mid h_t)$, where $h_t = (\eta_{t_0}, a_{t_0}, m_{t_0}, \eta_{t_0+1}, a_{t_0+1}, m_{t_0+1}, \ldots, \eta_{t-1}, a_{t-1}, m_{t-1}, \eta_t)$. Each player j knows its own state and its message $m^j_t$, but does not observe the current state or the actions of the other players. The message may contain some information about its payoff. The collection of information available to player j at the beginning of time slot t is the collection of messages received by j, $m^j_{t_0}, \ldots, m^j_{t-1}$, and the sequence of actions played by j, $a^j_{t_0}, \ldots, a^j_{t-1}$. We then construct the set of histories of player j as $\bigcup_{t \ge t_0} H^j_t$, where

$$H^j_t = (S \times X^j \times A^j \times M^j)^{t - t_0}.$$

This means that the private histories of the players differ. A behavior strategy $\sigma^j$ of player j is a map from $\bigcup_{t \ge t_0} H^j_t$ to the set of randomized actions $\Delta(A^j)$ on the current state. Denote by $\Sigma^j$ the set of behavioral strategies of player j. The collection of behavioral strategies $\sigma = (\sigma^1, \sigma^2, \ldots, \sigma^n)$ is called a strategy profile or multi-strategy. A pure strategy of player j is a map from the set of histories $\bigcup_{t \ge t_0} H^j_t$ to the set of actions of player j, $A^j$.


B. Transformation to resource-state dependent product game

One can map our stochastic bargaining game into a standard stochastic bargaining game with a resource-state-dependent product space:

• states: $\Phi = S \times X^1 \times M^1 \times \ldots \times X^n \times M^n$,
• actions of player j in state $\phi \in \Phi$: $A^j(\phi) := A^j(s, x^j)$,
• transition from $\phi$ to $\phi'$:

  $$q_{t, \phi a \phi'} = P(\phi_{t+1} = \phi' \mid \phi_t = \phi,\ a_t = a,\ m_t = m,\ h_{t-1}), \quad a \in \prod_j A^j(\phi).$$

Note that in this new formulation a player does not observe the current product state, since it is coupled with the states of the other players. This partial information structure of the stochastic bargaining game makes the analysis difficult.

C. Discounted payoffs:

An initial state profile $\eta_{t_0}$ and a behavioral strategy $\sigma$ induce a probability measure $P_{\eta_{t_0}, \sigma}$ on the set of infinite histories of the game, endowed with the product σ-algebra. We denote by $E_{\eta_{t_0}, \sigma}$ the expectation under $P_{\eta_{t_0}, \sigma}$. The discounted expected payoff of player j under the behavior strategy $\sigma = (\sigma_t)_t$ and the initial state $\eta_{t_0}$ at time slot $t_0$ is given by

$$u^{j,\delta}(\eta_{t_0}, \sigma) = (1 - \delta_j)\, E_{\eta_{t_0}, \sigma} \left[ \sum_{t \ge t_0} \delta_j^{t - t_0}\, u^j(s_t, x^1_t, m^1_t, a^1_t, \ldots, x^n_t, m^n_t, a^n_t) \right],$$

which can be rewritten as

$$(1 - \delta_j)\, u^j(\eta_{t_0}, a_{t_0}) + \delta_j \sum_{\eta' \in SM} q_{t_0}(\eta_{t_0}, a_{t_0}, \eta')\, U^{j,\delta}(\eta', \sigma \mid \eta_{t_0}, a_{t_0}),$$

where $U^{j,\delta}(\cdot \mid \cdot)$ is the continuation payoff. Using the previous sections, the following holds:

Assume that the chain is ergodic under any stationary strategy, and let $\pi$ be the stationary distribution of $q(\cdot \mid \cdot)$. Then user j can guarantee the payoff

$$\left( \theta_j \prod_{i \neq j} (1 - \theta_i) - c(p_1)\, \theta_j \right) (1 - \pi_{0,0})$$

in the dynamic bargaining game. Moreover, there exists a bargaining solution in stationary strategies. Finally, if each individual MDP is absorbing to state 0, and each resource MDP chain is irreducible under any stationary strategy, then the bargaining payoff of any user j is at least

$$\left( \theta_j \prod_{i \neq j} (1 - \theta_i) - c(p_1)\, \theta_j \right) \left( 1 - \delta^{\tau - t_0 + 1} \right),$$

where $\tau$ is the hitting time to state 0.
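The two lower bounds above share the same per-slot term and differ only in the factor that accounts for unavailability. The Python sketch below evaluates both for given inputs. It treats $\pi_{0,0}$, $\delta$ and $\tau$ as plain numbers supplied by the caller (for instance a realized hitting time), which is an assumption made for illustration; the function names and example values are hypothetical.

```python
from typing import Sequence


def stage_guarantee(theta: Sequence[float], j: int, cost_p1: float) -> float:
    """theta_j * prod_{i != j} (1 - theta_i) - c(p_1) * theta_j."""
    others_silent = 1.0
    for i, theta_i in enumerate(theta):
        if i != j:
            others_silent *= 1.0 - theta_i
    return theta[j] * others_silent - cost_p1 * theta[j]


def ergodic_guarantee(theta: Sequence[float], j: int, cost_p1: float, pi_00: float) -> float:
    """Ergodic case: stage guarantee scaled by (1 - pi_{0,0})."""
    return stage_guarantee(theta, j, cost_p1) * (1.0 - pi_00)


def absorbing_guarantee(
    theta: Sequence[float], j: int, cost_p1: float, delta: float, tau: int, t0: int = 0
) -> float:
    """Absorbing case: stage guarantee scaled by (1 - delta^(tau - t0 + 1))."""
    return stage_guarantee(theta, j, cost_p1) * (1.0 - delta ** (tau - t0 + 1))


print(ergodic_guarantee([0.5, 0.5], j=0, cost_p1=0.1, pi_00=0.2))
print(absorbing_guarantee([0.5, 0.5], j=0, cost_p1=0.1, delta=0.5, tau=1))
```

Both bounds shrink as the resource or the individual chain spends more time in state 0, either through a larger $\pi_{0,0}$ or through an earlier hitting time.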

V. CONCLUDING REMARKS

We have formulated dynamic bargaining solutions for opportunistic spectrum access and compared them with non-cooperative solutions and the global optimum. An interesting extension, which we leave for future work, is the mean field asymptotics [6] of stochastic bargaining games.

ACKNOWLEDGMENTS

The author would like to thank Piotr Wiecek for helpful discussions on "dynamic bargaining games".

REFERENCES

[1] E. Altman, K. Avrachenkov, N. Bonneau, M. Debbah, R. El-Azouzi and D. Sadoc Menasche, Constrained Cost-Coupled Stochastic Games with Independent State Processes, Operations Research Letters, vol. 36, pp. 160-164, 2008.

[2] F. Fu and M. van der Schaar, Stochastic game formulation for cognitive radio networks, in Proc. IEEE DySPAN 2008.

[3] E. Altman, T. Basar, I. Menache and H. Tembine, A Dynamic Random Access Game with Energy Constraints, in IEEE/ACM Proc. of 7th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WIOPT), South Korea, June 2009.

[4] S. Haykin, Cognitive Radio: Brain-Empowered Wireless Communications, IEEE JSAC, vol. 23, no. 2, pp. 201-220, Feb. 2005.

[5] J. Flesch, G. Schoenmakers and K. Vrieze, Stochastic Games on a Product State Space, 2008, METEOR 016, Maastricht Research School of Economics of Technology.

[6] H. Tembine, J. Y. Le Boudec, R. ElAzouzi and E. Altman, Mean field asymptotics of Markov Decision Evolutionary Games, in Proc. of IEEE Gamenets, May 2009.

[7] H. Tembine, Population games with networking applications, PhD dissertation, University of Avignon, September 2009.

[8] H. Tembine, Evolutionary Networking Games, book chapter in Game Theory for Wireless Communications and Networking, Auerbach Publications, Taylor and Francis Group, CRC Press, to appear, 2009.

[9] H. Tembine, E. Altman, R. ElAzouzi and Y. Hayel, Evolutionary Games in Wireless Networks, to appear in IEEE Trans. on Systems, Man and Cybernetics, Special Issue: Game Theory, 2009.

[10] Q. Zhu, H. Tembine and T. Basar, Network Security Configuration: A Nonzero-sum Stochastic Game Approach, submitted, 2009.