[acm press the thirteenth acm international symposium - hilton head, south carolina, usa...

10
Spatial Spectrum Access Game: Nash Equilibria and Distributed Learning Xu Chen Department of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong [email protected] Jianwei Huang Department of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong [email protected] ABSTRACT A key feature of wireless communications is the spatial reuse. However, the spatial aspect is not yet well understood for the purpose of designing efficient spectrum sharing mecha- nisms. In this paper, we propose a framework of spatial spec- trum access games on directed interference graphs, which can model quite general interference relationship with spa- tial reuse in wireless networks. We show that a pure strategy equilibrium exists for the two classes of games: (1) any spa- tial spectrum access games on directed acyclic graphs, and (2) any games satisfying the congestion property on directed trees and directed forests. Under mild technical conditions, the spatial spectrum access games with random backoff and Aloha channel contention mechanisms on undirected graphs also have a pure Nash equilibrium. We then propose a dis- tributed learning algorithm, which only utilizes users’ lo- cal observations to adaptively adjust the spectrum access strategies. We show that the distributed learning algorithm can converge to an approximate mixed-strategy Nash equi- librium for any spatial spectrum access games. Numerical results demonstrate that the distributed learning algorithm achieves up to 100% performance improvement over a ran- dom access algorithm. Categories and Subject Descriptors C.2.1 [Network Architecture and Design]: Wireless com- munication Keywords Cognitive Radio, Distributed Spectrum Sharing, Nash Equi- librium, Distributed Learning 1. INTRODUCTION Cognitive radio is envisioned as a promising technique to alleviate the problem of spectrum under-utilization [1]. It enables unlicensed wireless users (secondary users) to op- portunistically access the licensed channels owned by legacy Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MobiHoc’12, June 11–14, 2012, Hilton Head Island, SC, USA. Copyright 2012 ACM 978-1-4503-1281-3/12/06 ...$10.00. spectrum holders (primary users), and thus can significantly improve the spectrum efficiency [1]. A key challenge of the cognitive radio technology is how to resolve the resource competition by selfish secondary users in a decentralized fashion. If multiple secondary users transmit over the same channel simultaneously, severe interferences or collisions might occur and the data rates of all users may get reduced. Therefore, it is necessary to design efficient spectrum sharing mechanism for cognitive radio networks. The competitions among secondary users for common spec- trum have often been studied as a noncooperative game the- ory (e.g., [6, 9, 18, 19, 23]). Nie and Comniciu in [18] de- signed a self-enforcing distributed spectrum access mecha- nism based on potential games. Niyato and Hossain in [19] studied a price-based spectrum access mechanism for com- petitive secondary users. Chen and Huang in [6] investigated stable spectrum sharing mechanism design based on evolu- tionary game theory. F ´ l elegyh ´ czi et al. in [9] proposed a two-tier game framework for medium access control (MAC) mechanism design. When not knowing spectrum information such as chan- nel availability, secondary users need to learn the network environment and adapt the spectrum access decisions ac- cordingly. Han et al. in [11] used no-regret learning to solve this problem, assuming that the users’ channel selections are common information. When users’ channel selections are not observable, authors in [2, 13] designed multi-agent multi-armed bandit learning algorithms to minimize the ex- pected performance loss of distributed spectrum access. A common assumption of the above results is that sec- ondary users are close-by and interfere with each other when they transmit on the same channel simultaneously. How- ever, a unique feature of wireless communication is spatial reuse. If users who transmit simultaneously are located suf- ficiently far away, then simultaneous transmissions over the same channel may not cause any performance degradation to any user (see Figure 1 for an illustration). Such spatial ef- fect on spectrum sharing is less understood than many other aspects in existing literature [24]. Recently, Tekin et al. in [22] and Southwell et al. in [21] proposed a novel spatial congestion game framework to take spatial relationship into account. The key idea is to extend the classical congestion game upon an undirected graph, by assuming that the interferences among the players are sym- metric and a player’s throughput depends on the number of players in its neighborhood that choose the same resource. As illustrated in Figure 1, however, the interference relation- ship among the secondary users can be asymmetric due to 205

Upload: jianwei

Post on 12-Dec-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

Spatial Spectrum Access Game: Nash Equilibria andDistributed Learning

Xu ChenDepartment of Information EngineeringThe Chinese University of Hong Kong

Shatin, Hong [email protected]

Jianwei HuangDepartment of Information EngineeringThe Chinese University of Hong Kong

Shatin, Hong [email protected]

ABSTRACTA key feature of wireless communications is the spatial reuse.However, the spatial aspect is not yet well understood forthe purpose of designing efficient spectrum sharing mecha-nisms. In this paper, we propose a framework of spatial spec-trum access games on directed interference graphs, whichcan model quite general interference relationship with spa-tial reuse in wireless networks. We show that a pure strategyequilibrium exists for the two classes of games: (1) any spa-tial spectrum access games on directed acyclic graphs, and(2) any games satisfying the congestion property on directedtrees and directed forests. Under mild technical conditions,the spatial spectrum access games with random backoff andAloha channel contention mechanisms on undirected graphsalso have a pure Nash equilibrium. We then propose a dis-tributed learning algorithm, which only utilizes users’ lo-cal observations to adaptively adjust the spectrum accessstrategies. We show that the distributed learning algorithmcan converge to an approximate mixed-strategy Nash equi-librium for any spatial spectrum access games. Numericalresults demonstrate that the distributed learning algorithmachieves up to 100% performance improvement over a ran-dom access algorithm.

Categories and Subject DescriptorsC.2.1 [Network Architecture and Design]: Wireless com-munication

KeywordsCognitive Radio, Distributed Spectrum Sharing, Nash Equi-librium, Distributed Learning

1. INTRODUCTIONCognitive radio is envisioned as a promising technique to

alleviate the problem of spectrum under-utilization [1]. Itenables unlicensed wireless users (secondary users) to op-portunistically access the licensed channels owned by legacy

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.MobiHoc’12, June 11–14, 2012, Hilton Head Island, SC, USA.Copyright 2012 ACM 978-1-4503-1281-3/12/06 ...$10.00.

spectrum holders (primary users), and thus can significantlyimprove the spectrum efficiency [1].A key challenge of the cognitive radio technology is how to

resolve the resource competition by selfish secondary users ina decentralized fashion. If multiple secondary users transmitover the same channel simultaneously, severe interferencesor collisions might occur and the data rates of all users mayget reduced. Therefore, it is necessary to design efficientspectrum sharing mechanism for cognitive radio networks.The competitions among secondary users for common spec-

trum have often been studied as a noncooperative game the-ory (e.g., [6, 9, 18, 19, 23]). Nie and Comniciu in [18] de-signed a self-enforcing distributed spectrum access mecha-nism based on potential games. Niyato and Hossain in [19]studied a price-based spectrum access mechanism for com-petitive secondary users. Chen and Huang in [6] investigatedstable spectrum sharing mechanism design based on evolu-tionary game theory. Fl ↪elegyhlczi et al. in [9] proposed atwo-tier game framework for medium access control (MAC)mechanism design.When not knowing spectrum information such as chan-

nel availability, secondary users need to learn the networkenvironment and adapt the spectrum access decisions ac-cordingly. Han et al. in [11] used no-regret learning to solvethis problem, assuming that the users’ channel selectionsare common information. When users’ channel selectionsare not observable, authors in [2, 13] designed multi-agentmulti-armed bandit learning algorithms to minimize the ex-pected performance loss of distributed spectrum access.A common assumption of the above results is that sec-

ondary users are close-by and interfere with each other whenthey transmit on the same channel simultaneously. How-ever, a unique feature of wireless communication is spatialreuse. If users who transmit simultaneously are located suf-ficiently far away, then simultaneous transmissions over thesame channel may not cause any performance degradationto any user (see Figure 1 for an illustration). Such spatial ef-fect on spectrum sharing is less understood than many otheraspects in existing literature [24].Recently, Tekin et al. in [22] and Southwell et al. in [21]

proposed a novel spatial congestion game framework to takespatial relationship into account. The key idea is to extendthe classical congestion game upon an undirected graph, byassuming that the interferences among the players are sym-metric and a player’s throughput depends on the number ofplayers in its neighborhood that choose the same resource.As illustrated in Figure 1, however, the interference relation-ship among the secondary users can be asymmetric due to

205

Page 2: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

����������� �� �

�������������� �

�������������� ���

�������������� ���

�������������� ���

���

���

���

���

���

���

�����������������

�������

�������

�������

Figure 1: Illustration of distributed spectrum accesswith spatial reuse under the protocol interferencemodel. Each user n is represented by a transmit-ter Txn and receiver Rxn pair. Users 2 and 3 cannot generate interference to user 1, since user 1’sreceiver Rx1 is far from user 2 and 3’s transmitters.On the other hand, user 1 can generate interferenceto user 2, since user 2’s receiver Rx2 is within thetransmission range of user 1’s transmitter Tx1. Sim-ilarly, user 2 and user 3 can generate interferencesto each other.

the heterogeneous transmission powers and locations of theusers. We hence propose a more general framework of spatialspectrum access game on directed interference graphs, whichtake users’ heterogeneous resource competition capabilitiesand asymmetric interference relationship into account. Thecongestion game on directed graphs has also been studiedin [4], with the assumption that players have linear and ho-mogeneous payoff functions. The game model in this paperis more generic and allows linear/nonlinear player-specificpayoff functions. Moreover, we design a distributed algo-rithm for achieving the equilibria of the game. The mainresults and contributions of this paper are as follows:

• General game formulation: We formulate the distributedspectrum access problem as a spatial spectrum ac-cess game on directed interference graphs, with user-specific channel data rates and channel contention ca-pabilities.

• Existence of Nash equilibria: We show by counterex-amples that a general spatial spectrum access gamemay not have a pure Nash equilibrium. We then showthat a pure strategy equilibrium exists for the follow-ing two classes of games: (1) any spatial spectrumaccess games on directed acyclic graphs, and (2) anygames satisfying the congestion property on directedtrees and directed forests. We also show that undermild conditions the spatial spectrum access games withrandom backoff and Aloha channel contention mecha-nisms on undirected graphs are potential games andhave pure Nash equilibria.

• Distributed learning for achieving approximate Nashequilibrium: We propose a distributed learning algo-rithm that can converge to an approximate mixed Nashequilibrium for any spatial spectrum access games byutilizing users’ local observations only. Numerical re-

sults demonstrate that the distributed learning algo-rithm achieves up-to 100% performance improvementover the random access algorithm.

The rest of the paper is organized as follows. We introducethe system model and the spatial spectrum access game inSections 2 and 3, respectively. We investigate the existenceof Nash equilibria in Section 4. Then we present the dis-tributed learning algorithm in Section 5. We illustrate theperformance of the proposed algorithm through numericalresults in Section 6, and finally conclude in Section 7.

2. SYSTEM MODELWe consider a cognitive radio network with a set M =

{1, 2, ...,M} of independent and stochastically heterogeneousprimary channels. A set N = {1, 2, ..., N} of secondary userstry to access these channels distributively when the chan-nels are not occupied by primary (licensed) transmissions.Here we assume that each secondary user is a dedicatedtransmitter-receiver pair.To take users’ spatial relationship into account, we denote

dn = (dTxn , dRxn) as the location vector of secondary usern, where dTxn and dRxn denote the location of the trans-mitter and the receiver, respectively. Each secondary user nhas a transmission range δn. Then given the location vec-tors of all secondary users, we can obtain the interferencegraph G = {N , E} to describe the interference relationshipamong the users (see Figure 1 for an example). Here the ver-tex set N is the same as the secondary user set. The edge setis defined as E = {(i, j) : ||dTxi , dRxj || ≤ δi,∀i, j �= i ∈ N},where ||dTxi , dRxj || is the distance between the transmitterof user i and the receiver of user j. As illustrated in Figure1, an interference edge can be directed or undirected. If aninterference edge is directed from secondary user i to userj, then user j’s data transmission will be affected by useri’s transmission on the same channel, but user i will not beaffected by user j. If the interference edge is undirected1 be-tween user i and user j, then the two users can affect eachother. Note that a generic directed interference graph canconsist of a mixture of directed and undirected edges. Inthe sequel, we call an interference graph undirected, if andonly if all the edges of the graph are undirected. We alsodenote the set of users that can cause interference to user nas Nn = {i : (i, n) ∈ E , i ∈ N}.Based on the interference model above, we describe the

cognitive radio network with a slotted transmission structureas follows:

• Channel State: the channel state for a channel m dur-ing time slot t is

Sm(t) =

⎧⎪⎨⎪⎩0, if channel m is occupied

by primary transmissions,

1, if channel m is idle.

• Channel State Transition: for a channelm, the channelstate Sm(t) is a random variable with a probabilitydensity function as ψm. In the following, we denotethe channel idle probability θm as the mean of Sm(t),i.e., θm = Eψm [Sm(t)].

1Here the edge is actually bi-directed. We follow the con-ventions in [22] and [21] and ignore the directions on theedge.

206

Page 3: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

• User Specific Channel Throughput : for each secondaryuser n, its realized data rate bnm(t) on an idle channelm in each time slot evolves according to a randomprocess with a mean Bnm, due to users’ heterogeneoustransmission technologies and the local environmentaleffects such as fading.

• Time Slot Structure: each secondary user n executesthe following stages synchronously during each timeslot:

– Channel Sensing : sense one of the channels basedon the channel selection decision made at the endof previous time slot.

– Channel Contention: Let an be the channel se-lected by user n, and a = (a1, ..., aN ) be the chan-nel selection profile of all users. The probabilitythat user n can grab the chosen idle channel anduring a time slot is gn(N an

n (a)) ∈ (0, 1), whichdepends on the subset of user n’s interfering usersthat choose the same channelN an

n (a) � {i ∈ Nn :ai = an}. Here are two examples:1) Random backoff mechanism: the contentionstage of a time slot is divided into λmax mini-slots (see Figure 2). Each contending user n firstcounts down according to a randomly and uni-formly generated integer backoff time counter (num-ber of mini-slots) λn between 1 and λmax. Ifthere is no active transmissions till the count-down timer expires, the user monitors the channeland transmits RTS/CTS messages on that chan-nel. If multiple users choose the same backoffcounter, a collision will occur and no users cangrab the channel successfully. Once successfullygets the channel, the user starts to transmit itsdata packet. In this case, we have

gn(N ann (a)) = Pr{λn < min

i∈Nn:ai=an{λi}}

=

λmax∑λ=1

Pr{λn = λ}

× Pr{λn < mini∈Nn:ai=an

{λi}|λn = λ}

=

λmax∑λ=1

1

λmax

(λmax − λλmax

)Knan

(a)

, (1)

where Knan(a) = |N an

n (a)| =∑i∈NnI{ai=an} de-

notes the number of user n’s interfering users choos-ing the same channel as user n.2) Aloha mechanism: user n contends for an idlechannel with a probability pn ∈ (0, 1) in a timeslot. If multiple interfering users contend for thesame channel, a collision occurs and no user cangrab the channel for data transmission. In thiscase, we have

gn(N ann (a)) = pn

∏i∈Nan

n (a)

(1− pi) . (2)

Note that for the random backoff mechanism, thechannel grabbing probability gn(N an

n (a)) is userhomogeneous since it only depends on the numberof contending users Kn

an(a). For the Aloha mech-anism, the channel grabbing probability gn(N an

n (a))

�������������

��������� ������ �

������������ �

�������������� �

� ��� ����

Figure 2: Time slot structure with random backoffmechanism

is user heterogeneous since it depends on who (in-stead of how many users) contend the channel.

– Data Transmission: transmit data packets if theuser successfully grabs the channel.

– Channel Selection: choose a channel to accessduring next time slot according to the distributedlearning algorithm in Section 5.

Under a fixed channel selection profile a, the long-runexpected throughput of a secondary user n choosing channelan can be computed as

Un(a) = θanBnangn(N an

n (a)). (3)

Since our analysis is from the secondary users’ perspective,we will use the terms “secondary user” and “user” inter-changeably. Due to page limit, the detailed proofs are givenin our technical report [5].

3. SPATIAL SPECTRUM ACCESS GAMEWe now consider the problem that each user tries to max-

imize its own throughput by choosing a proper channel dis-tributively. Let a−n = {a1, ..., an−1, an+1, ..., aN} be thechannels chosen by all other users except user n. Givenother users’ channel selections a−n, the problem faced by auser n is

maxan∈M

U(an, a−n),∀n ∈ N . (4)

The distributed nature of the channel selection problem nat-urally leads to a formulation based on the game theory,such that users can self organize into a mutually accept-able channel selection (pure Nash equilibrium) a∗ =(a∗1, a

∗2, ..., a

∗N ) with

a∗n = arg maxan∈M

U(an, a∗−n), ∀n ∈ N . (5)

We thus formulate the distributed channel selection problemon an interference graph G as a spatial spectrum accessgame Γ = (N ,M, G, {Un}n∈N ), where N is the set of play-ers, M is the set of strategies, G describes the interferencerelationship among the players, and Un is the payoff functionof player n.It is known that not every finite strategic game possesses

a pure Nash equilibrium [17]. We then introduce a more gen-

eral concept of mixed Nash equilibrium. Let σn � (σn1 , ..., σnM )

denote the mixed strategy of user n, where 0 ≤ σnm ≤ 1 is theprobability of user n choosing channel m, and

∑Mm=1 σ

nm =

1. For simplicity, we use the same payoff notation Un(σ1, ...,σN )to denote the expected throughput of user n under the mixedstrategy profile (σ1, ...,σN ), and it can be computed as

Un(σ1, ...,σN ) =

M∑a1=1

σ1a1 ...

M∑aN=1

σNaNUn(a1, ..., aN ). (6)

Similarly to the pure Nash equilibrium, the mixed Nashequilibrium is defined as:

207

Page 4: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

Definition 1 (Mixed Nash Equilibrium [17]). Themixed strategy profile σ∗ = (σ∗1, ...,σ

∗N ) is a mixed Nash

equilibrium, if for every user n ∈ N , we have

Un(σ∗n,σ

∗−n) ≥ Un(σn,σ

∗−n), ∀σn �= σ∗n,

where σ∗−n denote the mixed strategy choices of all otherusers except user n.

Note that the pure Nash equilibrium is a special case ofthe mixed Nash equilibrium, wherein every user chooses asingle channel with probability one. One critical issue ingame theory is the existence of both mixed and pure Nashequilibria, which motivates the study in the following Sec-tion 4.

4. EXISTENCE OF NASH EQUILIBRIAIn this part, we study the existence of Nash equilibria

in a spatial spectrum access game. Since a spatial spectrumaccess game is a finite strategic game (i.e., with finite numberof players and finite number of channels), we know that italways admits a mixed Nash equilibrium according to [17].On the other hand, not every finite strategic game pos-

sesses a pure Nash equilibrium [17]. A pure Nash equi-librium is much preferable than a general mixed strategyNash equilibrium, as in a pure strategy equilibrium users canachieve mutually acceptable channel selections without ran-domly picking and switching channels all the time. This mo-tivates us to further investigate the existence of pure NashEquilibria of the spatial spectrum access games.

4.1 Existence of Pure Nash Equilibria on Di-rected Interference Graphs

We first study the existence of pure Nash Equilibria ondirected interference graphs.First of all, we can construct a game which does not have

a pure Nash equilibrium.

Theorem 1. There exists a spatial spectrum access gameon a directed interference graph not admitting any pure Nashequilibrium.

Figure 3 shows such an example. It is easy to verify thatfor all 8 possible channel selection profiles, there always ex-ists one user (out of these three users) having an incentiveto change its channel selection unilaterally to improve itsthroughput.We then focus on identifying the conditions under which

the game admits a pure Nash equilibrium. To proceed, wefirst introduce the following lemma.

Lemma 1. Consider any spatial spectrum access game ona given directed interference graph G that has a pure Nashequilibrium. Then we can construct a new spatial spectrumaccess game by adding a new player, who can not gener-ate interference to any player in the original game and mayreceive interference from one or multiple players in the orig-inal game. The new game also has a pure Nash equilibrium.

We know that any directed acyclic graph (i.e., a directedgraph contains no directed cycles) can be given a topologicalsort (i.e., an ordering of the nodes), such that if node i < jthen there are no edges directed from the node j to node iin the ordering [3]. From Lemma 1, we know that

Figure 3: An example of spatial spectrum accessgame without pure Nash equilibria. There are twochannels available and the throughput of a user nis given as Un(a) = p

∏i∈Nan

n (a)(1 − p). If all three

players (nodes) choose channel 1, then each playerhas the incentive of choosing channel 2 to improveits throughput assuming that the other two playersdo not change their channel choices. We can showthat such derivation will happen for all 8 possiblestrategy profiles a = (a1, a2, a3), where ai ∈ {1, 2} fori ∈ {1, 2, 3}.

Corollary 1. Any spatial spectrum access game on adirected acyclic graph has a pure Nash equilibrium.

To obtain more insightful results, we next impose the fol-lowing property on the spatial spectrum access games:

Definition 2 (Congestion Property). User n’s chan-nel grabbing probability gn (N an

n (a)) satisfies the congestion

property if for any N ann (a) ⊆ N an

n (a), we have

gn(N ann (a)) ≥ gn (N an

n (a)) . (7)

Furthermore, a spatial spectrum access game satisfies thecongestion property if (7) holds for all users n ∈ N .The congestion property means that the more contending

users exist, the less chance a user can grab the channel. Sucha property is natural for practical wireless systems such asthe random backoff and Aloha systems. We can show that

Lemma 2. Consider any spatial spectrum access game sat-isfying the congestion property on a given directed interfer-ence graph G that has a pure Nash equilibrium. Then wecan construct a new spatial spectrum access game by addinga new player, whose channel grabbing probability satisfies thecongestion property and who can have interference relation-ship with at most one player n ∈ N in the original game.The new game also has a pure Nash equilibrium.

Definition 3 (Directed Tree [3]). A directed graphis called a directed tree if the corresponding undirected graphobtained by ignoring the directions on the edges of the orig-inal directed graph is a tree.

Note that a (undirected) tree is a special case of directedtrees. Since any spatial spectrum access game over a sin-gle node always has a pure Nash equilibrium, we can thenconstruct the directed tree recursively by introducing a newnode and adding an (directed or undirected) edge betweenthis node and one existing node. From Lemma 2, we obtainthat

Corollary 2. Any spatial spectrum access game satis-fying the congestion property on a directed tree has a pureNash equilibrium.

Definition 4 (Directed Forest [3]). A directed graphis called a directed forest if it consists of a disjoint union ofdirected trees.

208

Page 5: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

Figure 4: An interference graph that consists of di-rected acyclic graphs and directed trees

Similarly, we can obtain from Lemma 2 that

Corollary 3. Any spatial spectrum access game satis-fying the congestion property on a directed forest has a pureNash equilibrium.

As illustrated in Figure 4, according to Lemmas 1 and2, we can construct more complicated directed interferencegraphs over which a spatial spectrum access game satisfyingthe congestion property has a pure Nash equilibrium.

4.2 Existence of Pure Nash Equilibria on Undi-rected Interference Graphs

We now study the case that the interference graph is undi-rected. This is a good approximation of reality if the trans-mitter of each user is close to its receiver, and all users’transmit powers are roughly the same.When an undirected interference graph is a tree, according

to Corollary 2, any spatial spectrum access game satisfyingthe congestion property has a pure Nash equilibrium. How-ever, for those non-tree undirected graphs without a topo-logical sort, the existence of pure Nash equilibrium can notbe proved following the results in previous Section 4.1. Thismotivates us to further study the existence of pure Nashequilibria on generic undirected interference graphs.First of all, [15] showed that a 3-players and 3-resources

congestion game with user-specific congestion weights maynot have a pure Nash equilibrium. Such a congestion gamecan be considered as a spatial spectrum access game on acomplete undirected interference graph (by regarding theresources as channels). When all users have homogeneouschannel contention capabilities and all channels have thesame mean data rates, [22] showed that the spatial spec-trum access game on any undirected interference graphs hasa pure Nash equilibrium. Clearly, the applicability of sucha channel-homogeneous model is quite limited, since thechannel throughputs in practical wireless networks are oftenheterogeneous. We hence next focus on exploring the ran-dom backoff and Aloha systems with user-specific data rates,which provide useful insights for the user-homogeneous anduser-heterogeneous channel contention mechanisms, respec-tively.Here we resort to a useful tool of potential game2, which

is defined as

Definition 5 (Potential Game [16]). A game is calleda potential game if it admits a potential function Φ(a) suchthat for every n ∈ N and a−n ∈MN−1,

sgn(Φ(a

′n, a−n)− Φ(an, a−n)

)2Note that it is much more difficult to find a proper potentialfunction to take into account users’ asymmetric relationships(i.e., directions of edges on graph) when the interferencegraph is directed. Hence in this study we only apply thetool of potential game in the undirected case.

= sgn(Un(a

′n, a−n)− Un(an, a−n)

),

where sgn(·) is the sign function.

Definition 6 (Better Response Update [16]). The

event where a player n changes to an action a′n from the ac-

tion an is a better response update if and only if Un(a′n, a−n) >

Un(an, a−n).

An appealing property of the potential game is that italways admits a pure Nash equilibrium and the finite im-provement property, which is defined as

Definition 7 (Finite Improvement Property [16]).A game has the finite improvement property if any asyn-chronous better response update process (i.e., no more thanone player updates the strategy at any given time) termi-nates at a pure Nash equilibrium within a finite number ofupdates.

Based on the potential game theory, we first study therandom backoff mechanism. We show in Theorem 2 thatwhen the undirected interference graph is complete, thereexists indeed a pure Nash equilibrium.

Theorem 2. Any spatial spectrum access game on a com-plete undirected interference graph with the random backoffmechanism is a potential game with the potential function

Φ(a) =

N∏n=1

θanBnan

M∏m=1

Km(a)∏c=1

gn(c), (8)

where Km(a) is the number of users choosing channel m un-der the strategy profile a, and hence has a pure Nash equi-librium.

We then consider the random backoff mechanism in theasymptotic case that λmax goes to infinity. This can be agood approximation of reality when the number of backoffmini-slots is much greater than the number of interferingusers, and collision rarely occurs. In this case, we have

gn(N ann (a)) = lim

λmax→∞

λmax∑λ=1

1

λmax

(λmax − λλmax

)Knan

(a)

=

∫ 1

0

xKnan

(a)dx =1

1 +Knan(a)

, (9)

here Knan(a) denotes the number of users that choose chan-

nel an and can interfere with user n. Equation (9) im-plies that the channel opportunity is equally shared among1+Kn

an(a) contending users (including user n). We considerthe user specific throughput as

Un(an, a−n) = hnθanBan1

1 +Knan(a)

, (10)

where hn is regarded as a user-specific transmission gain.For example, a user n can possess a higher transmissiongain if the distance between its transmitter and receiver isshorter than other users given that all the users transmitwith the same power level. We show that

Theorem 3. Any spatial spectrum access game on anyundirected interference graph with user-specific transmission

209

Page 6: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

gains and the random backoff mechanism in the asymptoticcase is a potential game with the potential function

Φ(a) = −N∑n=1

(1 + 1

2Knan(a)

θanBan

), (11)

and hence has a pure Nash equilibrium.

We now consider the Aloha mechanism. According to (2),we have the throughput function as

Un(a) = θanBnanpn

∏i∈Nan

n (a)

(1− pi). (12)

We can show that

Theorem 4. Any spatial spectrum access game on anyundirected interference graph with the Aloha mechanism isa potential game with the potential function

Φ(a) =

N∑i=1

− log(1− pi)

×⎛⎝1

2

∑j∈Nai

i (a)

log(1− pj) + log(θaiB

iaipi

)⎞⎠ , (13)

and hence has a pure Nash equilibrium.

When a spatial spectrum access game is a potential game,we can design a distributed algorithm such that each userasynchronously updates the channel selection myopically toincrease its throughput. According to the finite improve-ment property of the potential game, such an algorithm canachieve a pure Nash equilibrium within finite number of it-erations. However, asynchronous better response updatesrequire each user to have the complete information of otherusers’ channel selections. This can only be achieved with ex-tensive information exchange among the users, which maynot be always feasible. It will be very nice to design a dis-tributed algorithm that achieves the equilibrium without in-formation exchange.

5. DISTRIBUTED LEARNING FOR SPATIALSPECTRUM ACCESS

In this part, we discuss how to achieve an equilibrium forthe spatial spectrum access games. As shown in Section 4,a generic spatial spectrum access game does not necessar-ily have a pure Nash equilibrium, and thus it is impossibleto design a mechanism achieving pure Nash equilibria ingeneral. We hence target on approaching the mixed Nashequilibria. Govindan and Wilson in [10] proposed a globalNewton method to compute the mixed Nash equilibria forany finite strategic games. This method hence can be ap-plied to find the mixed Nash equilibria for the spatial spec-trum access games. However, such an approach is a cen-tralized optimization, which requires that each user has thecomplete information of other users and compute the solu-tion accordingly. This is often infeasible in a cognitive ra-dio network, since acquiring complete information requiresheavy information exchange among the users, and settingup and maintaining a common control channel for messagebroadcasting demands high system overheads [1]. Moreover,this approach is not incentive compatible since some usersmay not be willing to share their local information due to

����� ������ ���

����� ������ ���

��� ����� ������ ���

������ ���

��� ������ ������

������ ���

���

Figure 5: Time structure of a decision period

the energy consumption of information broadcasting. Wethus propose a distributed learning algorithm for any spatialspectrum access games, and the algorithm does not requireany information exchange among users. Each user only esti-mates its expected throughput locally, and learns to adjustits channel selection strategy adaptively. We show that thedistributed learning algorithm can converge to a mixed Nashequilibrium approximately.

5.1 Expected Throughput EstimationWe first introduce the estimation of user’s expected through-

put based on local observations. To achieve an accurateestimation, a user needs to gather a large number of localobservation samples. This motivates us to divide the spec-trum access time into a sequence of decision periods indexedby T (= 1, 2, ...), where each decision period consists of tmax

time slots (see Figure 5 as an illustration). During a singledecision period, a user accesses the same channel in all tmax

time slots. Thus the total number of users accessing eachchannel does not change within a decision period, which al-lows users to better learn the environment.Suppose user n chooses channel m to access at decision

period T . According to (3), a user’s expected through-put during period T depends on the probability of grabbingthe channel gn(N an

n (a(T ))) on that period, the channel idleprobability θm, and the mean data rate Bnm. Similarly toour work in [7] on the expected throughput estimation forimitation-based spectrum sharing mechanism design, we willapply the maximum likelihood estimation (MLE) to get ac-curate estimations of there parameters for the distributedlearning mechanism design, due to MLE’s efficiency and easeof implementation.

5.1.1 Maximum Likelihood EstimationAt the beginning of each time slot t(= 1, ..., tmax) of a

decision period T , a user n will sense the same channelan(T ) = m. If the channel is idle, the user will compete tograb the channel according to a specified channel contentionmechanism. At the end of each time slot t, a user observesSnm(T, t), I

nm(T, t), and b

nm(T, t). Here S

nm(T, t) denotes the

state of the chosen channel m (i.e., whether occupied bythe primary traffic), Inm(T, t) indicates whether the user hassuccessfully grabs the channel, i.e.,

Inm(T, t) =

⎧⎪⎨⎪⎩1, if user n successfully

grabs the channel m,

0, otherwise,

and bnm(T, t) is the received data rate on the chosen channelm by user n at time slot t. At the end of each decision periodT , each user n can collect a set of local observations Ωn(T ) ={Snm(T, t), Inm(T, t), bnm(T, t)}tmax

t=1 . Note that if Snm(T, t) = 0

(i.e., the channel is occupied by the primary traffic), we alsoset Inm(T, t) and b

nm(T, t) to be 0.

When the channel m is idle (i.e., no primary traffic), user

210

Page 7: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

n will grab the channel with probability gn(N ann (a(T ))).

Since there are a total of∑tmaxt=1 Snm(T, t) rounds of channel

contentions in the period T and each round is independentand identically distributed (i.i.d.), the total number of suc-cessful channel captures

∑tmaxt=1 Inm(T, t) follows the Binomial

distribution. A user n can then compute the likelihood ofgn(N an

n (a(T ))), i.e., the probability of the realized observa-tions Ωn(T ) given the parameter gn(N an

n (a(T ))) as

L[Ωn(T )|gn(N ann (a(T )))]

=

( ∑tmaxt=1 Snm(T, t)∑tmaxt=1 Inm(T, t)

)gn(N an

n (a(T )))∑tmax

t=1 Inm(T,t)

× (1− gn(N ann (a(T ))))

∑tmaxt=1 Sn

m(T,t)−∑tmaxt=1 Inm(T,t).

Then MLE of gn(N ann (a(T ))) can be computed by maximiz-

ing the log-likelihood function lnL[Ωn(T )|gn(N ann (a(T )))],

i.e., maxg(k(T )) lnL[Ωn(T )|gn(N ann (a(T )))]. By the first or-

der condition, we obtain the optimal solution as gn(N ann (a(T ))) =

∑tmaxt=1 Inm(T,t)

∑tmaxt=1 Sn

m(T,t), which is the sample averaging estimation.

When length of decision period tmax is large, by the cen-tral limit theorem, we know that

gn(N ann (a(T )))

∼ N(gn(N an

n (a(T ))),gn(N an

n (a(T )))(1− gn(N ann (a(T ))))∑tmax

t=1 Snm(T, t)

),

where N (·) denotes the normal distribution.Similarly, we can apply the MLE to estimate the chan-

nel idle probability θm and the mean channel data rate Bnm.More specifically, when the channel state Sm(t) and the real-ized data rate bnm(t) are i.i.d. random variables, we can easily

obtain the closed-form estimations as θm =∑tmax

t=1 Snm(T,t)

tmax

and Bnm =∑tmax

t=1 bnm(T,t)∑tmax

t=1 Inm(T,t), respectively3. By the MLE, we

can obtain the estimation of gn(N ann (a(T ))), θm and Bnm

as gn(N ann (a(T ))), θm and Bnm, respectively, and then esti-

mate the true expected throughput Un(a(T )) as Un(T ) =

θmBnmgn(N an

n (a(T ))). Since according to the central limit

theorem gn, θm, and Bnm follow independent normal distri-

butions with the mean gn(N ann (a(T ))), θm, and B

nm, respec-

tively, we thus have

E[Un(a(T ))] = E[θmBmgn(N ann (a(T )))] = Un(a(T )),

i.e., the estimation of expected throughput Un(a(T )) is un-biased.

5.2 Distributed Learning AlgorithmBased on the expected throughput estimation, we now

propose the distributed learning algorithm for spatial spec-trum access games. The idea is to extend the principle ofsingle-agent reinforcement learning to a multi-agent setting.Such multi-agent reinforcement learning algorithm has alsobeen applied to the classical congestion games on completegraphs [14,20]. Here we apply the learning algorithm to thegeneralized spatial congestion games on any generic graphsand derive the convergence conditions accordingly. The al-gorithm works as follows.

3When Sm(t) and bnm(t) are non-i.i.d. random variables, the

MLE can also be derived based on the specific probabilitydistribution functions by following the similar procedure asintroduced in Section 5.1.1.

At the beginning of each period T , a user n ∈ N chooses achannel an(T ) ∈M to access according to its mixed strategyσn(T ) = (σnm(T ),∀m ∈M), where σnm(T ) is the probabilityof choosing channel m. The mixed strategy is generated ac-cording to P n(T ) = (Pnm(T ),∀m ∈M), which represents itsperceptions of the payoff performance of choosing differentchannels based on local estimations. Perceptions are basedon local observations in the past and may not accuratelyreflect the expected payoff. For example, if a user n has notaccessed a channel m for many decision intervals, then per-ception Pnm(T ) can be out of date. The key challenge for thelearning algorithm is to update the perceptions with properparameters such that perceptions equal to expected payoffsat the equilibrium.Similarly to the single-agent learning, we choose the Boltz-

mann distribution as the mapping from perceptions to mixedstrategies, i.e.,

σnm(T ) =eγP

nm(T )∑M

i=1 eγPn

i (T ), ∀m ∈M, (14)

where γ is the temperature that controls the randomness ofchannel selections. When γ → 0, each user will choose toaccess channels uniformly at random. When γ →∞, user nalways chooses the channel with the largest perception valuePnm(T ) among all channel m ∈ M. We will show later onthat the choice of γ trades off convergence and performanceof the learning algorithm.At the end of a decision period T , a user n computes its

estimated expected payoff Un(a(T )) as in Section 5.1 (i.e.,by using the MLE method based on the set of local observa-tions Ωn(T ) during the period), and adjusts its perceptionsas

Pnm(T+1) =

{(1− μT )Pnm(T ) + μT Un(a(T )), if an(T ) = m,

Pnm(T ), otherwise,

(15)where (μT ∈ (0, 1), ∀T ) are the smoothing factors. A useronly changes the perception of the channel just accessed inthe current decision period, and keeps the perceptions ofother channels unchanged.Algorithm 1 summarizes the distributed learning algo-

rithm. Next we study the convergence of the learning algo-rithm based on the theory of stochastic approximation [12].

5.3 Convergence of Distributed Learning Al-gorithm

We now study the convergence of the proposed distributedlearning algorithm.First, the perception value update in (15) can be written

in the following equivalent form,

Pnm(T+1)−Pnm(T ) = μT [Znm(T )−Pnm(T )], ∀n ∈ N ,m ∈M,

(16)where Znm(T ) is the update value defined as

Znm(T ) =

{Un(a(T )), if an(T ) = m,

Pnm(T ), otherwise.(17)

For the sake of brevity, we denote the perception val-ues, update values, and mixed strategies of all the users asP (T ) � (Pnm(T ),∀m ∈M, n ∈ N ), Z(T ) � (Znm(T ), ∀m ∈M, n ∈ N ), and σ(T ) � (σnm(T ),∀m ∈ M, n ∈ N ), respec-tively.

211

Page 8: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

Algorithm 1 Distributed Learning Algorithm For SpatialSpectrum Access Game

1: initialization:2: set the temperature γ.3: set the initial perception values Pnm(0) =

1Mfor each

user n ∈ N .4: set the period index T = 0.5: end initialization

6: loop for each decision period T and each user n ∈ N inparallel:

7: select a channel m ∈M according to (14).8: for each time slot t in the period T do9: sense and contend to access the channel m.10: record the observations Snm(T, t), I

nm(T, t) and

bnm(T, t).11: end for12: estimate gn(N an

n (a(T ))), θm, and Bnm by the max-

imum likelihood estimation.13: compute the estimated expected payoff Un(a(T )).14: update the perceptions value P n(T ) according to

(15).15: set the period index T = T + 1.16: end loop

Let Pr{Nmn (a(T ))|P (T ), an(T ) = m} denote the condi-

tional probability that, given that the users’ perceptions areP (T ) and user n chooses channel m, the set of users thatchoose the same channel m in user n’s neighborhood Nn isNmn (a(T )) ⊆ Nn. Since each user independently chooses

a channel according to its mixed strategy σn(T ), then therandom set Nm

n (a(T )) follows the Binomial distribution of|Nn| independent non-homogeneous Bernoulli tries with theprobability mass function as

Pr{Nmn (a(T ))|P (T ), an(T ) = m}

=∏

i∈Nmn (a(T ))

(σim(T ))∏

i∈Nn\Nmn (a(T ))

(1− σim(T ))

=∏i∈Nn

(σim(T ))I{ai(T )=m}(1− σim(T ))1−I{ai(T )=m} , (18)

where I{ai(T )=m} = 1 if user i chooses channel m, andI{ai(T )=m} = 0 otherwise.Since the update value Znm(T ) depends on user n’s esti-

mated payoff Un(a(T )) (which in turn dependents onNmn (a(T ))),

thus Znm(T ) is also a random variable. The equations in (16)are hence stochastic difference equations, which are difficultto analyze directly. We thus focus on the analysis of itsmeandynamics [12]. To proceed, we define the mapping from theperceptions P (T ) to the expected payoff of user n choosing

channel m as Qnm(P (T )) � E[Un(a(T ))|P (T ), an(T ) = m].Here the expectation E[·] is taken with respective to themixed strategies σ(T ) of all users (i.e., the perceptions P (T )of all users due to (14)). We show that

Lemma 3. For the distributed learning algorithm, if thetemperature satisfies

γ <1

2maxm∈M,n∈N {θmBnm}maxn∈N {|Nn|} , (19)

the mapping from the perceptions to the expected payoff Q(P (T )) �

(Qnm(P (T )),m ∈M, n ∈ N ) forms a maximum-norm con-traction.

Note that the condition (19) is a sufficient condition toform a contraction mapping, which is in turn is a sufficientcondition for convergence. Simulation results show that aslightly larger γ may also lead to the convergence of themapping. Based on the property of contraction mapping,there exists a fixed point P ∗ such that Q(P ∗) = P ∗. Bythe theory of stochastic approximations [12], we show thatthe distributed learning algorithm also converges to the samelimit point P ∗.

Theorem 5. For the distributed learning algorithm, if thetemperature γ satisfies (19),

∑T μT =∞ and

∑T μ

2T <∞,

then the sequence {P (T ),∀T ≥ 0} converges to the unique

limit point P ∗ � (Pn∗m ,∀m ∈ M, n ∈ N ) of the differentialequations (∀m ∈M, n ∈ N )

dPnm(T )

dT= σnm(T ) (Q

nm(P (T ))− Pnm(T )) , (20)

with probability one. Further, the limit point P ∗ satisfies

Qnm(P∗) = Pn∗m ,∀m ∈M, n ∈ N . (21)

We next explore the property of the equilibrium P ∗ ofthe distributed learning algorithm. From Theorem 5, we seethat

Qnm(P∗) = E[Un(a(T ))|P ∗, an(T ) = m] = Pn∗m . (22)

It means that the perception value Pn∗m is an accurate esti-mation of the expected payoff in the equilibrium. Moreover,we show that the mixed strategy σ∗ is an approximate Nashequilibrium.

Definition 8 (Approximate Nash Equilibrium [8]).A mixed strategy profile σ = (σ1, ..., σN ) is a ξ- approximateNash equilibrium if

Un(σn, σ−n) ≥ maxσn

Un(σn, σ−n)− ξ,∀n ∈ N ,

where Un(σn, σ−n) denotes the expected payoff of player nunder mixed strategy σ, and σ−n denotes the mixed strategyprofile of other users except player n.

Here ξ ≥ 0 is the gap from a (precise) mixed Nash equi-librium. For the distributed learning algorithm, we showthat

Theorem 6. For the distributed learning algorithm, themixed strategy σ∗ in the equilibrium P ∗ is a ξ-approximateNash equilibrium, with ξ = maxn∈N {− 1

γ

∑Mm=1 σ

n∗m lnσn∗m }.

The gap ξ can be interpreted as the weighted entropy,which describes the randomness of the learning exploration.A larger ξ means worse learning performance. When eachuser adopts the uniformly random access, the gap ξ reachesthe maximum value and results in the worst learning per-formance. Theorems 5 and 6 together illustrate the trade-off between the convergence and performance through thechoice of γ. A small enough γ is required to explore the envi-ronment (so that users are not getting stuck in channels withthe current best payoffs) and guarantee the convergence ofdistributed learning. If γ is too small, however, then theperformance gap ξ is large due to over-exploration.

212

Page 9: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

��� ���

������

!

"

#

$ �

!

"

#

$

� �

� !

" #

$

� �

! "

#

$

� �

Figure 6: Interference Graphs

6. NUMERICAL RESULTSWe now evaluate the proposed distributed learning al-

gorithm by simulations. We consider a Rayleigh fadingchannel environment. The data rate of user n on an idlechannel m is given according to the Shannon capacity, i.e.,

bnm(t) = Vm log2

(1 +

ζngnm(t)

N0

), where Vm is the bandwidth

of channel m, ζn is the power adopted by user n, N0 is thenoise power, and gnm(t) is the channel gain, which is a ran-dom variable that follows the exponential distribution withthe mean gnm. In the following simulations, we set Vm = 10MHz, N0 = −100 dBm, and ζn = 100 mW. By choosing dif-ferent mean channel gains gnm, we have different mean datarates Bnm = E[bnm(t)] for different channels and users. Forsimplicity, we set channel state Sm(t) as an i.i.d. Bernoullirandom variable with the idle probability θm = 0.5.We consider a network of M = 5 channels and N = 9

users with four different interference graphs (see Figure 6).Graphs (a) and (b) are undirected, and Graphs (c) and (d)

are directed. Let �Bn = {Bn1 , ..., BnM} be the mean data ratevector of user n. We set �B1 = �B2 = �B3 = {2, 6, 16, 20, 30}Mbps, �B4 = �B5 = �B6 = {4, 12, 32, 40, 60} Mbps, and �B7 =�B8 = �B9 = {10, 30, 80, 100, 150} Mbps. We implement boththe random backoff and Aloha mechanisms for channel con-tention. For the random backoff mechanism, we set the num-ber of backoff mini-slots in a time slot λmax = 10. For theAloha mechanism, the channel contention probabilities ofthe users are {0.7, 0.7, 0.7, 0.5, 0.5, 0.5, 0.3, 0.3, 0.3}, respec-tively. Notice that in this study we focus on channel choicesinstead of the adjustment of contention probabilities.For the distributed learning algorithm initialization, we

set the length of each decision period tmax = 200, which canachieve a good estimation of the mean data rate. We setthe smooth factor μT =

1200+T

, which satisfies the condition∑T μT =∞ and

∑T μ

2T <∞.

We first evaluate the distributed learning algorithm withdifferent choices of temperature γ on the interference graph(d) in Figure 6. We run the learning algorithm sufficientlylong until the time average system throughput does notchange. The result in Figure 7 verifies the trade-off be-tween the convergence and performance, and demonstratesthat a proper temperature γ can offer the best performance.When is γ small, the gap ξ in Theorem 6 can be large.When γ is very large, the algorithm may get stuck in localoptimum and the performance is again negatively affected.We set γ = 5.0 in the following simulations since it achieves

0.001 0.005 0.01 0.05 0.1 0.5 1.0 2.0 5.0 8.0 10.0 15.0 20.0 30.0 40.0

50

100

150

200

250

300

Temperature γ

Sys

tem

Ave

rage

Thr

ough

put

Random Backoff MechanismAloha Mechanism

Figure 7: The system performance of the distributedlearning algorithm with different temperature γ

0 200 400 600 800 1000 1200 1400 1600 1800 20000

10

20

30

40

50

60

70

80

Iterations

Use

rs’ A

vera

ge T

hrou

ghpu

t

User 9 User 8

User 7

User 5 User 6

User 4User 2User 1 User 3

Figure 8: Users’ averagethroughput on interfer-ence graph (d) with ran-dom backoff mechanism

0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

5

10

15

20

25

Iterations

Use

rs’ A

vera

ge T

hrou

ghpu

t

User 7User 8

User 6

User 1User 3 User 4 User 5 User 2

User 9

Figure 9: Users’ aver-age throughput on inter-ference graph (d) withAloha mechanism

good system performance in both random backoff and Alohamechanisms as in Figure 7.We then look at the learning dynamics. Figures 8 and 9

show the dynamics of users’ time average throughputs withrandom backoff and Aloha mechanisms, respectively. Theseresults demonstrate the convergence of the distributed learn-ing algorithm in both mechanisms.To benchmark the performance of the distributed learning

algorithm, we compare it with the solutions obtained by thefollowing two algorithms:

• Random Access: each user chooses a channel to accesspurely randomly.

• Centralized Optimization: the solution obtained bysolving the centralized global optimization ofmaxa

∑n∈N Un(a).

We implement these algorithms together with the dis-tributed learning algorithm on the four types of interfer-ence graphs in Figure 6. The results are shown in Figures10 and 11. For the random backoff (Aloha, respectively)mechanism, we see that the distributed learning algorithmachieves up-to 100% (65%, respectively) performance im-provement over the random access algorithm. Comparedwith the centralized optimal solution, the performance lossof the distributed learning in the full-interference graph (b)is 28% (34%, respectively). Such performance loss is not dueto the algorithm design; instead it is due to the selfish na-ture of the users. In the partial-interference graphs (a), (c),and (d), the performance loss can be further reduced to lessthan 10% (17%, respectively). This shows that the negative

213

Page 10: [ACM Press the thirteenth ACM international symposium - Hilton Head, South Carolina, USA (2012.06.11-2012.06.14)] Proceedings of the thirteenth ACM international symposium on Mobile

������

� �

�� �

���!

��!

"

#"

�""

�#"

�""

�#"

�""

�#"

�����$�% �����$&% �����$�% �����$�%

�����������

'�����&!���(������

)�����*�+��,�����+����

Figure 10: Compari-son of distributed learn-ing, random access, andcentralized optimizationwith the random backoffmechanism

������

� �

�� �

���!

��!

"

�"

-"

."

/"

�""

��"

�-"

�����$�% �����$&% �����$�% �����$�%

�����������

'�����&!���(������

)�����*�+��,�����+����

Figure 11: Compari-son of distributed learn-ing, random access, andcentralized optimizationwith the Aloha mecha-nism

impact of users’ selfish behavior is smaller when users canshare the spectrum more efficiently through spatial reuse.

7. CONCLUSIONIn this paper, we explore the spatial aspect of distributed

spectrum sharing, and propose a framework of spatial spec-trum access game on directed interference graphs. We inves-tigate the critical issue of the existence of pure Nash equilib-ria, and develop a distributed learning algorithm convergingto an approximate mixed Nash equilibrium for any spatialspectrum access games. Numerical results show that thealgorithm is efficient and has significant performance gainover a random access algorithm that does not take the spa-tial effect into consideration.For the future work, we are going to design distributed

algorithms that maximize the system-wide throughput.

ACKNOWLEDGMENTSThis work is supported by the General Research Funds (ProjectNumber 412509, 412710, and 412511) established under theUniversity Grant Committee of the Hong Kong Special Ad-ministrative Region, China.

8. REFERENCES[1] I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty. Next

generation/dynamic spectrum access/cognitive radiowireless networks: a survey. Computer Networks,50(13):2127–2159, 2006.

[2] A. Anandkumar, N. Michael, and A. Tang.Opportunistic spectrum access with multiple users:learning under competition. In Proc. of IEEEINFOCOM, San Diego, USA, March 2010.

[3] N. Biggs, E. Lloyd, and R. Wilson. Graph Theory.Oxford University Press, 1986.

[4] V. Bilo, A. Fanelli, M. Flammini, and L. Moscardelli.Graphical congestion games with linear latencies. InProc. of ACM SPAA, Munich, Germany, June 2008.

[5] X. Chen and J. Huang. Spatial spectrum access game:Nash equilibria and distributed learning. Technicalreport, The Chinese University of Hong Kong, 2012.Available at http://jianwei.ie.cuhk.edu.hk/publication/MobihocTech.pdf.

[6] X. Chen and J. Huang. Evolutionarily stable openspectrum access in a many-users regime. In Proc. ofIEEE GLOBECOM, Houston, USA, Dec. 2011.

[7] X. Chen and J. Huang. Imitative spectrum access. InProc. of IEEE WiOpt, Paderborn, Germany, May2012.

[8] C. Daskalakisa, A. Mehtab, and C. Papadimitriou. Anote on approximate Nash equilibria. TheoreticalComputer Science, 410(17):1581–1588, 2009.

[9] M. Fl ↪elegyhlczi, M. Cagalj, and J.-P. Hubaux.Efficient mac in cognitive radio systems: Agame-theoretic approach. IEEE Transactions onwireless Communications, 8:1984–1995, 2009.

[10] S. Govindan and R. Wilson. A global newton methodto compute nash equilibria. Journal of EconomicTheory, 110:65–86, 2003.

[11] Z. Han, C. Pandana, and K. J. R. Liu. Distributiveopportunistic spectrum access for cognitive radiousing correlated equilibrium and no-regret learning. InProc. of IEEE WCNC, Hong Kong, March 2007.

[12] H. Kushner and G. Yin. Stochastic Approximation andRecursive: Algorithms and Applications.Springer-Verlag, New York, 2003.

[13] K. Liu and Q. Zhao. Decentralized multi-armed banditwith multiple distributed players. In Proc. of IEEEIAT, San Diego, USA, Jan. 2010.

[14] E. Melo. Congestion pricing and learning in trafficnetwork games. Journal of Public Economic Theory,13(3):351–367, 2011.

[15] I. Milchtaich. Congestion games with player-specificpayoff functions. Games and Economic Behavior,13:111–124, 1996.

[16] D. Monderer and L. S. Shapley. Potential games.Games and Economic Behavior, 14:124–143, 1996.

[17] J. Nash. Equilibrium points in n-person games.Proceedings of the National Academy of Sciences,36:48–49, 1950.

[18] N. Nie and C. Comniciu. Adaptive channel allocationspectrum etiquette for cognitive radio networks. InProc. of IEEE DySPAN, Baltimore, USA, Nov. 2005.

[19] D. Niyato and E. Hossain. Competitive spectrumsharing in cognitive radio networks: a dynamic gameapproach. IEEE Transactions on WirelessCommunications, 7:2651–2660, 2008.

[20] D. Shah and J. Shin. Dynamics in congestion games.ACM SIGMETRICS Performance Evaluation Review,38(1):107–118, 2010.

[21] R. Southwell and J. Huang. Convergence dynamics ofresource-homogeneous congestion game. In Proc. ofICST GameNets, Shanghai, China, April 2011.

[22] C. Tekin, M. Liu, R. Southwell, J. Huang, andS. Ahmad. Atomic congestion games on graphs andtheir applications in networking. to appear inIEEE/ACM Transactions on Networking, 2012.

[23] B. Wang, Y. Wua, and K. R. Liu. Game theory forcognitive radio networks: An overview. ComputerNetworks, 54:2537–2561, 2010.

[24] M. Weiss, M. Al-Tamaimi, and L. Cui. Dynamicgeospatial spectrum modelling: taxonomy, options andconsequences. In Proc. of TPRC Conference, Fairfax,USA, Sep. 2010.

214