
How Does Game Theory Apply to Radio Resource Management?

James Neel

In partial fulfillment of the requirements for qualification for acceptance to the Virginia Tech Doctoral Program

Qualifier Committee: Jeffrey H. Reed (advisor) Annamalai Annamalai

R. Michael Buehrer

1 Introduction

Radio resource management (RRM) is one of the most challenging and most important aspects of wireless communications. An intelligent radio resource management scheme can significantly improve system performance. For instance, a CDMA (Code Division Multiple Access) system can achieve significant capacity gains relative to a TDMA (Time Division Multiple Access) system. This is not due to any inherent processing advantage provided by the Direct Sequence Spread Spectrum (DS-SS) or Frequency Hopped Spread Spectrum (FH-SS) signal. In fact, from an information theory perspective, a CDMA signal has the same capacity as a TDMA signal [1]. Rather, CDMA provides a number of radio resource management advantages not available in a TDMA system. The most notable of these RRM advantages are CDMA’s theoretical frequency reuse factor of 1 and its ability to dynamically reallocate bandwidth during voice inactivity.

However, a proper understanding of an RRM algorithm often requires an understanding of a number of complex interrelated processes. Thus the number of considerations when analyzing an RRM algorithm can be quite large. This problem is further complicated in distributed dynamic RRM algorithms, where interactive decision making processes occur. An understanding of these processes is critical, as they are a virtual necessity in the increasingly popular ad-hoc networks and are also encountered to a lesser extent in cellular networks. This report posits that game theory can be applied to the analysis of these interactive decision processes. Indeed, it is anticipated that with a formalized approach to applying game theory to RRM issues and an identification of appropriate game theory models, many of the more difficult RRM problems currently addressed primarily through simulation will be translated into “low-hanging fruit” problems that are readily understood and analyzed within the game theory framework.

The remainder of this document is organized as follows. First, an overview of radio resource management is given, with specific attention paid to power control algorithms. This is followed by a discussion of key concepts from game theory. Then a discussion of how game theory can be applied to radio resource management is given. This is followed by a summary of the key points of this report and a description of future research directions.

2 Radio Resource Management

Radio resource management (RRM) can be best understood as a constrained probabilistic optimization problem that can be formulated as follows:

Given a particular infrastructure deployment (constraints), allocate resources (variables) in a manner that (ideally) maximizes or minimizes some operational parameter(s) (objective functions).

It is important to note that the probabilistic aspect of RRM causes it to differ from most common mathematical optimization problems (linear and nonlinear programming problems). Thus, when evaluating RRM objective functions, various statistical measures are frequently used. For instance, the expected number of dropped calls and the variance in the number of dropped calls are evaluated as opposed to a specific number of outages. RRM is further complicated by the sheer complexity of the interactions of the algorithms under consideration. Like many other optimization problems, RRM must also balance inversely related objectives such as the following:

• Maximize user resources ⇔ Maximize coverage/capacity
• Maximize mobility support ⇔ Maximize capacity
• Maximize coverage ⇔ Minimize cost

Nevertheless, efficient spectrum use and optimal resource allocation are critical to network performance. If resources are poorly allocated, coverage holes may be left, quality of service guarantees may go unmet, or an excessive amount of spectrum may be lost to overhead.

2.1 Categorizing RRM Schemes

There are several different aspects to radio resource management. In this report, we’ll divide these schemes into fixed RRM design and dynamic RRM algorithms.

2.1.1 Fixed Radio Resource Design and Allocation

In a fixed RRM scheme, resource management decisions are made just once, typically before system deployment. Once this decision is made, to varying extents, these resources cannot be reallocated. In fixed resource design, system planners must determine system-wide network parameters based on anticipated operational parameters and performance and economic criteria. Some of the parameters that are determined during this stage include the following:

• Total system bandwidth
• Number of access points (base stations)
• Operational waveform
• Frequency reuse factor
• Antenna heights
• Acceptable power levels
• Specification of performance criteria

These parameters have a significant impact on network performance and also determine the constraints for the fixed resource allocation process. Based on these constraints, fixed resource allocation must address how the system-wide resources are initially divided up among the various system components. Some of the issues addressed in fixed resource allocation include the following:

• How many access points are needed (coverage vs capacity vs cost)?
• How are resources allocated among the access points (channel assignments)?
• How is user mobility handled (how many channels are reserved for handoffs)?
• How much of each resource does each user receive (bandwidth, target SINRs)?

2.1.2 Dynamic Radio Resource Management

If wireless networks were static and deterministic, fixed resource design and allocation would be sufficient. However, mobility is central to wireless networks, and expected load distributions, mobile locations, fading profiles, and virtually every other assumption considered during fixed resource design and allocation change during operation. Thus nearly every allocation decision is subject to change in practical wireless networks (see footnote 1). The reason for this adaptability is fairly clear: it can significantly improve performance. For instance, Figure 1 illustrates how dynamic scheduling in a packet based wireless network can outperform a fixed scheduler. As message intensity increases, the dynamic scheduler nearly matches ideal delay performance up until a point is reached (which is significantly less than maximum system message intensity). This limit point occurs because dynamic resource allocation algorithms incur some overhead penalty to message the changes in operational parameters (see footnote 2). In the example of Figure 1, 40% of the resources are absorbed in the overhead.

1 Many of the resource design choices do remain fixed in wireless networks. However, software radio and cognitive radio promise to allow decisions as fundamental as modulation, coding, and bandwidth to be adjusted after initial deployment.

2 Open loop power control is an exception to the overhead penalty.

Figure 1 Expected Delay as a Function of Message Arrival Rate from [11] Figure 10.2 (two panels, each plotting Delay against λD)

Figure 2 shows another example of the potential advantage of dynamic resource allocation. In this case dynamic channel assignment (DCA) is compared against a fixed channel assignment (FCA) in terms of blocking probability. Note that at low traffic intensity, DCA significantly outperforms FCA. However, as DCA must incur additional overhead, there is a crossover point where DCA begins to underperform FCA. Clearly, if overhead can be reduced without damaging the basic operation of a dynamic algorithm, performance can be expected to significantly improve.

Figure 2 Relative Performance of Fixed and Dynamic Channel Assignment from [11] Figure 7.2

There are two fundamental approaches to dynamic RRM: centralized dynamic RRM and distributed dynamic RRM. In centralized RRM, a single authority, such as a base station, collects information from various nodes in the network, computes a change in resource allocation, and signals this change to the other nodes in the network. In distributed RRM, each of a number of authorities in the network collects information and adjusts the resource allocations within its control. Note that a distributed dynamic RRM algorithm generally incurs less overhead than a centralized dynamic RRM algorithm. However, the operation of distributed algorithms can be difficult to predict, as the dynamic actions of one authority can influence the actions of the other authorities in the network. Thus simulations must frequently be used in place of analysis to perform network planning. Additionally, without a convergent state, even more bandwidth might be lost to the signaling overhead needed to accommodate the resource allocation adjustments. The following section gives an in-depth review of perhaps the best known dynamic RRM algorithm: power control. Often as a function of network topology, power control algorithms are implemented as centralized, distributed, or hybrid centralized-distributed algorithms.

2.2 Power Control

The performance of all wireless communications systems is a function of the signal-to-interference-plus-noise ratio (SINR). While readily apparent at the physical layer, it is also generally true at the higher layers. Optimal network performance is typically achieved only at a unique power vector. In a static network, it would be trivial to assign transmit powers to each node in the network to achieve this power vector. However, wireless systems are generally mobile, or at least operate in a dynamic environment, so that any initial power vector assignment will not maintain its optimality. For instance, consider a pedestrian in an urban cellular environment who rounds a corner and creates a line-of-sight (LOS) path to his base station. This results in a significant increase in the power received at his base station, significantly improving his performance, but potentially jamming the other users in the network. Clearly this new environment has a different ideal power vector than the original.

In an attempt to maintain the optimum power vector, most modern communications schemes include some form of power control. Power control is a set of real-time algorithms implemented on a network in order to maximize a performance metric. Some common applications of power control include:

• Ensuring proper operation in multi-user direct-sequence spread spectrum (DS-SS) systems [2]
• Trading off system capacity for quality of service [3]
• Trading off battery life versus quality of service [4]

Every power control scheme is designed for a particular target application and anticipated devices. These assumptions permit the network planner to maximize QoS while minimizing the use of system resources.

For example, consider the following generalization of the reverse-link power control scheme used in IS-95. This scheme is primarily interested in maximizing system capacity while maintaining a minimum QoS, typically measured as a bit-error-rate (BER). For a system operating in the ideal steady-state where all received powers are equal, and ignoring out-of-cell interference, [3] gives the relation between system capacity K, desired SINR Eb/N0, spreading gain W/R, signal power S, and noise power η.

$$ K = 1 + \frac{W/R}{E_b/N_0} - \frac{\eta}{S} \qquad (1) $$

However, [2] states that if the received powers are instead log-normally distributed with a standard deviation of just 2 dB, then 60% of the system capacity can be lost. Clearly, power control plays a vital role in the success of a system. As an interesting aside, from (1) it can be seen that it is possible to trade off capacity for bit error rate. Thus as the number of users in this system increases, performance can “elegantly” degrade.
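The capacity relation in (1), K = 1 + (W/R)/(Eb/N0) - η/S, can be evaluated directly. The following is a minimal sketch; the function names and the numeric values (a spreading gain of 128 and a noise-to-signal ratio of 0.1) are illustrative assumptions, not values taken from [3]. It shows the trade-off mentioned above: relaxing the required Eb/N0 admits more users.

```python
def db_to_linear(value_db):
    """Convert a ratio in dB to a linear ratio."""
    return 10.0 ** (value_db / 10.0)


def cdma_capacity(spreading_gain, ebn0, noise_to_signal):
    """Evaluate K = 1 + (W/R)/(Eb/N0) - eta/S with all quantities linear."""
    return 1.0 + spreading_gain / ebn0 - noise_to_signal


if __name__ == "__main__":
    # Hypothetical system: W/R = 128, eta/S = 0.1 (illustrative only).
    for ebn0_db in (5.0, 7.0, 9.0):
        k = cdma_capacity(128.0, db_to_linear(ebn0_db), 0.1)
        print(f"Eb/N0 = {ebn0_db:.0f} dB -> K = {k:.1f} users")
```

Lowering the Eb/N0 requirement raises K, which is the capacity-for-BER trade noted in the text.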

2.2.1 Fundamental Power Control Concepts

This section describes some of the fundamental concepts from power control as well as some of the basic approaches to implementing power control. Specifically, this section addresses the following: open loop power control, closed loop power control, and outer loop power control.

2.2.1.1 Open Loop Power Control

In open loop power control, one node, A, of a communications link estimates its signal strength at the other node, B, by monitoring the relative change in the strength of B’s signal. When A detects that B’s signal is strong, A decreases its transmission power level (backs off); when A detects that B’s signal is weak, A increases its transmission power level. According to [11], IS-95 implements an open loop power control scheme by tying the mobile’s automatic gain control (AGC) circuit to the transmitter gain. Thus when the variable gain amplifier increases its gain at the receiver, it also increases the gain applied at the transmitter. A diagram of an AGC circuit is shown in Figure 3.

Figure 3 Generic AGC Circuit [10] Figure 5.46

In general the uplink and downlink channels are spaced closely enough that the large scale fading effects can be well compensated for by using open loop power control. However, small scale fading effects can differ greatly between the uplink and downlink channels as they generally experience different multipaths (line-of-sight channels represent the most notable exception to this assertion). Thus the uplink and downlink channels are generally non-reciprocal and information about one channel provides little information about the other channel.

An example of a non-reciprocal link is shown in Figure 4. Due to the independent fades of the uplink and the downlink, the received signal strength (RSS) of the base station’s signal provides little information about the RSS of the mobile’s signal. Thus if the mobile adjusted its power level solely according to the base station’s RSS, the mobile’s RSS at the base station would vary wildly. In fact, greater signal strength variation may occur than if the open loop power control scheme was never implemented.

Figure 4 Fading Independence (RSS plotted against time t for RSSBS and RSSMS)

2.2.1.2 Closed Loop Power Control

To rectify the channel reciprocity problem, the base station needs to either feed information back to the mobile about the mobile’s received signal strength (or more commonly its signal-to-interference-plus-noise ratio), or direct the mobile to adjust its power level in accordance with the received signal strength. These approaches are considered in closed loop power control algorithms (also known as inner loop power control algorithms for reasons that will be clear in the next section). For instance, in the IS-95 scheme, the uplink closed loop power control scheme attempts to maintain a target SINR at the base station. SINR measurements are made every 1.25 ms at the base station, and directions are given to the mobile to adjust its power level up or down by 1 dB. The closed loop power control value is then added to the open loop power control voltage to determine the mobile’s actual transmitted power level. Thus both large scale and small scale fading effects are accounted for in IS-95 through the combined use of open loop and closed loop power control.

Although introducing this feedback necessarily introduces some overhead, this is normally a relatively small penalty (800 Hz in IS-95), and as long as the channel changes slowly enough, performance can be quite good. However, in a highly mobile environment, this is rarely the case. In fact, as mobility increases, performance may drop precipitously. For example, consider the simulated power control system shown in Figure 5. Notice that the received signal to interference ratio (SIR) varies greatly over the duration of the simulation.

Figure 5 Closed Loop Power Control Variations [11] Figure 9.4
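The fixed-step closed loop behavior described above can be sketched in a few lines. This is an illustrative toy model, not the IS-95 implementation: the function names, the noise level, and the channel gain sequence are assumptions, and all quantities are in dB. Each update compares the measured SINR with the target and commands a 1 dB step up or down.

```python
def closed_loop_step(tx_power_db, measured_sinr_db, target_sinr_db, step_db=1.0):
    """Return the next transmit power: step up if below target, else step down."""
    if measured_sinr_db < target_sinr_db:
        return tx_power_db + step_db
    return tx_power_db - step_db


def track_target(channel_gain_db, target_sinr_db, noise_db=0.0, start_power_db=0.0):
    """Run the loop over a sequence of channel gains and return the SINR trace."""
    tx_power_db = start_power_db
    sinr_trace = []
    for gain_db in channel_gain_db:
        sinr_db = tx_power_db + gain_db - noise_db  # received SINR in dB
        sinr_trace.append(sinr_db)
        tx_power_db = closed_loop_step(tx_power_db, sinr_db, target_sinr_db)
    return sinr_trace


if __name__ == "__main__":
    # Hypothetical static channel with -10 dB gain and a 5 dB SINR target.
    trace = track_target([-10.0] * 30, target_sinr_db=5.0)
    print(trace[-6:])
```

With a static channel the loop ramps to the target and then toggles within one step of it; a channel that changes faster than one step per update would not be tracked, which is the mobility problem discussed in the text.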

A more precise formulation of the variation in SIR experienced by closed loop power control schemes in mobile channels is given in [29]. In [29] it is stated that with an IS-95 closed loop power control system, received signal power is log-normally distributed with a variance of 1-2 dB. Thus, while it still represents a significant improvement over open loop power control, the IS-95 closed loop power control scheme can still experience significant reductions in capacity as predicted in [2]. Note that the received power variance is actually related to the bandwidth of the power control signal and the channel Doppler bandwidth. In general, as the ratio of the bandwidth of the power control signal to the Doppler bandwidth increases, the received signal variance decreases. To combat this problem, WCDMA performs power control updates at 1.5 kHz, nearly twice the IS-95 rate of 800 Hz [12].

2.2.1.3 Outer Power Control Loop

However, virtually no matter how much bandwidth is allocated for the power control signals, fading can still present a problem. High mobility users can have the effects of their numerous short fades ameliorated through the use of interleaving and forward error correction. Low mobility users may remain in a deep fade for a significant duration. Because of this, low mobility users typically require a higher target SINR than high mobility users. However, it is desirable to operate at the lowest possible SINR to maximize system capacity. One approach to addressing these competing goals is to introduce a third loop to the power control algorithm, the outer power control loop, that dynamically assigns low mobility users a higher target SINR and higher mobility users a lower target SINR. Note that without installing gyroscopes in the mobiles, directly measuring mobility is not possible. Thus mobility measurements are performed indirectly, primarily through Frame Error Rate (FER) measurements (see footnote 3). To keep from unnecessarily jerking around the signal levels (which could be disastrous in fades), outer loop power control measurements are taken at a much slower rate than those of the inner closed power control loop. For instance, in WCDMA the update frequency of outer loop power control instructions is in the range of 10-100 Hz, one to two orders of magnitude less than the inner loop update rate [12].

Note that the outer loop power control scheme necessarily causes different mobiles to operate with different SINR targets. This can significantly complicate the capacity calculations of a network. As a slight simplification, [11] gives (2) as a necessary condition for a SINR balancing power control scheme to have an achievable solution, where γj is the SINR target of mobile j and N is the set of mobiles in the cell.

$$ \sum_{j \in N} \frac{\gamma_j}{1 + \gamma_j} < 1 \qquad (2) $$
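The necessary condition in (2), that the sum over all mobiles of γj/(1 + γj) is less than 1, is straightforward to check for a proposed set of SINR targets. The sketch below assumes the targets are given as linear ratios (not dB); the function name is a hypothetical choice made for illustration.

```python
def sinr_targets_feasible(targets_linear):
    """Check the necessary condition sum_j gamma_j / (1 + gamma_j) < 1."""
    return sum(gamma / (1.0 + gamma) for gamma in targets_linear) < 1.0


if __name__ == "__main__":
    # Five mobiles with modest targets satisfy the condition ...
    print(sinr_targets_feasible([0.1] * 5))
    # ... while two mobiles that each demand an SINR of 1 (0 dB) do not.
    print(sinr_targets_feasible([1.0, 1.0]))
```

Because the condition is only necessary, a `True` result does not guarantee that a given network geometry can reach the targets; a `False` result rules the target set out.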

2.2.2 Cellular Power Control

This section details some cellular power control schemes with particular interest paid to uplink closed loop power control. In particular, this section covers Yates’s standard interference function model and its variants, and Goodman’s objective oriented power control scheme.

2.2.2.1 Yates’s Standard Interference Function Framework

[5] presents perhaps the classic paper on uplink cellular power control (see footnote 4). In this framework, each node, j, attempts to achieve a required SINR, $\gamma_j$, with a minimum power consumption, $p_j$, at its node(s) of interest, $\nu_j$ (one or more base stations). This model assumes that each node is capable of observing the SINR at $\nu_j$ (or alternately, observes the total received power at $\nu_j$ and knows its own gains, $h_{j,\nu_j}$). Based on these observations, the nodes compute a scenario dependent standard interference function $I(\mathbf{p})$ formed by the ratio of the target SINR and the effective SINR, where p is the vector of transmit powers employed in the cell. The properties of I(p) are key to the results of the model. I(p) has the following properties:

• Positivity: I(p) > 0
• Monotonicity: if p ≥ p*, then I(p) ≥ I(p*)
• Scalability: for all α > 1, αI(p) > I(αp)

where the convention that p > p* means that $p_j > p^*_j \;\; \forall j \in N$ (see footnote 5).

3 Note that bit error rate (BER) measurements, particularly instantaneous BER measurements, may be over too short of a period to be a good indicator of mobility.

4 Though originally considered as a fixed point problem, [39] shows how to model the scenarios in [5] as S-modular games.

In general $I_j(\mathbf{p})$ takes the form shown in (3) as the ratio of target SINR and actual SINR. I(p) is then given by $I(\mathbf{p}) = \times_{j \in N} I_j(\mathbf{p})$.

$$ I_j(\mathbf{p}) = \frac{\gamma_j}{\mu_j(\mathbf{p})} \qquad (3) $$

Power levels for each mobile are updated at stage k+1 by evaluating (4).

$$ p_j(k+1) = p_j(k)\, I_j(\mathbf{p}(k)) \qquad (4) $$
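As a rough illustration of the update in (4), the sketch below iterates p_j(k+1) = p_j(k) γ_j / μ_j(p(k)) for mobiles sharing a single base station. The SINR model (own received power over the sum of other received powers plus noise), the function names, and the gains, targets, and noise level are all assumptions made for this example rather than details taken from [5].

```python
def received_sinr(j, powers, gains, noise):
    """SINR of mobile j at a single base station (assumed interference model)."""
    interference = sum(
        g * p for k, (g, p) in enumerate(zip(gains, powers)) if k != j
    )
    return gains[j] * powers[j] / (interference + noise)


def iterate_powers(powers, gains, targets, noise, steps=100):
    """Apply p_j(k+1) = p_j(k) * gamma_j / mu_j(p(k)) synchronously."""
    p = list(powers)
    for _ in range(steps):
        sinrs = [received_sinr(j, p, gains, noise) for j in range(len(p))]
        p = [p[j] * targets[j] / sinrs[j] for j in range(len(p))]
    return p


if __name__ == "__main__":
    gains = [1.0, 0.5, 0.25]      # hypothetical link gains
    targets = [0.2, 0.2, 0.2]     # hypothetical linear SINR targets
    final = iterate_powers([1.0, 1.0, 1.0], gains, targets, noise=0.01)
    for j, p in enumerate(final):
        print(j, p, received_sinr(j, final, gains, 0.01))
```

With these feasible targets every mobile's SINR settles at its target, and the weaker links end up transmitting at proportionally higher power.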

Assuming capacity constraints are satisfied, this model is shown to converge to a steady-state under the following scenarios:

• Fixed assignment, where each mobile is assigned to a particular base station ($|\nu_j| = 1$)
• Minimum power assignment, where each mobile is assigned to the base station where its SINR is maximized ($|\nu_j| = 1$, but $\nu_j$ changes)
• Macro diversity, where all base stations combine the signals of the mobiles ($|\nu_j| > 1$)
• Limited diversity, where a subset of the base stations combine the signals of the mobiles ($|\nu_j| > 1$)
• Multiple connection reception, where the target SINR must be maintained at a number of base stations ($|\nu_j| > 1$)

Properties of the Standard Interference Function

In [5], Yates shows that the standard interference function has the following properties:

• If the algorithm has a fixed point, then that fixed point is unique.
• When I(p) is feasible, a fixed point exists. With I(p) defined as the ratio of target to actual SINR as in (3), I(p) is feasible if there exists some $\mathbf{p} \in P$ such that $I(\mathbf{p}) \le \mathbf{1}$, that is, some power vector at which every mobile meets or exceeds its target SINR.
• If I(p) is feasible, then starting from any p(0) other than p(0) = 0, the algorithm converges to the fixed point when decisions are updated synchronously.

Standard Interference Function Scenarios

In [5], Yates examines five different cellular applications (scenarios) of this algorithm, each with a different interference function which satisfies the properties of the standard interference function:

• Fixed Assignment (FA)
• Minimum Power Assignment (MPA)
• Macro Diversity (MD)
• Limited Diversity (LD)
• Multiple Connection Reception (MCR)

5 For sake of consistency, the notation used here is the same as the notation introduced in Section 4.2.

Fixed Assignment

In the fixed assignment (FA) scenario, each mobile is assigned to a particular base station, b. The standard interference function is given by (5).

$$ I_j^{FA}(\mathbf{p}) = \frac{\gamma_j}{\mu_{j,b}(\mathbf{p})} \qquad (5) $$

[5] also notes that convergence occurs if updating in the FA scenario occurs according to the algorithm created by Foschini [6] given in (6), where $\beta \in \mathbb{R}^+$.

$$ p_j(k+1) = (1 - \beta)\, p_j(k) \left[ 1 + \frac{\beta}{1 - \beta} \cdot \frac{\gamma_j}{\mu_{j,b}(\mathbf{p}(k))} \right] \qquad (6) $$
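A relaxed update of the kind in (6) can be sketched as follows. The code assumes (6) has the form p_j(k+1) = (1 - β) p_j(k) [1 + β γ_j / ((1 - β) μ_{j,b}(p(k)))] and expands the bracket into the algebraically equal (1 - β) p_j(k) + β p_j(k) γ_j / μ_{j,b}(p(k)), which avoids dividing by 1 - β. The function names and the single-cell SINR model are hypothetical.

```python
def relaxed_update(p_j, sinr_j, target_j, beta):
    """One relaxed step: (1 - beta) * p_j + beta * p_j * target_j / sinr_j."""
    return (1.0 - beta) * p_j + beta * p_j * target_j / sinr_j


def single_cell_sinr(j, powers, gains, noise):
    """Assumed SINR model: own received power over others' power plus noise."""
    interference = sum(
        g * p for k, (g, p) in enumerate(zip(gains, powers)) if k != j
    )
    return gains[j] * powers[j] / (interference + noise)


def run_relaxed(powers, gains, targets, noise, beta, steps=300):
    """Apply the relaxed update to all mobiles synchronously."""
    p = list(powers)
    for _ in range(steps):
        sinrs = [single_cell_sinr(j, p, gains, noise) for j in range(len(p))]
        p = [relaxed_update(p[j], sinrs[j], targets[j], beta) for j in range(len(p))]
    return p


if __name__ == "__main__":
    gains, targets = [1.0, 0.5], [0.3, 0.3]  # illustrative values only
    final = run_relaxed([1.0, 1.0], gains, targets, noise=0.01, beta=0.5)
    print([single_cell_sinr(j, final, gains, 0.01) for j in range(2)])
```

Setting β = 1 in this form recovers the unrelaxed update p_j γ_j / μ_j, while smaller β moves each mobile only part of the way toward it on every step.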

Bit Error Rate Variant

In [8] Yates and Kumar modify (5) so that the mobiles attempt to achieve a target BER rather than a target SINR. This allows the computationally simpler measurement of BER to be used in place of SINR. The updating algorithm of (6) is replaced with (7)

$$ p_j(k+1) = \max\left\{ p_{j,\min},\; \frac{\eta_j\, p_j(k)}{\ln\left( 1 / \left( 2\, BER(\mathbf{p}(k)) \right) \right)} \right\} \qquad (7) $$

where $\eta_j = \ln(1 / (2 B_j))$ with $B_j$ being the required BER of j. [8] proceeds to show that this new formulation satisfies the positivity, monotonicity, and scalability constraints of the framework. Thus by the application of Yates’s framework, this algorithm also converges. It should be noted that under the assumptions of high SINR and a LOS channel, BER can be expressed as $BER(\mathbf{p}) = 0.5\, e^{-\mu_j(\mathbf{p})}$. Substituting this expression into (7) yields an expression which is identical to (5) constrained by a lower bound.

Minimum Power Assignment (MPA)

In minimum power assignment (MPA), each mobile is assigned to the base station that maximizes the mobile’s SINR. The standard interference function for this scenario is given by (8). This is equivalent to a soft handoff process. Again this is shown to satisfy the properties of the standard interference function and thus converges.

$$ I_j^{MPA}(\mathbf{p}) = \min_{b \in B} \frac{\gamma_j}{\mu_{j,b}(\mathbf{p})} \qquad (8) $$

Macro Diversity (MD)

In this scenario, all base stations in the network jointly perform maximum ratio combining of user j’s signal. The interference function can be expressed as in (9). This relaxes the SINR requirements so that only the sum of SINRs must exceed the target SINR. Again this is shown to satisfy the properties of the standard interference function and thus converges.

$$ I_j^{MD}(\mathbf{p}) = \frac{\gamma_j}{\sum_{b \in B} \mu_{j,b}(\mathbf{p})} \qquad (9) $$

Limited Diversity (LD)

In this scenario, a subset of the base stations in the network performs maximum ratio combining of user j’s signal. This relaxes the SINR requirements so that only the sum of SINRs over that subset must exceed the target SINR. The interference function is given by (10), where $K_j \subseteq B$ is the subset of base stations that combine user j’s signal. Again this is shown to satisfy the properties of the standard interference function and thus converges.

$$ I_j^{LD}(\mathbf{p}) = \frac{\gamma_j}{\sum_{k \in K_j} \mu_{j,k}(\mathbf{p})} \qquad (10) $$

2.2.2.2 Goodman’s Power Control Games

Whereas Yates treated distributed power control as a general fixed point problem, Goodman considers distributed power control as a distributed interactive objective maximization problem (see footnote 6). In this formulation, the objective function is a variant of the expression given in (11)

$$ u_i(\mathbf{p}) = \frac{R}{p_i}\, f(\mu'_{i,b}) \qquad (11) $$

where R is the data rate and f is the probability of successful bit transmission as a function of a modified SINR, $\mu'_{i,b}$. The modified SINR for a mobile j, $\mu'_{j,b}$, is calculated as shown in (12)

$$ \mu'_{j,b} = \frac{W}{R} \cdot \frac{h_{j,b}\, p_j}{\sum_{k \in N \setminus j} h_{k,b}\, p_k + \sigma^2} \qquad (12) $$

where W is the transmission bandwidth.

Single Cell with Linear Pricing

In [13], (11) is instantiated with a throughput function for packets of length L where all bits in the packet must be received correctly for successful transmission. This is expressed as (13) where E is the energy content of the battery.

$$ u_j(\mathbf{p}) = \frac{E R}{p_j} \left( 1 - e^{-0.5 \mu'_j} \right)^L \qquad (13) $$

6 More accurately, Goodman treats the problem as a game. But calling the algorithm a distributed interactive objective maximization problem is equivalent.

[13] shows that a game defined by (13) has at least one fixed point, asserts that it is unique, and states the received powers are all equal at the fixed point. However, this fixed point is not optimal, so [13] introduces a linear pricing function cj given by (14) where t is some positive constant.

$$ c_j(\mathbf{p}) = t R\, p_j, \quad t > 0 \qquad (14) $$

Subtracting cj yields the modified objective function given by (15).

$$ u'_j(\mathbf{p}) = \frac{E R}{p_j} \left( 1 - e^{-0.5 \mu'_j} \right)^L - t R\, p_j \qquad (15) $$
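To make the pricing idea concrete, the sketch below evaluates the modified SINR, the utility, and the priced utility for one mobile. It assumes that (12) has the form μ'_j = (W/R) h_j p_j / (Σ_{k≠j} h_k p_k + σ²), that (13) has the form u_j = (E R / p_j)(1 - exp(-0.5 μ'_j))^L, and that (15) subtracts the linear price t R p_j from (13). The function names and all numeric values are hypothetical.

```python
import math


def modified_sinr(j, powers, gains, noise, bandwidth, rate):
    """Assumed form of (12): processing gain W/R times the received SINR."""
    interference = sum(
        g * p for k, (g, p) in enumerate(zip(gains, powers)) if k != j
    )
    return (bandwidth / rate) * gains[j] * powers[j] / (interference + noise)


def utility(j, powers, gains, noise, bandwidth, rate, packet_len, energy):
    """Assumed form of (13): (E R / p_j) * (1 - exp(-0.5 mu'_j)) ** L."""
    mu = modified_sinr(j, powers, gains, noise, bandwidth, rate)
    return energy * rate / powers[j] * (1.0 - math.exp(-0.5 * mu)) ** packet_len


def priced_utility(j, powers, gains, noise, bandwidth, rate, packet_len, energy, price):
    """Assumed form of (15): utility minus the linear price t * R * p_j."""
    base = utility(j, powers, gains, noise, bandwidth, rate, packet_len, energy)
    return base - price * rate * powers[j]


if __name__ == "__main__":
    # Illustrative two-mobile cell; none of these values come from [13].
    args = ([0.1, 0.1], [0.5, 0.5], 0.01, 1e6, 1e4, 80, 1.0)
    print(utility(0, *args))
    print(priced_utility(0, *args, price=1.0))
```

Because the price grows with transmit power while the throughput term saturates, the priced utility penalizes high-power operating points more heavily than low-power ones.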

[13] then demonstrates through an example simulation that a fixed point of the modified algorithm exists, and asserts that it yields better performance than the NE of the unmodified algorithm.

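Network Assisted Distributed Power Control

In [18], (13) is modified so that each mobile has a target SINR.

$$ u_j(\mathbf{p}) = \frac{L R\, h_{j,b}}{M \sigma^2}\, f(\mu'_j) \left( \frac{W}{R \mu'_j} - (N - 1) \right) \qquad (16) $$

Note that, when all received powers are equal, (12) gives $\frac{W}{R \mu'_j} - (N - 1) = \frac{\sigma^2}{h_{j,b}\, p_j}$, so this term is just another expression for received power.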

2.2.3 Ad-hoc Power Control

Though most closely associated with cellular networks, power control schemes are also frequently implemented in ad-hoc networks. Fundamentally, there are two scenarios under which ad-hoc networks implement power control algorithms: networks with channel partitioning and networks with random-access channels.

2.2.3.1 Channel partitioned networks

In channel partitioned ad-hoc networks, some mechanism exists for allocating channel resources to the nodes in the networks. Frequently this occurs in cluster-based ad-hoc networks, like Bluetooth, where each cluster head dictates key operational parameters, such as transmit power level, to its cluster members. Clusters are then connected through gateway nodes that reside in multiple clusters. In general, these networks can be visualized as shown in Figure 6.

Figure 6 Clustered Network (cluster heads connected through gateway nodes)

After noting the similarity in organization of clustered networks and cellular networks, [19] proposes an algorithm that uses IS-95-like SINR metrics for forming clusters and for maintaining acceptable performance at the cluster head. Conceptually, the inclusion of new nodes is identical to call admission, and the growth and decrease of cluster sizes in response to loading is similar to cell-breathing. Despite these similarities, it should be noted that power control and routing frequently may have to be considered jointly, indicating a need for a cross-layer approach to algorithm design. With these analogies in mind, cluster-based power control algorithms can be cast in terms of traditional cellular power control schemes. For instance, the classic SINR-based uplink cellular power control scheme presented in [5] could be applied, as could the algorithms in [15].

2.2.3.2 Random Access Networks

Pure random access networks typically have less structure than the channel partitioned networks. These networks are organized as meshes of peer-to-peer connections as shown in Figure 7. Due to the propagation of signals and the lack of rigid organization, all of the nodes in this network compete for access to the channel. However, as discussed in [21], when the nodes limit their transmit power level, significant gains in network performance are achievable. Thus power control also has a role to play in these networks.

Figure 7 Random Access Network

Recognizing these advantages, several authors have proposed different distributed power control schemes. In [22], an algorithm for performing distributed power control in 802.11 networks is described. The authors permit the use of ten different power levels and incorporate the necessary signaling into the RTS-CTS-DATA-ACK scheme. Herein each node communicates with its neighbor nodes and chooses a transmit level for each neighbor in such a way that the minimum signal power required for acceptable performance is achieved. In this scenario, then, each node can be modeled as attempting to achieve a target SINR. Although not considered in [22], this could then be modeled using the multiple connection reception scenario of [5], or each connection could be treated as a unique node in the fixed assignment scenario. A similar algorithm has been proposed in [23], wherein a second channel is added for power control. Likewise, [24] introduces Noise Tolerance Channels, which function like a power control channel but instead permit each node to announce its amount of “noise” tolerance, that is, the additional interference that can be afforded without losing a currently received signal. Other authors, such as [25] and [26], have further refined the ad-hoc power control problem by introducing beam forming.

Most power control games studied to date consider infrastructure-based wireless networks, though generalizations to ad-hoc networks can be made. The choice of a distributed algorithm for a network will be influenced by many factors including steady-state performance, convergence, complexity, stability, and interaction with other layers’ behavior. These form some of the active areas of research within the field of distributed power control and game theory.

3 Game Theory

Game theory is a set of mathematical tools used to analyze interactive decision makers. The fundamental component of game theory is the notion of a game, expressed in normal form as $G = \langle N, A, \{u_i\} \rangle$, where G is a particular game, $N = \{1, 2, \ldots, n\}$ is a finite set of players (decision makers), $A_i$ is the set of actions available to player i, $A = A_1 \times A_2 \times \cdots \times A_n$ is the action space, and $\{u_i\} = \{u_1, u_2, \ldots, u_n\}$ is the set of utility (objective) functions that the players wish to maximize. Each player’s objective function, $u_i$, is a function of the particular action chosen by player i, $a_i$, and the particular actions chosen by all of the other players in the game, $a_{-i}$, and yields a real number. Other games may include additional components, such as the information available to each player and communication mechanisms. In a repeated game, players are allowed to observe the actions of the other players, remember past actions, and attempt to predict future actions of players.
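The components of a normal form game map naturally onto simple data structures. The sketch below builds a hypothetical two-player game in which each player chooses a "low" or "high" transmit power; the payoff numbers are invented purely for illustration. It shows the action space formed as the Cartesian product of the players' action sets and a utility function defined over action tuples.

```python
from itertools import product

# Player set N and per-player action sets A_i (hypothetical example).
players = (0, 1)
action_sets = {0: ("low", "high"), 1: ("low", "high")}

# Action space A = A_1 x A_2, the Cartesian product of the action sets.
action_space = list(product(*(action_sets[i] for i in players)))

# Invented payoffs: each action tuple maps to one real number per player.
payoffs = {
    ("low", "low"): (3, 3),
    ("low", "high"): (1, 4),
    ("high", "low"): (4, 1),
    ("high", "high"): (2, 2),
}


def utility(player, action_tuple):
    """u_i(a): the payoff to `player` when the joint action is `action_tuple`."""
    return payoffs[action_tuple][player]


if __name__ == "__main__":
    for a in action_space:
        print(a, [utility(i, a) for i in players])
```

Note that each player's utility depends on the whole action tuple, not only on that player's own choice, which is what makes the decision problem interactive.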

3.1 Fundamental Game Theory Concepts

The following are some fundamental concepts from game theory that must be understood before proceeding with this report.

Players
Players are the decision making entities in the modeled system.

Actions
An action set represents the choices available to a player. Note that these choices may be quite complex and, for instance, may represent a sequence of real world actions. Each player in the game has its own action set and makes its decision by choosing an action from its action set. A choice of actions by all players in the game produces an action vector or action tuple. All possible action vectors in the game are contained within the game’s action space. The action space is formed from the Cartesian product of every player’s action set.

Outcomes
Each action vector produces a well defined and expected outcome. Note that as an outcome is jointly defined by every player’s action choice, there is an interactive relationship. Thus in every game there exists a mapping from the action space to some outcome space. As this mapping is presumed surjective, most game analyses ignore outcomes and focus solely on the actions that produce the outcomes.

Preference Relations
Fundamental to game theory is the concept of preference relations. A preference relation describes a comparative preference between two outcomes or action tuples (and is thus a binary relation). The preference operator is normally represented by the symbol $\succsim$. Here, $a \succsim b$ indicates that outcome a is at least as preferable as outcome b. In game theory, the preference relation is assumed to be reflexive, transitive and complete over the action space. In a game, each player is expected to have preference relations defined over all possible outcomes.

Utility Functions
While games can be analyzed based on the ordinal relations implied by preference relations, cardinal relations have a richer tool set and are generally preferred for analysis. Utility functions (objective functions) transform the ordinal relationships of players’ preference relations to cardinal relationships. Generally a utility function is constructed over the action (outcome) space so that if a is preferable to b, then the cardinal value assigned to a will be greater than the cardinal value assigned to b. Thus in light of utility functions, it may be fair to treat the preference operator, $\succsim$, as the greater than or equal to operator, $\ge$.

Normal Form Game
The simplest game form is the normal form game (also known as the strategic form game). The normal form game includes the following model components: a player set, an action space, and a set of preference relations (or utility functions). “Play” proceeds by each player in the game simultaneously selecting an action. This structure can then be analyzed to yield valuable information about the modeled system such as expected steady states and convergence criteria.

Nash Equilibrium
An action vector a is said to be a Nash equilibrium (NE) iff

$$u_i(a) \ge u_i(b_i, a_{-i}) \quad \forall i \in N,\ \forall b_i \in A_i \tag{17}$$

where a is an action tuple, $(b_i, a_{-i})$ is another action tuple that differs from a only in the component determined by i, and $u_i$ is player i’s utility function. Restated, a NE is an action vector from which no player can profitably unilaterally deviate. NE correspond to the steady-states of the game and are then predicted as the most probable outcomes of the game. Note: demonstrating that an action vector is a NE says nothing about the desirability or optimality of the action vector.

Best Response Function
A best response function (correspondence) specifies the action (set of actions) for a particular player, say player i, that produces the largest utility given the action tuple chosen by all remaining players, $a_{-i}$. Nash equilibriums for a game are equivalent to the fixed points of the multi-player best response correspondence formed from the Cartesian product of all players’ best response functions.

Example
The following is a commonly used example known as the Prisoners’ Dilemma, which we presented in [37] to illustrate these concepts. In the Prisoners’ Dilemma, a serious crime and a minor crime have been committed and the police have two suspects that can be placed at the scene of the major crime and are known to have committed the minor crime. The police separate the two suspects and independently offer each the following deal: If both suspects deny involvement, then the police will charge them with the minor crime and each will get one year in prison. If, however, one chooses to confess to both of the suspects’ involvement in the major crime while the other continues to deny involvement, the one that confesses will go free and the other will receive 15 years. Should they both choose to confess to the major crime, each will receive 10 years. This game can be visualized as shown below in Figure 8 where the Nash Equilibrium (both confess) for the game is highlighted in red. Notice that neither suspect would choose to deviate from confessing, as to do so would increase that suspect’s prison sentence from 10 to 15 years. Also note that (C,C) represents a fixed point for the game’s multi-player best response correspondence.

[Figure: 2×2 matrix of prison sentences in years (prisoner 1, prisoner 2) for actions D (deny) and C (confess): (D,D) = 1, 1; (D,C) = 15, 0; (C,D) = 0, 15; (C,C) = 10, 10. Panels: “Improvement Deviation for Prisoner 2” and “Improvement Path”.]

Figure 8 Prisoner’s Dilemma Matrix Form Representation
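The Nash equilibrium claimed for this example can be checked by brute force against condition (17). The following is a minimal sketch, assuming utilities equal to the negated prison sentences; the names `SENTENCES`, `utility`, and `is_nash` are illustrative and do not come from [37].

```python
from itertools import product

# Action order in each tuple: (prisoner 1, prisoner 2); "D" = deny, "C" = confess.
ACTIONS = ("D", "C")

# Prison sentences in years, as stated in the example text.
SENTENCES = {
    ("D", "D"): (1, 1),
    ("D", "C"): (15, 0),
    ("C", "D"): (0, 15),
    ("C", "C"): (10, 10),
}


def utility(player, profile):
    """Assumed utility: the negated sentence, so fewer years is better."""
    return -SENTENCES[profile][player]


def is_nash(profile):
    """Check condition (17): no player gains by a unilateral deviation."""
    for player in range(2):
        for alt in ACTIONS:
            deviated = profile[:player] + (alt,) + profile[player + 1:]
            if utility(player, deviated) > utility(player, profile):
                return False
    return True


nash_equilibria = [a for a in product(ACTIONS, repeat=2) if is_nash(a)]
print(nash_equilibria)
```

Under these assumptions the only action tuple that survives the check is (C, C), matching the text.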

Another way to analyze this game is to follow preferable deviations from each action tuple. A sequence of preferable deviations is known as an improvement path. When an improvement path encounters an action tuple from which there is no preferable deviation, that action tuple can be identified as a Nash equilibrium. Preferable deviations are also illustrated in Figure 8. Notice that in this game all conceivable improvement paths are finite. When this occurs, the model predicts that play will not oscillate among any sequence of action tuples. It is also interesting to note that the Prisoners’ Dilemma exhibits bilateral symmetric interaction wherein if the players swap actions, the results are exactly reversed. Both of these properties are characteristics of a special class of games known as potential games.
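The improvement paths described above can be enumerated directly for a game this small. The sketch below again assumes utilities equal to the negated sentences; `improving_deviations` and `improvement_paths` are hypothetical helper names, and the recursion terminates only because this particular game has no improvement cycles.

```python
ACTIONS = ("D", "C")
SENTENCES = {
    ("D", "D"): (1, 1),
    ("D", "C"): (15, 0),
    ("C", "D"): (0, 15),
    ("C", "C"): (10, 10),
}


def utility(player, profile):
    """Assumed utility: negated prison sentence."""
    return -SENTENCES[profile][player]


def improving_deviations(profile):
    """Action tuples reachable by one profitable unilateral deviation."""
    result = []
    for player in range(2):
        for alt in ACTIONS:
            deviated = profile[:player] + (alt,) + profile[player + 1:]
            if utility(player, deviated) > utility(player, profile):
                result.append(deviated)
    return result


def improvement_paths(start):
    """Enumerate every maximal improvement path from start.

    Safe only for games without improvement cycles, as is the case here.
    """
    successors = improving_deviations(start)
    if not successors:
        return [[start]]
    return [[start] + tail for nxt in successors for tail in improvement_paths(nxt)]


paths = improvement_paths(("D", "D"))
for path in paths:
    print(" -> ".join(str(p) for p in path))
```

Starting from (D, D), both paths found this way pass through a single confession and end at (C, C), where no preferable deviation remains.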

3.2 Key Game Forms This section describes some of the game forms that are critical to the application of game theory to RRM.

3.2.1 Repeated games
A repeated game is a sequence of stages where each stage is the same normal form game. When the game has an infinite number of stages, the game is said to have an infinite horizon. Based on their knowledge of the game – past actions, future expectations, and current observations – players choose strategies, i.e., a choice of actions at each stage. These strategies can be fixed, contingent on the actions of other players, or adaptive. Further, these strategies can be designed to punish players who deviate from agreed upon behavior. When punishment occurs, players choose their actions to minimize the payoff of the offending player. However, even when the other players are minimizing the payoff of a player i, i is still able to achieve some payoff $\underline{v}_i$. Thus there is a limit to how much a player can be punished. As estimations of future values of $u_i$ are uncertain, many repeated games modify the original objective functions by discounting the expected payoffs in future stages by δ, where 0 < δ < 1, such that the anticipated value in stage k to player i is given by (18).

$$u_{i,k}(a) = \delta^k u_i(a) \tag{18}$$

Folk theorem [41]
In a repeated game with an infinite horizon and discounting, for every feasible payoff vector $v$ with $v_i > \underline{v}_i$ for all $i \in N$, there exists a $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$ there is a NE with payoffs $v$.

Loosely stated, the Folk theorem says that in a discounted infinite horizon repeated game, nearly any behavior can be designed to be the steady-state through the proper choice of punishment strategies and δ. Thus convergent behavior of a repeated game can be achieved nearly independent of the objective function.
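As a rough illustration of how punishment and discounting interact, consider repeating the Prisoners’ Dilemma of Section 3.1 with utilities equal to the negated sentences and discounting as in (18). The strategy assumed here (deny until someone confesses, then confess in every later stage, often called a grim trigger) is introduced only for illustration and is not taken from [41].

```latex
% Both players deny in every stage (payoff -1 per stage):
\sum_{k=0}^{\infty} \delta^{k}(-1) = -\frac{1}{1-\delta}

% One player confesses at stage 0 (payoff 0) and is punished with
% mutual confession (payoff -10) in every later stage:
0 + \sum_{k=1}^{\infty} \delta^{k}(-10) = -\frac{10\,\delta}{1-\delta}

% Denying is therefore sustainable when
-\frac{1}{1-\delta} \;\ge\; -\frac{10\,\delta}{1-\delta}
\quad\Longleftrightarrow\quad \delta \ge \frac{1}{10}
```

Under these assumptions, mutual denial, which is not a NE of the stage game, becomes a steady state of the repeated game for any δ of at least 1/10.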

3.2.2 Myopic games
A myopic game is defined here as a repeated game in which there is no communication between the players, memory of past events, or speculation of future events. Any adaptation by a player can still be based on knowledge of the current state of the game. As players have no consideration of future payoffs, the Folk theorem does not hold for myopic games and the convergence to steady-state behavior must occur through other means. Two convergence dynamics possible in a myopic game are the best response dynamic and the better response dynamic. Both dynamics require additional structure in the stage game to ensure convergence.

Definition: Best response dynamic [43]
At each stage, one player $i \in N$ is permitted to deviate from $a_i$ to some randomly selected action $b_i \in A_i$ iff $u_i(b_i, a_{-i}) \ge u_i(c_i, a_{-i})\ \forall c_i \ne b_i \in A_i$ and $u_i(b_i, a_{-i}) > u_i(a)$.

Definition: Better response dynamic [43]
At each stage, one player $i \in N$ is permitted to deviate from $a_i$ to some randomly selected action $b_i \in A_i$ iff $u_i(b_i, a_{-i}) > u_i(a_i, a_{-i})$.
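The two dynamics defined above can be sketched as single-stage update functions. This is a minimal sketch under stated assumptions: the game is finite, the deviating player is supplied by the caller, and the two-action coordination game `PAYOFF` is a hypothetical example that does not appear in [43].

```python
import random


def deviate(profile, player, action):
    """Return profile with player's component replaced by action."""
    return profile[:player] + (action,) + profile[player + 1:]


def best_response_step(utility, action_sets, profile, player, rng):
    """One stage of the best response dynamic for the given player."""
    current = utility(player, profile)
    values = {b: utility(player, deviate(profile, player, b))
              for b in action_sets[player]}
    best = max(values.values())
    if best <= current:
        return profile  # no strictly improving best response exists
    candidates = [b for b, v in values.items() if v == best]
    return deviate(profile, player, rng.choice(candidates))


def better_response_step(utility, action_sets, profile, player, rng):
    """One stage of the better response dynamic for the given player."""
    current = utility(player, profile)
    candidates = [b for b in action_sets[player]
                  if utility(player, deviate(profile, player, b)) > current]
    if not candidates:
        return profile
    return deviate(profile, player, rng.choice(candidates))


# Hypothetical coordination game with two NE: (A, A) and (B, B).
PAYOFF = {("A", "A"): (2, 2), ("B", "B"): (1, 1),
          ("A", "B"): (0, 0), ("B", "A"): (0, 0)}
ACTION_SETS = (("A", "B"), ("A", "B"))


def coordination_utility(player, profile):
    return PAYOFF[profile][player]


rng = random.Random(0)
start = ("A", "B")
after_player_0 = best_response_step(coordination_utility, ACTION_SETS, start, 0, rng)
after_player_1 = best_response_step(coordination_utility, ACTION_SETS, start, 1, rng)
print(after_player_0, after_player_1)
```

In this example the steady state reached from (A, B) depends on which player is permitted to deviate first, which is one reason the convergence mechanism matters alongside the utility functions.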

3.2.3 S-modular games
An S-modular game restricts the utility functions $\{u_i\}$ such that for all $i \in N$ either (19) or (20) is satisfied.

$$\frac{\partial^2 u_i(a)}{\partial a_i \, \partial a_j} \ge 0 \quad \forall j \ne i \in N \tag{19}$$

$$\frac{\partial^2 u_i(a)}{\partial a_i \, \partial a_j} \le 0 \quad \forall j \ne i \in N \tag{20}$$

When (19) is satisfied, the game is said to be supermodular; when (20) is satisfied, the game is said to be submodular. Myopic games whose stages are S-modular games with a unique NE converge to that NE when they follow a best response dynamic [39].
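A small worked check may help fix ideas. The two-player utility below is a hypothetical example chosen for illustration, with each action set assumed to be the interval [0, 1]; it is not drawn from [39].

```latex
% Hypothetical utility for player i with opponent j, a_i, a_j \in [0, 1]:
u_i(a) = a_i a_j - a_i^{2}

% Cross partial derivative, to be compared with (19):
\frac{\partial^{2} u_i(a)}{\partial a_i \, \partial a_j} = 1 \ge 0

% Best response, from \partial u_i / \partial a_i = a_j - 2 a_i = 0:
\hat{a}_i(a_j) = \frac{a_j}{2}

% Repeated best responses halve the opponent's action each time, so play
% approaches the unique fixed point
(a_1, a_2) = (0, 0)
```

The example satisfies (19), so it is supermodular, and its best response dynamic approaches the unique NE, in line with the convergence statement above.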

3.2.4 Potential games
A potential game is a special type of normal form game where the $u_i$ are such that the change in value seen by a unilaterally deviating player is reflected in a function $P: A \to \mathbb{R}$. All myopic games where the stages are the same potential game converge to a NE when decisions are updated according to a better response dynamic [43].

Exact Potential Games
A game is an exact potential game (EPG) if there exists some function (an exact potential function, EPF) $P: A \to \mathbb{R}$ such that (21) is satisfied $\forall i \in N$, $\forall a \in A$, and $\forall b_i \in A_i$.

$$u_i(a_i, a_{-i}) - u_i(b_i, a_{-i}) = P(a_i, a_{-i}) - P(b_i, a_{-i}) \tag{21}$$

In [42], (22) is given as a necessary and sufficient condition for a game to be an exact potential game when the action sets are intervals of real numbers and the utility functions are twice continuously differentiable.

$$\frac{\partial^2 u_i(a)}{\partial a_i \, \partial a_j} = \frac{\partial^2 u_j(a)}{\partial a_j \, \partial a_i} \quad \forall i, j \in N,\ a \in A \tag{22}$$

Ordinal Potential Games A game is an ordinal potential game (OPG) if there exists some function (OPF) P: A→ℜ such that (23) is satisfied ∀i∈N, ∀a∈A. Note that all EPG are also OPG.

$$u_i(a_i, a_{-i}) > u_i(b_i, a_{-i}) \iff P(a_i, a_{-i}) > P(b_i, a_{-i}) \tag{23}$$

While there is no necessary and sufficient condition for establishing that a game is an OPG, it is possible to indirectly show that a game is an OPG by applying ordinal transformations that modify the cardinal mapping of the game’s utility functions while preserving the ordinal preference relationships.

Key Potential Game Properties
Myopic games with potential game stages converge to a steady state when following a better response dynamic. Further, potential games are guaranteed to possess at least one Nash equilibrium. These equilibriums can be quickly found by identifying the maximizers of the game’s potential function.
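To make these properties concrete, the sketch below checks condition (21) for the Prisoners’ Dilemma of Section 3.1 against a candidate potential function. Both the utilities (negated sentences) and the values in `POTENTIAL` are assumptions worked out for this illustration; they are not taken from [42] or [43].

```python
from itertools import product

ACTIONS = ("D", "C")
SENTENCES = {
    ("D", "D"): (1, 1),
    ("D", "C"): (15, 0),
    ("C", "D"): (0, 15),
    ("C", "C"): (10, 10),
}

# Candidate exact potential function, chosen by hand for this example.
POTENTIAL = {("D", "D"): 0, ("C", "D"): 1, ("D", "C"): 1, ("C", "C"): 6}


def utility(player, profile):
    """Assumed utility: negated prison sentence."""
    return -SENTENCES[profile][player]


def deviate(profile, player, action):
    return profile[:player] + (action,) + profile[player + 1:]


def satisfies_exact_potential():
    """Check (21) for every player, action tuple, and unilateral deviation."""
    for profile in product(ACTIONS, repeat=2):
        for player in range(2):
            for alt in ACTIONS:
                other = deviate(profile, player, alt)
                delta_u = utility(player, profile) - utility(player, other)
                delta_p = POTENTIAL[profile] - POTENTIAL[other]
                if delta_u != delta_p:
                    return False
    return True


is_exact = satisfies_exact_potential()
maximizer = max(POTENTIAL, key=POTENTIAL.get)
print(is_exact, maximizer)
```

With these values the check passes and the maximizer of the candidate potential is (C, C), the same action tuple identified earlier as the NE.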

4 Application of Game Theory to Radio Resource Management

Distributed adaptive behavior in a wireless network will generally lead to recursive behavior wherein the decisions of one radio will subsequently influence the decisions of other radios in the network. In order to successfully deploy these networks, it will be necessary to determine if the network will eventually reach a steady state. If the adaptive behavior does reach a network steady state, resources can be appropriately allocated and performance anticipated; otherwise these tasks are virtually impossible. With a game theoretic analysis, these network steady states can be identified from the Nash Equilibriums (NE) of its associated game. It is important to note that not every game, and thus not every adaptive behavior, will have a steady state. Also not every steady-state is desirable as in some situations the radios may be jamming each other or the network might achieve a significantly less than optimal performance. The exact steady-states that a network reaches are a function of the specifically implemented adaptive behaviors and the convergence mechanisms used by the network. Thus as we discussed in [27], the following are the key considerations for the successful implementation of a distributed adaptive algorithm in a wireless network:

• Steady-state existence
• Optimality of steady-state
• Convergence
• Stability (will small variations lead to undesirable behavior?)
• Scalability

There are a number of different convergence mechanisms that a network might employ. These mechanisms range from fully centralized to fully distributed to mixes in between. A fully distributed network will generally be easily scalable and of low complexity to implement for each node whereas a centralized network will induce high complexity on at least one node and may have scalability limitations. Thus as long as the distributed network yields the appropriate steady-state, it will generally be desirable. However, a fully centralized network will “converge” to whatever behavior is dictated by the network authority while the convergence of distributed algorithms is dependent on the improvement paths in the associated game’s action space. Consider the action space for the two player game shown in Figure 9. Here there are three NE (steady-states). NE1 and NE2 are both “stable” steady states as small perturbations still converge to the NE. NE3, however, is an unstable steady state as all improvement paths lead away from it. Thus with a distributed algorithm the network would rarely be in NE3 even if NE3 was the most desirable steady state. Additionally, this action space includes areas where adaptive cycles can occur so behavior will be bounded, though never stable. Depending on the signaling required to alter the behavior, these cycles could result in inefficient spectrum utilization. It will be important to select distributed algorithms where the steady-states are stable and cycles are absent. S-modular games [39] and potential games [28] are both examples of game models that can be applied to wireless networks where these conditions are satisfied.

[Figure: two-player action space with axes a1 and a2, marking NE1, NE2, and NE3.]

Figure 9 Improvement Paths in a 2-Player Game

[Figure: two-player action space with axes a1 and a2, marking NE1, NE2, and NE3.]

Figure 10 NE Sensitivity

Network robustness is also critical for adaptive SDR behavior. In a mobile environment, channel conditions are constantly changing; the impact that a specific action has on the network will accordingly also be constantly changing. It will be important to know how much these parameter variations influence the steady-state behavior. As shown in Figure 10, these variations could have the effect of smearing the steady states. Further these parameter variations can also alter the network’s improvement paths potentially damaging the convergence properties. Just as changes in channel conditions can influence node behavior, changes in the number of nodes in the network can also influence behavior. It will be important to characterize how increasing the number of adaptive SDRs in a network influences network performance. Due to the interactive nature of these distributed algorithms, it is anticipated that game theory can serve as a good tool for analyzing these networks. Beyond these generalities, this section of the report considers specific rules for applying game theory to communications networks, describes a framework suitable for modeling physical layer games, and gives several example applications that illustrate the various kinds of information that game theory can provide.

4.1 Rules for Applying Game Theory to Communications Networks

In [28] we formulated the following set of necessary conditions for modeling a wireless network as a game. The first set of conditions ensures that rationality assumptions hold.

Conditions for Rationality
1. The decision-making process must be well-defined, i.e., each of the radios must follow a well-defined and deterministic set of rules for selecting an action with respect to environmental factors.
2. A decision maker’s choice to change an action should have a reasonable expectation of resulting in a positive improvement deviation.

These conditions may be satisfied by requiring each radio to maximize a well-defined objective function or by implementing simple state machines that determine the radio’s operating parameters in response to changes in its environment – an example of a myopic decision maker. Together, these conditions combine the need for definable action sets and objective functions, both requirements for effective analyses of games.

Conditions for a Nontrivial Game
1. There must be more than one decision making entity in the network.
2. More than one decision maker must have a nonsingleton action set.

If the first condition is not satisfied, as is frequently the case in wireless networks, then the system may be an interesting optimization problem, but it is not a game as the decision making process is not interactive. For instance, the round robin scheduling of [31] makes for an interesting optimization problem, but is not a game as the base station is the sole decision maker. The second condition refines the first, as a decision maker without a choice of actions is not in fact a decision maker.

4.1.1 Example Applications To illustrate when a wireless network satisfies these conditions, the following lists both instances in which it is appropriate to model the network as a game and instances in which it is inappropriate to model the network as a game.

4.1.1.1 Example Inappropriate Applications

1) Site Planning
During site planning, significant studies are performed so that site location, channel assignment, and reuse factors are chosen in such a way as to maximize coverage, capacity, and user QoS, while minimizing deployment cost (typically reflected in the number and complexity of the base stations). Indeed this is a complicated RRM issue due to the intricate considerations that must be made and the inverse relations between the first three objectives and the final objective. However, site planning is not a game as there is only a single decision maker. In this scenario there are indeed clear decision making objectives, and the decision maker (site planner) can be presumed to choose from the available actions (site locations, channel assignments, and reuse factors) in a way that maximizes those objectives. However, only the site planner makes decisions; thus there is no interaction and the conditions for a nontrivial game are not satisfied. Note that the fact that the decision making process is not dynamic is not a problem. For instance, recall that a normal form game is also not dynamic. However, in these single shot scenarios convergence will not be an issue.

2) Most Random Access Schemes
In a random access scheme, there are clearly numerous decision making radios, the decisions are interactive, and there can be clearly defined objectives (minimize BER). However, random access schemes are not generally games as the choice of action is not deterministic (it may or may not have a reasonable expectation to improve performance). For instance, consider a slotted ALOHA scheme where the retransmission probability is fixed (presumably at 1/N, the optimal retransmit probability). While there are a number of different slots that each radio can choose to retransmit in, the choice is entirely random and thus not deterministic. Thus rationality condition 1 is violated. Specifically, without any expectation as to how the radio will act, the game theorist cannot draw any conclusions on the behavior of the game. It should be noted that if the radios are free to adapt their retransmit probability (making the choice of retransmit probability the action), the ALOHA network can indeed be modeled as a game, as was done in [30]. Note that the decision making process in this case can be modeled as occurring at the MAC layer.

3) Centralized Power Control Schemes
Whether downlink or uplink, if the power control decisions are centralized in a single decision making entity (for instance at a base station), a centralized power control scheme cannot be modeled as a game as it only has a single decision maker. Even in mobile assisted schemes, centralized power control is not a game (unless the mobiles have the option of “lying” about their performance). In these cases the mobiles merely serve as an extension to the information collection functions of the base station.

4.1.1.2 Example Appropriate Applications

1) Distributed Power Control
As just addressed, game theory is not universally applicable to power control algorithms. However, when the decisions are not centralized, i.e., distributed, game theory can be appropriately applied. Scenarios where this is appropriate include open loop power control, closed loop power control in an ad-hoc network, and cellular closed loop power control where the decision makers are the base station–mobile pairs. In this last scenario, in a practical cellular system, channels used in one cell are replicated in some other cell in the system. Presuming path loss exponents are not unusually large, e.g., greater than four, there will be interference between the cells and the power level decisions of one cell will impact the decisions of other cells in the network as illustrated in Figure 11. Indeed, we applied game theory to a related scenario in [32] wherein power levels and rate choices in one General Packet Radio Service (GPRS) cell impacted other cells with the same channel. As each base station made its decisions independently of the remaining base stations, this scenario was indeed a game.

[Figure: cells numbered 1 through 7 with a distance labeled d1.]

Figure 11 Decision Interaction with a 7-cell Reuse Plan [32]

A closely related phenomenon to this inter-cell interaction is cell breathing. In cell breathing, a base station controls the loading of its cell by modifying its pilot beacon signal strength. When a cell’s load is above a certain threshold, the base station reduces its pilot beacon signal; when it is below the threshold, the pilot signal is increased. This then influences the handover operations in the system (which along with some hysteresis effects are primarily influenced by which base station provides the strongest signal) moving some of the more distant mobiles between cells. Note that unless a mobile is dropped, this pilot beacon signal modification will change the number of mobiles that are carried in adjacent cells. Thus an interactive decision process exists, and game theory can be applied. 2) Distributed Dynamic Channel Assignment In a dynamic channel assignment scheme, the channels assigned for use in each cell or cluster are dynamically altered based on perceived environmental factors. Thus a cell instructs a link to switch to a different channel in response to interference fluctuations. Of course the channels chosen in one cell impact the channel assignment decisions in nearby cells – thus an interactive decision process occurs. Examples of distributed dynamic channel assignment schemes are given in [33] and [34]. Though not previously addressed, this scenario should be easily modeled as a congestion game. A closely related concept is adaptive interference avoidance, wherein each link adapts its waveform in response to perceived interference. Within a cluster network, the scenario can be handled as a distributed dynamic channel assignment problem if one treats the various adaptations as the set of (nonorthogonal) “channels” available for use. Many more distributed radio resource management algorithms can be appropriately modeled as a game. 
For instance, MAC adaptations, such as the ALOHA MAC strategy game [30], and ad-hoc network formation with localized optimization can also be appropriately modeled as a game. In each of these scenarios, there are multiple decision makers, the process is interactive, and the objectives and actions are clearly defined.
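The suggestion that distributed dynamic channel assignment could be modeled as a congestion game can be sketched as follows. Everything specific here is an assumption made for illustration: five links, three channels, a cost equal to the number of other links sharing a channel, and a round-robin better response update. None of it is taken from [33] or [34].

```python
from collections import Counter

NUM_CHANNELS = 3


def cost(player, profile):
    """Assumed cost: number of other links sharing the player's channel."""
    return sum(1 for c in profile if c == profile[player]) - 1


def potential(profile):
    """Rosenthal potential for this cost: sum over channels of n(n-1)/2."""
    return sum(n * (n - 1) // 2 for n in Counter(profile).values())


def better_response_round(profile, history):
    """Let each link in turn move to the first channel that lowers its cost."""
    moved = False
    for player in range(len(profile)):
        for channel in range(NUM_CHANNELS):
            trial = profile[:player] + (channel,) + profile[player + 1:]
            if cost(player, trial) < cost(player, profile):
                profile = trial
                history.append(potential(profile))
                moved = True
                break
    return profile, moved


# All five links start on channel 0.
profile = (0, 0, 0, 0, 0)
history = [potential(profile)]
moved = True
while moved:
    profile, moved = better_response_round(profile, history)

print(profile, history)
```

In this sketch every accepted move lowers the potential by exactly the mover’s cost reduction, so the adaptations cannot cycle and the links settle into a load-balanced assignment from which no link can improve on its own.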

4.1.2 Cognitive Radio The functions implied by these conditions are not generally satisfied by the typical radio or even the typical software radio. While necessary, the ability to alter its waveform alone does not satisfy the outlined conditions (the adaptations could still be directed from a single centralized authority). For game theory to apply, a radio must also have the ability to sense its performance (minimal environmental awareness) and be able to autonomously alter its waveform, or more formally, radios will need to support the following processes:

• Observation Process
• Observation Valuation
• Decision Updating Process

These mechanisms may have to be inferred or may be explicitly implemented as in a cognitive radio. As formally introduced in [35], cognitive radio is an enhancement of traditional software radio design that attempts to establish a framework in which a radio can evaluate its capabilities, the requirements of its services, its potential waveforms, and the environment, and then decide and act in a way that, to the limit of its knowledge, best satisfies the needs of the situation. As illustrated below in Figure 12, cognitive radios employ a cognition cycle to alter their actions in response to changes in the environment through the use of state machines. The cognition cycle may perform a detailed analysis that predicts future changes in the environment or may make simple adjustments in immediate response to environmental changes.

Figure 12 Cognition Cycle [36]

While this cycle is quite complex – in fact it implies the existence of artificial intelligence – it is possible to create simpler versions of cognitive radio that reduce the complexity while maintaining some key functionalities. Indeed, [35] identifies the nine levels of cognitive radio functionality summarized in Table 1. Loosely, both the capabilities and the complexity implied by each level in Table 1 grow exponentially as the level increases, and the higher levels go far beyond the suggested mechanisms.

Table 1 Levels of Cognitive Radio Functionality (Adapted from [35] Table 4-1)

Level  Capability             Comments
0      Pre-programmed         A software radio
1      Goal Driven            Chooses Waveform According to Goal. Requires Environment Awareness.
2      Context Awareness      Knowledge of What the User is Trying to Do
3      Radio Aware            Knowledge of Radio and Network Components, Environment Models
4      Capable of Planning    Analyze Situation (Level 2 & 3) to Determine Goals (QoS, power), Follows Prescribed Plans
5      Conducts Negotiations  Settle on a Plan with Another Radio
6      Learns Environment     Autonomously Determines Structure of Environment
7      Adapts Plans           Generates New Goals
8      Adapts Protocols       Proposes and Negotiates New Protocols

Note that a level 3 cognitive radio supports all of the minimal processes implied by the application of game theory. Based on the functionalities of a level 3 cognitive radio, a reduced cognition cycle can be constructed as shown in Figure 13. Thus while the cognitive radio concept is closely related to the processes that go on in an interactive decision making network, many of the complexities of cognitive radio need not be introduced for the application of game theory to be appropriate.

Figure 13 Level 3 Cognitive Radio Cognition Cycle

4.2 A general framework for modeling physical layer behavior as a game

Loosely, Figure 14 illustrates how a network can be modeled as a game. The decision making nodes in the network form the player set of the game, each node’s available power levels (or other adaptations) form the action sets of the players, and the algorithms used by the nodes to modify their behavior form the utility functions and learning processes within the game.

Figure 14 Network

Based on previous models described in [28] and [37], we introduced a more specific game model for distributed physical layer adaptations in [38]. This model is based on the following key assumptions:

• Fundamentally, the choice of a power level and the selection of an appropriate signaling waveform are the adaptations that may be adopted at the physical layer by a node.

• From a physical layer perspective, performance is generally a function of the effective signal-to-interference-plus-noise ratio (SINR) at the node(s) of interest.

• Effective SINR is a function of two choices by a node: the transmit power level and the signaling waveform (modulation, frequency, and bandwidth). The exact structure of this function is also impacted by a variety of factors not directly controllable at the physical layer; the most notable of these factors are environmental path losses and the processing capabilities of the node(s) of interest.

• When the nodes in a network respond to changes in perceived SINR by adapting their signals, a physical layer interactive decision making process occurs.

Based on these assumptions, a game theoretic model for physical layer adaptations can be formed using the parameters listed in Table 2.

Table 2 Model Parameters

Symbol                             Meaning
$N$                                The set of decision making nodes in the network
$h_{i,j}$                          The link gain from i to j. Note this may be a function of the waveform.
$H$                                The network link gain matrix, the $n \times n$ matrix whose entries are the link gains $h_{i,j}$
$P_j$                              The set of power levels available to node j. This is presumed to be a subset of the real number line.
$p_j$                              A power level chosen by j from $P_j$
$P$                                The power space ($\mathbb{R}^n$) formed from the Cartesian product of all $P_j$: $P = P_1 \times P_2 \times \cdots \times P_n$
$\mathbf{p}$                       A power profile (vector) from $P$ formed as $\mathbf{p} = (p_1, p_2, \ldots, p_n)$
$\Omega_j$                         The set of waveforms known by node j
$\omega_j$                         A waveform chosen by j from $\Omega_j$
$\Omega$                           The waveform space formed from the Cartesian product of all $\Omega_j$: $\Omega = \times_{j \in N} \Omega_j$
$\omega$                           A waveform profile (vector) from $\Omega$ formed as $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$
$u_j(\mathbf{p}, \omega, H)$       The utility determined by j

From this table, the stage game for interactive physical layer adaptations can be modeled with the game components listed in (24).

$$G = \left\langle N,\ P \times \Omega,\ \{u_j(\mathbf{p}, \omega, H)\} \right\rangle \tag{24}$$

For a general physical layer adaptation, each node, j, selects a power level, $p_j$, and a waveform, $\omega_j$, based on its current observation valuation, decision making process and learning process. Restricted versions of this game are commonly encountered. Distributed power control systems permit each radio to select $p_j$, but restrict $\Omega_j$ to a singleton set; distributed waveform adaptation systems (adaptive interference avoidance) restrict the choice of $p_j$, but allow $\omega_j$ to be chosen by the physical layer.

In this model, “waveform” is used in a very abstract sense. A waveform is all of the components of a transmitted signal other than the power level. The transmitted waveforms can have a significant impact on how the power levels chosen by two different nodes interact. Consider two nodes transmitting orthogonal waveforms and then consider two nodes transmitting on the same channel. Waveform interaction is ultimately influenced by the correlation between waveforms, whether direct (choice of frequencies) or statistical (choice of long spreading sequences). Note that in a pure power control game, while a node has no control over the choice of waveform, performance and the exact interactions will be strongly influenced by the nodes’ waveforms. As an interesting aside, a distributed power control algorithm where all nodes’ waveforms are orthogonal is not a game as there is no interaction, and the power control decisions can be more appropriately treated as a set of independent optimization problems.

The utility functions in this model may be explicitly implemented or inferred from the algorithm’s preferences. For instance, consider an algorithm that increments transmit power when the measured BER is above a target BER and decrements transmit power when it is below the target. From this, an ordered set of preference relations can be constructed as follows: greatest preference is given to the power vector that yields the target BER, with decreasing preference given as a function of distance from the target BER (distance could be measured in different ways). Real numbers can then be assigned in any manner that preserves this ordinal relationship.
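The construction of a utility from a target-BER algorithm can be sketched in a few lines. The BER values, the labels `p_low`, `p_mid`, and `p_high`, and both distance measures below are hypothetical choices made for illustration only.

```python
import math

TARGET_BER = 1e-3

# Hypothetical BER observed under three candidate power vectors.
CANDIDATE_BER = {"p_low": 1.9e-3, "p_mid": 1.0e-3, "p_high": 2.0e-4}


def utility_linear(ber):
    """Preference falls with the absolute distance from the target BER."""
    return -abs(ber - TARGET_BER)


def utility_log(ber):
    """Preference falls with the distance measured in decades of BER."""
    return -abs(math.log10(ber) - math.log10(TARGET_BER))


def ranking(utility):
    """Candidate power vectors ordered from most to least preferred."""
    return sorted(CANDIDATE_BER, key=lambda k: utility(CANDIDATE_BER[k]), reverse=True)


linear_order = ranking(utility_linear)
log_order = ranking(utility_log)

# Any strictly increasing transformation keeps the ordinal relationship.
rescaled_order = ranking(lambda ber: math.exp(utility_linear(ber)))

print(linear_order, log_order, rescaled_order)
```

Both utilities rank the power vector that meets the target first, but they disagree on the other two, which shows that the choice of distance measure is part of the model. The rescaled utility reproduces the linear ranking, as expected of an order-preserving assignment of real numbers.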

A fundamental assumption is that each node in the network has one or more nodes of interest where performance is measured. Fundamentally, this excludes scenarios where performance is measured at locations that do not coincide with a node. As the considered application is to communications algorithms, this does not appear to be a significant restriction. These nodes of interest may or may not also be players. As illustrated in Figure 15, there are three fundamental nodes of interest scenarios: networks where there is a single node of interest, networks where the nodes of interest are partitioned, and networks where the nodes of interest are unpartitioned. In a single node of interest scenario every node measures performance at the same point. Examples of a single node of interest scenario include single cluster networks and situations where external interference is negligible or external interferers are not adaptive. In a partitioned nodes of interest scenario, each node has a single node of interest, but different groups of nodes have different nodes of interest. Examples of this scenario include multiple clusters

Figure 15 Nodes of Interest Scenarios (panels: Single Node of Interest, Partitioned Nodes of Interest, Unpartitioned Nodes of Interest)

The link gains considered in this model can include path loss and antenna gain (as in a link budget) and are intended to model all the power lost between signal transmission and signal reception. Also, because the model treats the path losses as scalars, this link gain model is most appropriate for narrowband signals.

4.3 Example Applications

The following are some example applications of game theory to radio resource management. The application of different game models allows different kinds of information to be extracted about the behavior of the network.

4.3.1 Supermodular Power Control

[39] examines the application of supermodular games to distributed power control in a cellular network. Recall that in supermodular games best response dynamics converge to a NE, and that a NE exists if the action spaces are compact and the utility functions are upper semi-continuous. [39] introduces a General Updating Algorithm (GUA), which is actually a best response dynamic, and exploits the properties of supermodular games to demonstrate convergence. Thus if these network games have a unique NE, behavior converges to the NE from any initial power vector p. [39] also asserts that the classes of power control algorithms considered by Yates in [5] are S-modular games, thus providing another avenue for demonstrating convergence.
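As a rough illustration of a best response dynamic converging to the same point from different initial power vectors, the following sketch iterates a fixed-target-SINR update of the kind covered by Yates's framework. It is not the GUA of [39]; the single node of interest, the gains, the noise level, and the target SINR are all hypothetical values chosen so that the target is feasible.

```python
def best_response(i, p, gain, noise, target_sinr):
    """Smallest power giving radio i its target SINR, other powers held fixed."""
    interference = sum(gain[j] * p[j] for j in range(len(p)) if j != i) + noise
    return target_sinr * interference / gain[i]


def best_response_dynamics(p0, gain, noise, target_sinr,
                           tol=1e-12, max_iter=10_000):
    """Synchronously apply every radio's best response until powers settle."""
    p = list(p0)
    for _ in range(max_iter):
        nxt = [best_response(i, p, gain, noise, target_sinr)
               for i in range(len(p))]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("no convergence; the target SINR may be infeasible")


# Hypothetical scenario: three radios sharing one node of interest.
GAIN = [1.0, 0.5, 0.25]   # link gain from each radio to the node of interest
NOISE = 0.1
TARGET_SINR = 0.2

if __name__ == "__main__":
    print(best_response_dynamics([0.0, 0.0, 0.0], GAIN, NOISE, TARGET_SINR))
    print(best_response_dynamics([5.0, 1.0, 9.0], GAIN, NOISE, TARGET_SINR))
```

Both starting points lead to the same power vector, at which every radio's received SINR equals the target.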

4.3.2 Separable SINR Games

In [37] we considered a subclass of the game described above wherein each radio can separate its SINR function into a function of received signal strength less a function of interference less some individual function that models the cost of a power level or waveform (such as a battery life penalty) so that the radio’s objective function takes the following form:

$$u_i(a) = f_{i,1}\left(\gamma_{i\nu_i} p_i, \omega_i\right) - f_{i,2}\left(\sum_{j \in N \setminus i} \gamma_{j\nu_i} p_j \rho_{ij} + N_{\nu_i}\right) - c_i\left(p_i, \omega_i\right).$$

This is a widely applicable model, as many adaptive modulation schemes perform their adaptation based on SINR estimates and can be readily put in this form by working with SINR estimates in dB or by directly subtracting interference from noise. Also note that the functions of Received Signal Strength (RSS) and interference can be defined in arbitrary ways. [28] considers the following two additional refinements to this model. Each of these scenarios is a potential game. This knowledge provides additional valuable information, for example that a steady state exists and that myopic better responses converge to a steady state in a repeated game.

4.3.3 SINR Power Control Games

This game limits each radio to adjusting only its power level in response to changes in SINR at its node of interest. In this model, the waveform selected at link initialization remains fixed. Every radio’s performance is affected by interference, and each radio adjusts its power level in response to SINR so as to maintain some minimum threshold, εi. Additionally, suppose that each radio has some cost function, ci, associated with each power level. In this case each radio’s objective function can be written as

$$u_i(\mathbf{p}) = f_{i,1}\left(\gamma_{i\nu_i} p_i\right) - f_{i,2}\left(\sum_{j \in N \setminus i} \gamma_{j\nu_i} p_j \rho_{ij} + N_{\nu_i}\right) - \varepsilon_i - c_i\left(p_i\right).$$

This game can be verified to have a potential, as

$$\frac{\partial^2 u_i}{\partial p_i \, \partial p_j} = \frac{\partial^2 u_j}{\partial p_j \, \partial p_i} = 0, \quad \forall i, j \in N, \; i \neq j.$$

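By recognizing that f_{i,2} is a dummy function (it depends only on the other radios' power levels), a potential function can be written as

$$P(\mathbf{p}) = \sum_{i \in N} \left[ f_{i,1}\left(\gamma_{i\nu_i} p_i\right) - c_i\left(p_i\right) \right].$$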
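The exact potential property can be checked numerically: for any unilateral change in one radio's power, the change in that radio's objective function should equal the change in the potential. The sketch below does this for one hypothetical instance of the power control game. The particular choices of f_{i,1} (a logarithm), f_{i,2} (a square root), the linear cost, and all numerical values are assumptions made for illustration and are not taken from [28] or [37].

```python
import math


def utility(i, p, gain, rho, noise, price, eps):
    """Objective of radio i: f1(signal) - f2(interference) - eps_i - cost(p_i).

    gain[j][i] is the (hypothetical) link gain from radio j to radio i's
    node of interest; rho[i][j] is the correlation between waveforms i and j.
    """
    signal = math.log(1.0 + gain[i][i] * p[i])              # assumed f_{i,1}
    interference = sum(gain[j][i] * p[j] * rho[i][j]
                       for j in range(len(p)) if j != i) + noise[i]
    return signal - math.sqrt(interference) - eps[i] - price[i] * p[i]


def potential(p, gain, price):
    """Candidate potential: sum over radios of f1(signal) - cost(p_i)."""
    return sum(math.log(1.0 + gain[i][i] * p[i]) - price[i] * p[i]
               for i in range(len(p)))


def deviation_gap(i, p, new_pi, gain, rho, noise, price, eps):
    """|change in u_i - change in P| when only radio i changes its power."""
    q = list(p)
    q[i] = new_pi
    du = (utility(i, q, gain, rho, noise, price, eps)
          - utility(i, p, gain, rho, noise, price, eps))
    dP = potential(q, gain, price) - potential(p, gain, price)
    return abs(du - dP)


# Hypothetical three-radio scenario.
GAIN = [[1.0, 0.2, 0.1], [0.3, 0.8, 0.2], [0.1, 0.4, 0.9]]
RHO = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
NOISE = [0.1, 0.1, 0.2]
PRICE = [0.5, 0.7, 0.4]
EPS = [0.05, 0.05, 0.05]
P0 = [1.0, 2.0, 0.5]

if __name__ == "__main__":
    for i in range(3):
        print(i, deviation_gap(i, P0, 3.0, GAIN, RHO, NOISE, PRICE, EPS))
```

The gap is zero up to floating-point rounding because the interference term and the threshold do not change when only the radio's own power changes.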

4.3.4 SINR Waveform Games

In this model, let us assume now that the radios are a part of a power controlled star network such that the energy received at the sole access point is the same for each radio. Thus each radio has the same node of interest and only maintains a single link. Adaptive modulation is employed and each radio may select any waveform from its waveform set Ωi. Again we assume some cost associated with the use of each waveform. The objective function for each radio now takes the form

$$u_i(\omega) = f_{i,1}\left(\omega_i\right) - f_2\left(\sum_{j \in N \setminus i} \int \omega_i \omega_j + N_{\nu_i}\right) - \varepsilon_i - c_i\left(\omega_i\right).$$

Here we must constrain f_2 to be linear. By recognizing that f_2 is then the sum of BSI terms, an exact potential function can be written as

$$P(\omega) = \sum_{i \in N} \left[ f_{i,1}\left(\omega_i\right) - c_i\left(\omega_i\right) \right] - f_2\left(\sum_{i \in N} \sum_{j=1}^{i-1} \int \omega_i \omega_j\right).$$
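The convergence of better responses in such a waveform game can be illustrated with a small discrete example. The sketch below makes several simplifying assumptions that are not from the source: waveforms are channel-occupancy vectors, the correlation integral is replaced by a dot product, f_{i,1} is zero, f_2 is the identity (a linear function), and the noise and threshold constants are dropped because they do not affect the dynamics.

```python
def correlation(w1, w2):
    """Stand-in for the correlation integral: dot product of two waveforms."""
    return sum(a * b for a, b in zip(w1, w2))


def utility(i, choice, waveforms, cost):
    """Radio i's objective: minus total correlation with others, minus cost."""
    wi = waveforms[choice[i]]
    interference = sum(correlation(wi, waveforms[choice[j]])
                       for j in range(len(choice)) if j != i)
    return -interference - cost[choice[i]]


def potential(choice, waveforms, cost):
    """Exact potential: each pairwise correlation is counted once."""
    pairwise = sum(correlation(waveforms[choice[i]], waveforms[choice[j]])
                   for i in range(len(choice)) for j in range(i))
    return -pairwise - sum(cost[k] for k in choice)


def better_response_dynamics(start, waveforms, cost):
    """Radios take turns switching to any strictly better waveform."""
    choice = list(start)
    history = [potential(choice, waveforms, cost)]
    improved = True
    while improved:
        improved = False
        for i in range(len(choice)):
            current = utility(i, choice, waveforms, cost)
            for k in range(len(waveforms)):
                trial = choice[:i] + [k] + choice[i + 1:]
                if utility(i, trial, waveforms, cost) > current:
                    choice = trial
                    history.append(potential(choice, waveforms, cost))
                    improved = True
                    break
    return choice, history


# Hypothetical waveform set: each waveform occupies one of three channels.
WAVEFORMS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
COST = [0, 0, 0]

if __name__ == "__main__":
    print(better_response_dynamics([0, 0, 0], WAVEFORMS, COST))
```

Each accepted move raises the potential, so on a finite waveform set the process must stop, and it can only stop at a state where no radio has a better response.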

4.3.5 Convergence of Cognitive Radio Networks

In [40] we examined three convergence dynamics (coordinated behavior, best response, and better response) and their impact on network complexity. We saw that these dynamics are supported by discounted repeated games, S-modular games, and potential games, respectively. We also formally introduced the game theoretic model, described in Section 4.2, that is suitable for analyzing the behavior of a CRN, and showed how the ability of a CRN to support these dynamics is influenced by the cognitive radios’ objective functions. We then saw how the objective functions impact the complexity of the CRN by examining several different distributed power control algorithms, as summarized in Table 2. Finally, we showed that relatively minor changes to a CRN’s objective functions, such as a modification to the pricing function, can prevent the application of better-response based adaptation algorithms.

Table 2 Surveyed algorithms' CRN complexity

Algorithm                 Game Model   Complexity
Repeated Game             Repeated     High
S-modular                 S-modular    Low
Target SINR               OPG          Minimal
Target Throughput         OPG          Minimal
Linear Pricing            S-modular    Low
Nonlinear Group Pricing   Repeated     High

4.3.6 Steady State Modification

In [28] we also addressed techniques for modifying the steady state of a network. While exact potential games are guaranteed to converge to a potential maximizing Nash Equilibrium, this state may not be desirable. Fortunately, for exact potential games, it is a relatively straightforward task to move the steady state of the game, a*, to some other desired valid state, a**. The procedure for doing so is as follows. Introduce a network cost function, NC(a). Solve the following equation for NC(a)

$$\frac{\partial P\left(a^{**}\right)}{\partial a_i} + \frac{\partial NC\left(a^{**}\right)}{\partial a_i} = 0$$

where P(a) is the game’s potential function. In other words, at the desired action tuple, the slope of the network cost function should be the negative of the slope of the potential function. After solving for NC(a), “charge” the nodes this cost function. This can either be done as part of the node’s development or as a cost imposed by the network on the nodes. Thus the objective function for each node takes the following form

$$u_i(a) = f_{i,1}\left(\gamma_{i\nu_i} p_i, \omega_i\right) - f_{i,2}\left(\sum_{j \in N \setminus i} \gamma_{j\nu_i} p_j \rho_{ij} + N_{\nu_i}\right) - c_i\left(p_i, \omega_i\right) + NC(a)$$

The potential function for this modified game is given by P'(a) = P(a) + NC(a), for which a** is clearly a Nash Equilibrium. Note that this process may introduce additional Nash Equilibria depending on the exact topology of P(a) and the choice of NC(a). Care should also be taken so that the original Nash Equilibrium is no longer a Nash Equilibrium. To minimize the creation of new Nash Equilibria, it is suggested that the function be of low order or a piecewise function. Also note that it is possible to impose arbitrary cost functions to create arbitrary potential functions. However, this will not in general be desirable, as significant alterations in the potential function will result in significant changes to the behavior of the network, perhaps negating the original advantages of the adaptation scheme. Further, while this solution is deterministic, the actual channel conditions will be stochastic, and the stability of the Nash Equilibria should also be considered. Thus in general, it is anticipated that small changes in the neighborhood of the original Nash Equilibrium will be more desirable than more significant alterations to the game.
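The procedure can be illustrated on a toy potential. The sketch below assumes a hypothetical concave quadratic potential whose maximizer is a* = (1, 1) and uses a linear (low order) network cost function whose slope cancels the potential's slope at the desired state a** = (2, 0.5). The potential, the desired state, and all function names are illustrative assumptions, not the functions used in [28].

```python
def potential(a):
    """Hypothetical potential P(a), maximized at a* = (1, 1)."""
    return -sum((x - 1.0) ** 2 for x in a)


def potential_gradient(a):
    """Partial derivatives of P with respect to each a_i."""
    return [-2.0 * (x - 1.0) for x in a]


def linear_network_cost(a_target):
    """Build NC(a) = sum_i k_i * a_i with dNC/da_i = -dP/da_i at a_target."""
    slopes = [-g for g in potential_gradient(a_target)]

    def network_cost(a):
        return sum(k * x for k, x in zip(slopes, a))

    return network_cost, slopes


def modified_potential(a, network_cost):
    """P'(a) = P(a) + NC(a)."""
    return potential(a) + network_cost(a)


A_STAR = [1.0, 1.0]      # steady state of the original game
A_TARGET = [2.0, 0.5]    # desired steady state a**

if __name__ == "__main__":
    nc, slopes = linear_network_cost(A_TARGET)
    print(slopes)
    print(modified_potential(A_STAR, nc), modified_potential(A_TARGET, nc))
```

Because the toy potential is concave and the added cost is linear, the modified potential has a single maximizer at a**, and the original a* is no longer a stationary point. With a less well-behaved P(a), additional equilibria could appear, as cautioned above.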

5 Summary

This report has examined issues associated with radio resource management, game theory, and the application of game theory to radio resource management. The report categorized radio resource management into fixed and dynamic resource management. Within dynamic resource management, resource allocation can occur within a centralized or distributed framework. It was seen that dynamic resource allocation will generally outperform fixed resource allocation schemes. However, dynamic resource allocation schemes also generally incur an overhead penalty. Distributed dynamic resource allocations have the potential to provide performance gains with reduced overhead, but introduce a potentially problematic interactive decision process. An extended look was provided at power control algorithms, which are implemented in centralized, distributed, and hybrid versions.

Some basic concepts from game theory were then introduced and defined. The notion of a game was defined and a simple example was analyzed. Then some relevant game models were introduced and their properties discussed.

Finally, the application of game theory to radio resource management was discussed. The report highlights that game theory is not always applicable to radio resource management issues, but can be applied to distributed dynamic radio resource management applications. A formal set of rules for applying game theory to wireless applications was introduced, and we saw how a network of Level 3 cognitive radios can provide the requisite capabilities for dynamic distributed decision making. We then introduced a formal model for applying game theory to physical layer adaptations. This model was used to elucidate some applications of game theory to radio resource management and to demonstrate some of the information yielded by a game theory analysis of a radio resource management scheme.

6 Research Directions

Broadly, my envisioned dissertation will consider the application of game theory to RRM. Specifically, it will address the application of game theory to distributed power control algorithms. Algorithms will be drawn from both cellular networks and from ad-hoc networks. Specific issues to be addressed include the following:

• Steady-state solutions for common ad-hoc and distributed cellular power control algorithms
• Convergence criteria for these algorithms
• Determination of how scaling impacts solutions
• Determination of how stochastic channel models impact solutions
• Development of techniques to improve steady-state performance
• Power control algorithms that incorporate rate adaptation will be considered.

Many of the studied algorithms will incorporate different QoS requirements at different nodes. Though perhaps not considered in the dissertation, I also expect to research adaptive interference avoidance situations, network formation algorithms, and combinations of these algorithms with power control algorithms. Though not based on game theory, as part of my ad-hoc power control study, I hope to establish a necessary criterion for differentiated QoS algorithms in ad-hoc networks similar to the one given in (2). In addition to the dissertation, it is expected that this research will produce five or six journal papers and nine or ten conference papers. As highlighted throughout this report, some of this research has already been performed and some of these papers have already been written. It should also be noted that this research has already been leveraged to produce a short course.

References

[1] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, Inc, New York City, 1991.

[2] R. Cameron and B. Woerner, “Performance Analysis of CDMA with Imperfect Power Control,” IEEE Transactions on Communications, Vol. 44, July 1996, pp. 777-781.

[3] K. Gilhousen, I. Jacobs, R. Padovani, A. J. Viterbi, L. Weaver, C. Wheatley III, “On the Capacity of a Cellular CDMA System” IEEE Transactions on Vehicular Technology, vol. 40, May 1991, pp. 303 – 312.

[4] Bluetooth Core Specification v1.2 https://www.bluetooth.org/spec/

[5] R. Yates, “Uplink Power Control in Cellular Radio Systems,” IEEE Journal on Selected Areas in Communications, Vol. 13, No 7, September 1995, pp. 1341-1347.

[6] G. Foschini and Z. Miljanic. “A Simple Distributed Autonomous Power Control Algorithm and its Convergence,” IEEE Transactions on Vehicular Technology, Vol. 42, No. 4, November 1993, pp. 641-646.

[7] S. Grandhi, R. Vijayan, D. Goodman, “Distributed Power Control in Cellular Radio Systems,” IEEE Transactions on Communications, Vol. 42, 1994, pp. 226-228.

[8] P. Kumar, R. Yates, J. Holtzman. “Power Control Based on Bit Error Rate (BER) Measurements,” Military Communications Conference, 1995, pp. 617-620.

[9] C. Chuah and R. Yates. “Evaluation of a Minimum Power Handoff Algorithm,” Personal, Indoor and Mobile Radio Communications, 1995. pp. 814-818.

[10] Jeffrey H. Reed, Software Radio: A Modern Approach to Radio Engineering, Prentice Hall, 2001.

[11] J. Zander, S. Kim, Radio Resource Management for Wireless Networks, Artech House Publishers, Boston, 2001.

[12] J. Laiho, A. Wacker, T. Novosad, Radio Network Planning and Optimisation for UMTS John Wiley & Sons LTD, Chichester, England, 2002.

[13] V.Shah, N. Mandayam, and D. Goodman, “Power Control for Wireless Data Based on Utility and Pricing,” Symposium on Personal, Indoor and Mobile Radio Communications, 1998. pp. 1427-1432.

[14] C. Saraydar, N. Mandayam, D. Goodman, “Pareto Efficiency of Pricing-based Power Control in Wireless Data Networks,” Wireless Communications and Networking Conference, 1999, pp. 231-235.

[15] D. Goodman and N. Mandayam, “Power Control for Wireless Data,” IEEE Personal Communications, April 2000, pp. 48-54.

[16] C. Saraydar, N. Mandayam, D. Goodman, “Power Control in a Multicell CDMA Data System Using Pricing,” Vehicular Technology Conference, Fall 2000, pp. 484-491.

[17] C. Saraydar, N. Mandayam, D. Goodman, “Pricing and Power Control in a Multicell CDMA Data System,” IEEE Journal on Selected Areas in Communications, Vol. 19, No. 10, October 2001.

[18] D. Goodman and N. Mandayam, “Network Assisted Power Control for Wireless Data,” Vehicular Technology Conference, Spring 2001, pp. 1022-1026.

[19] T. J. Kwon, M. Gerla, “Clustering with Power Control” Military Communications Conference, 1999, pp. 1424-1428.

[20] C. Sung and W. Wong, “A Noncooperative Power Control Game for Multirate CDMA Data Networks,” IEEE Transactions on Wireless Communications, Vol. 2, No 1. January 2003, pp. 186-19.

[21] J. P. Monks, J. P. Ebert, A. Wolisz, and W. W. Hwu “A study of the energy saving and capacity improvement potential of power control in multi-hop wireless networks” LCN 2001. pp 550-559.

[22] S. Agarwal, S. Krishnamurthy, R. Katz, S. Dao, “Distributed power control in ad-hoc wireless networks,” International Symposium on Personal, Indoor and Mobile Radio Communications, 2001, pp. F-59-F-66.

[23] X. Lin,Y. Kwok, and V. Lau, “Power control for IEEE 802.11 ad hoc networks: issues and a new algorithm” International Conference on Parallel Processing, 2003.

[24] J. P. Monks, V. Bharghavan, W. W. Hwu, “Transmission power control for multiple access wireless packet networks,” 25th Annual IEEE Conference on Local Computer Networks (LCN 2000), 2000, pp. 12-21.

[25] M. Krunz and A. Muqattash, “A power control scheme for MANETs with improved throughput and energy consumption,” Wireless Personal Multimedia Communications, 2002, pp. 771-775.

[26] Z. Huang, C. Shen, C. Srisathapornphat, and C. Jaikaeo, “Topology control for ad hoc networks with directional antennas,” Computer Communications and Networks, 2002, pp. 16-21.

[27] J. Neel, P. Robert, A. Hebbar, R. Chembil, J. Reed, S. Srikanteswara, R. Menon, R. Kumar, S. Sayed “Critical Technology Challenges to the Commercialization of Software Radio,” World Wireless Research Forum 10 2003.

[28] J. Neel, J. Reed, R. Gilles, “Game Theoretic Analysis of a Network of Software Radios,” SDR Forum Conference 2002.

[29] A. M. Viterbi, A. J. Viterbi, and E. Zehavi, “Performance of Power-Controlled Wideband Terrestrial Digital Communications,” IEEE Transactions on Communications, vol. 41, April 1993, pp. 559-569.

[30] A. MacKenzie and S.Wicker, “Selfish users in Aloha: A Game Theoretic Approach,” Vehicular Technology Conference, Fall 2001, Vol.3, pp 1354-1357.

[31] H. Fattah, C. Leung, “An efficient scheduling algorithm for packet cellular networks,” Vehicular Technology Conference, Fall 2002, vol. 4, pp. 2419-2423.

[32] S. Ginde, R. M. Buehrer, and J. Neel, “A Game Theoretic Analysis of the GPRS Adaptive Modulation Schemes,” Vehicular Technology Conference 2003 – Fall, October 6-9, 2003.

[33] D. Goodman, S. Grandhi, and R. Vijayan, “Distributed Dynamic Channel Assignment Schemes,” Vehicular Technology Conference, 1993, pp. 532-535.

[34] A. Lozano and D. Cox, “Distributed dynamic channel assignment in TDMA mobile communication systems,” IEEE Transactions on Vehicular Technology, vol. 51, November 2002, pp. 1397-1406.

[35] J. Mitola, III “Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio,” PhD Dissertation Royal Institute of Technology, Sweden, May 2000.

[36] J. Mitola, III “Cognitive Radio for Flexible Multimedia Communications”, Mobile Multimedia Communications 1999, pp. 3 –10, 1999.

[37] J. Neel, R. M. Buehrer, J. Reed, and R. Gilles, “Game Theoretic Analysis of a Network of Cognitive Radios,” Midwest Symposium on Circuits and Systems 2002, Aug 4-7, 2002.

[38] V. Srivastava, J. Neel, J.Hicks, A. MacKenzie, K. Lau, L. DaSilva, J. Reed, R. Gilles, “Application of Game Theory to Distributed MANET Algorithms” (In Progress)

[39] E. Altman and Z. Altman. “S-Modular Games and Power Control in Wireless Networks” IEEE Transactions on Automatic Control, Vol. 48, May 2003, 839-842.

[40] J. Neel, J. Reed, R. Gilles, “Convergence of Cognitive Radio Networks,” Accepted to WCNC 2004.

[41] D. Fudenberg and J. Tirole, Game Theory, MIT Press, 1991.

[42] D. Monderer and L. Shapley, “Potential Games,” Games and Economic Behavior, vol. 14, pp. 124-143, 1996.

[43] J. Friedman and C. Mezzetti, “Learning in Games by Random Sampling” Journal of Economic Theory, vol. 98, May 2001, pp. 55-84.
