
Random Matrix Games in Wireless Networks

Manzoor Ahmed Khan
DAI-Labor, Technical University (TU) Berlin, Germany
E-mail: [email protected]

Hamidou Tembine
Ecole Superieure d'Electricite (SUPELEC), Gif-sur-Yvette, France
E-mail: [email protected]

Abstract—In this paper we propose an interactive decision-making framework for wireless networks in which the outcome is influenced not only by the decisions of the users but also by a random variable. We focus in particular on the finite-player case, which we call random matrix games (RMGs). We present different approaches and solution concepts for such games, as well as distributed strategic learning in which each player adjusts her strategy in response to recent information and signals. The applicability of the proposed framework is illustrated on user-centric network selection under measurement noise.

I. INTRODUCTION

As the complexity of existing networks grows and the environment can no longer be assumed constant, we need to study the behavior and performance of wireless systems that involve not only decision-making strategies and time dependencies but also the randomness of the environment state, the variability of the demands, the uncertainty of the system parameters, the random activity of the users, and error and noise in the measurements. Exploiting this potential requires a rigorous mathematical framework that allows a comparative analysis of various schemes under random variables. Randomness (channel, fading, shadowing, error, measurement noise, delayed measurement, etc.) is one of the key determinants of performance in wireless networks. However, the classical game-theoretic approaches in wireless networks [2], [3], [4], [5] do not consider these issues. We consider random matrix games (RMGs), in which the realized payoffs are influenced not only by the actions of the users but also by a random variable. Given random payoff matrices, the question arises of what it means to play the random matrix game (RMG) in an optimal way: the actual payoff now depends not only on the action profile picked by the users but also on the realized state of nature. The users therefore cannot guarantee themselves a certain payoff or satisfaction level; they are forced to gamble on a random variable, and the question of how to gamble in an optimal way needs to be made precise. One approach to this type of game is to replace the random matrices by their expectations, leading to an expected game, and then solve the resulting deterministic matrix game. This approach has the advantage of simplicity and provides many interesting results on the existence of solutions. However, the expectation approach may not capture the behavior of the system in the presence of risk-sensitive users.

A second important approach is a satisfaction criterion based on the probability that the payoff exceeds a certain threshold. A third approach is to incorporate the variance or higher moments of the payoffs into the responses of the users; a simple criterion is to maximize the mean payoff while simultaneously minimizing the variance of the random payoff. The latter leads to a multiobjective criterion whose solutions are related to Pareto boundaries that depend on the behavior of the other users.
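The three approaches can be compared numerically. Below is a minimal sketch, assuming an illustrative distribution for the random matrix $M(w)$ (a fixed mean matrix plus zero-mean uniform noise) and an illustrative threshold of $0.4$ for the satisfaction criterion; none of these values come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample S realizations of a random 2x2 payoff matrix M(w): a fixed
# mean matrix plus zero-mean uniform noise (illustrative choice of nu).
S = 10_000
mean_M = np.array([[1.0, 0.0], [0.0, 1.0]])
samples = mean_M + rng.uniform(-0.5, 0.5, size=(S, 2, 2))

x1 = np.array([0.5, 0.5])   # mixed strategy of the row player
x2 = np.array([0.5, 0.5])   # mixed strategy of the column player

# Realized payoff x1' M(w) x2 for each sampled state w.
payoffs = np.einsum('i,sij,j->s', x1, samples, x2)

print("expected payoff  :", payoffs.mean())           # expectation approach
print("P(payoff >= 0.4) :", (payoffs >= 0.4).mean())  # satisfaction criterion
print("payoff variance  :", payoffs.var())            # mean-variance criterion
```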

Random matrix game theory is not well investigated in wireless networks. To any given game matrix, a variety of formulas and algorithms may be applied to yield limit cycles, equilibrium strategies, equilibrium payoffs, etc. But if the game matrix is random, these objects become random variables. Their distribution is not easy to determine analytically for any fixed matrix size, still less their asymptotics as the matrix size tends to infinity, as is usually done in random matrix theory. The results of random matrix theory do not apply directly to RMGs: one of the main reasons is that RMGs are not merely random matrices but interactive and controlled random matrices (controlled by the interactive behavior of the users). RMGs arise in many networking and communication problems. The RMG is a particularly useful tool for modeling and studying interactions between cognitive radios envisioned to operate in future communication systems. Such terminals will have the capability to adapt to the context and the uncertain environment they operate in, possibly through power and rate control as well as channel selection. User performance metrics such as successful reception conditions and throughput/connectivity are generally random quantities because they are influenced by random channel and environment states. Most game-theoretic analyses in communication networks are carried out only in the context of static games (with fixed parameters), which may not capture the stochastic nature of wireless communication systems. The RMG turns out to be a more appropriate tool for modeling interaction in communication networks: it allows one to model interactive decision-making problems of the nodes and to evaluate network performance under random channel states, fading, shadowing, measurement noise, background noise, path loss, etc., in situations where a finite number of choices is available to the players (users, operators, mobile devices, base stations, protocols, etc.). In [6], we studied games under uncertainty for MIMO systems, which can be seen as RMGs. There are many interesting applications of


zero-sum random matrix games in wireless networks. The randomness allows one to study the robustness of the network with respect to malicious attacks, failures, noisy measurements, and environment perturbations.

Our contributions can be summarized as follows. We introduce random matrix games (zero-sum and nonzero-sum) and present their relevance in various applications. We distinguish two approaches to RMGs: the expectation (risk-neutral) approach and the risk-sensitive approach. We formally define three solution concepts: state-robust equilibrium, state-independent equilibrium, and state-dependent equilibrium, and discuss their existence and non-existence. The framework is then extended to a dynamic scenario. We propose distributed strategic learning based on imitative combined learning and evolutionary game dynamics. The learning scheme is applied to a resource selection problem with uncertainties. Interestingly, we observe that the proposed scheme approaches the global optima of the network-selection scenario even under measurement noise.

The paper is structured as follows. We first present zero-sum random matrix games. Then we focus on nonzero-sum RMGs. Dynamic RMGs and random poly-matrix games are discussed next. We conclude the paper with numerical investigations. Some of the notations are summarized in Table I.

TABLE I
SUMMARY OF NOTATIONS

Symbol                                            Meaning
$W \subseteq \mathbb{R}^k$                        state space
$\mathcal{N}$                                     set of potential players
$r, r'$                                           actions of the row player
$c, c'$                                           actions of the column player
$\mathcal{A}_j$                                   set of actions of player $j$, $|\mathcal{A}_j| \ge 2$
$s_j \in \mathcal{A}_j$                           a generic element of $\mathcal{A}_j$
$X_j := \Delta(\mathcal{A}_j)$                    set of probability distributions over $\mathcal{A}_j$
$a_{j,t} \in \mathcal{A}_j$                       action of player $j$ at time $t$
$x_{j,t} \in X_j$                                 strategy of player $j$ at time $t$
$M_j(w)$                                          random matrix of player $j$
$(\lambda_{j,t}, \mu_{j,t})$                      learning rates of player $j$ at time $t$
$\hat{m}_{j,t} \in \mathbb{R}^{|\mathcal{A}_j|}$  estimated payoff vector of player $j$ at time $t$

II. ZERO-SUM RANDOM MATRIX GAMES

We consider the most basic class of games. There are two users (players), the row player and the column player. A game between them is determined by an $l_1 \times l_2$ matrix $M = (m_{ij})_{i,j}$ of real numbers. The row player has $l_1$ pure strategies, corresponding to the $l_1$ rows of $M$. The column player has $l_2$ pure strategies, corresponding to the $l_2$ columns of $M$. If the row player plays her $i$-th strategy and the column player plays his $j$-th strategy, then the column player pays the row player the real number $m_{ij}$. The players are allowed to randomize their actions, i.e., they can choose mixed strategies as well. This means that the row player's move is to choose a row vector¹ $x_1 = (x_{11}, \ldots, x_{1 l_1})$ with the $x_{1k}$ nonnegative and summing to one. The column player's move is to choose a column vector $x_2 = (x_{21}, \ldots, x_{2 l_2})$ with the $x_{2k'}$ likewise nonnegative and summing to one. The players make their moves independently, and the game is concluded by the column player paying the expected amount $x_1 M x_2$ to the row player.

¹It should be understood as an already transposed vector.

von Neumann minmax solution concept: We now focus on solution concepts for such games. A triple $(x_1, x_2, v)$ consisting of a row probability $l_1$-vector $x_1$, a column probability $l_2$-vector $x_2$, and a number $v$ is called a solution of the zero-sum game² if it satisfies the following conditions:

$$\forall y_1 \in X_1, \quad y_1 M x_2 \le x_1 M x_2 = v \le x_1 M y_2, \quad \forall y_2 \in X_2, \qquad (1)$$

where $X_j = \{x_j \mid x_j(k) \ge 0, \; \sum_{k=1}^{l_j} x_j(k) = 1\}$. In that case the pair $(x_1, x_2)$ is called a saddle point. Thus $(x_1, x_2, v)$ is a classical solution (in the sense of von Neumann) and $(x_1, x_2)$ is a saddle point if and only if unilateral deviation from $(x_1, x_2)$ by either player never results in an improved outcome for that player. It is known from the von Neumann minmax theorem that every zero-sum game of this type with matrix $M \in \mathbb{R}^{l_1 \times l_2}$ has at least one saddle point. Moreover, for all these saddle points (if there are several), the number $v$ is the same; this number is called the value of the game. In addition, the saddle points are interchangeable, i.e., if $(x_1, x_2)$ and $(y_1, y_2)$ are two saddle points, then $(x_1, y_2)$ and $(y_1, x_2)$ are also saddle points.

²The results extend to two-player constant-sum games.
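For a deterministic matrix $M$, the value and an optimal row strategy can be computed by linear programming. Below is a minimal sketch using SciPy; the helper name `game_value` and the shift-to-positive-entries trick are illustrative choices, not part of the paper:

```python
import numpy as np
from scipy.optimize import linprog

def game_value(M):
    """Value and optimal row strategy of the zero-sum matrix game M,
    via the classical LP reformulation of the minmax problem."""
    M = np.asarray(M, dtype=float)
    shift = 1.0 - M.min()          # make all entries positive so v > 0
    P = M + shift
    l1, l2 = P.shape
    # With u = x1 / v: minimize sum(u) subject to P^T u >= 1, u >= 0.
    res = linprog(c=np.ones(l1), A_ub=-P.T, b_ub=-np.ones(l2),
                  bounds=[(0, None)] * l1, method="highs")
    v = 1.0 / res.x.sum()
    x1 = res.x * v
    return v - shift, x1

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
print(game_value([[1, -1], [-1, 1]]))
```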

We now introduce randomness into the coefficients of the matrix $M$. To consider random games, we fix an entry measure for each entry of the matrix, $m_{ij}(w) \in \mathbb{R}$, where the state $w$ is driven by a probability measure $\nu$ on a finite-dimensional real space $W$ equipped with the canonical $\sigma$-algebra. The collection

$$(\{1,2\},\, W,\, \nu,\, \{1,\ldots,l_1\},\, \{1,\ldots,l_2\},\, M(w),\, -M(w))$$

is a zero-sum random matrix game (RMG), where $w \sim \nu$. The triple $(x_1(w), x_2(w), v(w))_{w \in W}$ is a state-dependent saddle point of the RMG if for each $w \in W$,

$$\forall y_1(w),\, \forall y_2(w), \quad y_1(w) M(w) x_2(w) \le x_1(w) M(w) x_2(w) = v(w) \le x_1(w) M(w) y_2(w).$$

III. NONZERO-SUM RANDOM MATRIX GAMES

We consider a two-player nonzero-sum game. As above, there is a row player and a column player. A game between them is determined by a random state variable $w$ and two matrices of size $l_1 \times l_2$, $M_1(w) = (m_{1,ij}(w))$ and $M_2(w) = (m_{2,ij}(w))$, of real numbers. The row player has $l_1$ pure strategies, corresponding to the $l_1$ rows of $M_1(w)$. The column player has $l_2$ pure strategies, corresponding to the $l_2$ columns of $M_2(w)$. If the row player plays her $i$-th strategy and the column player plays her $j$-th strategy, then the row player receives the real number $m_{1,ij}(w)$ and the column player receives the real number $m_{2,ij}(w)$.

Next we define basic solution concepts for RMGs.

Definition 1 (State-robust equilibrium). A strategy profile $(x_1, x_2)$ is a state-robust equilibrium if it is an equilibrium for every state $w$.



Such equilibrium strategies are state-independent and distribution-independent. However, this solution concept requires a strong condition, as stated in the following lemma.

Lemma 1. A state-robust equilibrium may fail to exist in an RMG.

To prove this lemma, consider two games $G_1$ and $G_2$ such that in the first game the first action is dominant and in the second game the second action is dominant. Now consider a random variable $w$ taking values in $\{O_1, O_2\}$, with support $\{O_1, O_2\}$, such that $w = O_i$ yields the game $G_i$. It is clear that the resulting RMG has no state-robust pure equilibrium.
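The following short sketch illustrates this construction with hypothetical payoff matrices (chosen so that each state has a different dominant action; the specific numbers are not from the paper) and verifies that no pure profile is an equilibrium in both states:

```python
import numpy as np
from itertools import product

# Two 2x2 states: in G1 action 0 is dominant for both players,
# in G2 action 1 is dominant (illustrative matrices).
G1 = (np.array([[2, 2], [0, 0]]), np.array([[2, 0], [2, 0]]))
G2 = (np.array([[0, 0], [2, 2]]), np.array([[0, 2], [0, 2]]))

def pure_equilibria(M1, M2):
    """Pure Nash equilibria of the bimatrix game (M1, M2)."""
    eqs = set()
    for i, j in product(range(M1.shape[0]), range(M1.shape[1])):
        if M1[i, j] >= M1[:, j].max() and M2[i, j] >= M2[i, :].max():
            eqs.add((i, j))
    return eqs

# A state-robust pure equilibrium would be an equilibrium in every state.
print(pure_equilibria(*G1) & pure_equilibria(*G2))   # -> set(): none exists
```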

Definition 2 (State-dependent equilibrium). A state-dependent equilibrium $(x_1(w), x_2(w))_w$ is a collection of equilibrium profiles, one per state.

Lemma 2. There exists at least one state-dependent (mixed) equilibrium of the RMG.

The proof of this lemma is obtained by concatenating, state by state, an equilibrium of each realized game, each of which has at least one equilibrium in mixed strategies (Nash's theorem).

Definition 3 (State-independent equilibrium). A strategy profile $(x_1, x_2)$ is a state-independent equilibrium if $x_1$ and $x_2$ are independent of the realized value of the state and they form an equilibrium of the expected game.

Lemma 3. There exists at least one state-independent (mixed) equilibrium in the expected RMG.

To prove this lemma, we apply Nash's theorem to the expected game, which is a bi-matrix game.

Example 1 (Ergodic Rate and Channel Uncertainty). Consider two mobile stations (MSs) and two small base stations (sBSs). Each mobile station can transmit to one of the base stations. If mobile station $MS_1$ chooses a different base station than mobile station $MS_2$, then $MS_1$ gets the payoff $\log_2\!\left(1 + \frac{p_1 |h_1|^2}{N_0}\right)$ and $MS_2$ gets $\log_2\!\left(1 + \frac{p_2 |h_2|^2}{N_0}\right)$, where $p_1$ and $p_2$ are (strictly) positive transmit powers, $h_1, h_2$ are (random) channel states, and $N_0$ is a background noise parameter. If both MSs transmit to the same base station, there is interference; $MS_1$ gets $\log_2\!\left(1 + \frac{p_1 |h_1|^2}{N_0 + p_2 |h_2|^2}\right)$ and $MS_2$ gets $\log_2\!\left(1 + \frac{p_2 |h_2|^2}{N_0 + p_1 |h_1|^2}\right)$. Mobile station $MS_1$ chooses a row (row player) and $MS_2$ chooses a column (column player); the first component of the payoff is for $MS_1$ and the second component is for $MS_2$.

We distinguish four configurations: (i) $(h_1, h_2) \neq (0, 0)$ with $h_j \neq 0$ for both $j$; (ii) $(0, h_2)$ with $h_2 \neq 0$; (iii) $(h_1, 0)$ with $h_1 \neq 0$; (iv) $(0, 0)$.

Next we give a detailed analysis of the game in each configuration.

• Configuration one: all the parameters are strictly positive. This is an anticoordination game. The game has two pure equilibria, $(sBS_1, sBS_2)$ and $(sBS_2, sBS_1)$. There is also one fully mixed equilibrium.
• Configuration two: the state has the form $(0, h_2)$, $h_2 \neq 0$. These games have a continuum of equilibria: any strategy of user 1 is an equilibrium strategy.
• Configuration three: the state has the form $(h_1, 0)$, $h_1 \neq 0$. These games also have a continuum of equilibria.
• Configuration four: any strategy profile is an equilibrium and a global optimum of this game.

Now what about the expected game? It has been shown in [8] that, in the expected game of this RMG example, a large class of distributed strategic learning algorithms converges to one of the global optima, which is an interesting property. Moreover, the estimated payoffs converge to the global optimum payoffs.
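To make the randomness concrete, the following sketch samples realizations $(M_1(w), M_2(w))$ of this game and estimates the expected game by Monte Carlo; the Rayleigh channel amplitudes and the power and noise values are illustrative assumptions for the example only:

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, N0 = 1.0, 1.0, 0.1   # transmit powers and noise level (illustrative)

def rate(p_own, g_own, interference):
    """Shannon rate log2(1 + SINR), as in Example 1."""
    return np.log2(1.0 + p_own * g_own / (N0 + interference))

def sample_game():
    """One realization (M1(w), M2(w)) of the 2x2 base-station-selection RMG."""
    g1 = rng.rayleigh(1.0) ** 2   # |h_1|^2 (Rayleigh amplitude assumed)
    g2 = rng.rayleigh(1.0) ** 2   # |h_2|^2
    same1, same2 = rate(p1, g1, p2 * g2), rate(p2, g2, p1 * g1)  # same sBS
    diff1, diff2 = rate(p1, g1, 0.0), rate(p2, g2, 0.0)          # different sBSs
    M1 = np.array([[same1, diff1], [diff1, same1]])   # rows: sBS chosen by MS1
    M2 = np.array([[same2, diff2], [diff2, same2]])   # cols: sBS chosen by MS2
    return M1, M2

# Monte Carlo estimate of the expected game (E M1(w), E M2(w)).
games = [sample_game() for _ in range(1000)]
EM1 = np.mean([g[0] for g in games], axis=0)
EM2 = np.mean([g[1] for g in games], axis=0)
print(EM1, EM2, sep="\n")
```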

IV. DYNAMIC RANDOM MATRIX GAMES

A. Evolutionary Random Matrix Games

We consider evolutionary games described by many local pairwise interactions. In each pairwise interaction, players are involved in an RMG. The resulting evolutionary game dynamics are stochastic dynamics. Using stochastic approximations, the asymptotic pseudotrajectories of these dynamics can be related to ordinary differential equations with the expected payoffs. An example is the replicator dynamics, given by

$$\frac{d}{dt} x_{1,t}(r) = x_{1,t}(r) \left[ (\mathbb{E} M_1(w)\, x_{2,t})_r - \sum_{r'} x_{1,t}(r')\, (\mathbb{E} M_1(w)\, x_{2,t})_{r'} \right] \qquad (2)$$

$$\frac{d}{dt} x_{2,t}(c) = x_{2,t}(c) \left[ (\mathbb{E}\, x'_{1,t} M_2(w))_c - \sum_{c'} x_{2,t}(c')\, (\mathbb{E}\, x'_{1,t} M_2(w))_{c'} \right] \qquad (3)$$

The set of stationary points of these dynamics contains the set of state-independent Nash equilibria. It also contains the set of pure global optima of the game.
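As an illustration, the following minimal sketch integrates (2)-(3) with an explicit Euler step for an expected game; the matrices, step size, and horizon are illustrative assumptions, not values from the paper:

```python
import numpy as np

def replicator_step(x1, x2, EM1, EM2, dt=0.01):
    """One explicit Euler step of the replicator dynamics (2)-(3)."""
    u1 = EM1 @ x2          # (E M1(w) x2)_r for each row r
    u2 = x1 @ EM2          # (E x1' M2(w))_c for each column c
    x1 = x1 + dt * x1 * (u1 - x1 @ u1)
    x2 = x2 + dt * x2 * (u2 - x2 @ u2)
    return x1, x2

# Expected anticoordination game (illustrative numbers): off-diagonal
# profiles are preferred, as in configuration one of Example 1.
EM1 = np.array([[0.2, 1.0], [1.0, 0.2]])
EM2 = np.array([[0.2, 1.0], [1.0, 0.2]])
x1, x2 = np.array([0.6, 0.4]), np.array([0.4, 0.6])
for _ in range(5000):
    x1, x2 = replicator_step(x1, x2, EM1, EM2)
print(x1, x2)   # drifts toward a pure profile on different sBSs
```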

B. Generic dynamic RMG without state information

We now focus on a long-run RMG without private information on the realized state (on either side).

The dynamic game is described as follows.

• Initialization: At time $t = 0$, Nature generates a state $w_0$ from some distribution (unknown to the players). Each player $j$ chooses an action $a_{j,0}$. The action profile $a_0$ and the realized state $w_0$ determine a realized payoff vector $(r_{j,0})_j$. Each player $j$ observes a noisy value of her realized payoff $r_{j,0}$.
• At time $t$, the private history of player $j$ is the collection of measurements and actions chosen by herself up to $t-1$. A new state is generated. Each player $j$ chooses an action based on her history and past observations $(a_{j,t'}, r_{j,t'})_{t' \le t-1}$. Each player measures a noisy value of her payoff $r_{j,t}$.
• The game moves to $t + 1$.

Each player aims to maximize her average payoff.

A strategy of a player $j$ in the dynamic game is a collection of mappings. A pure strategy $\sigma_{j,t}$ of player $j$ at time $t$ is a map from the private history set $H_{j,t} = (\mathcal{A}_j \times \mathbb{R})^{t-1}$ to the action space $\mathcal{A}_j$: for $h_{j,t} \in H_{j,t}$, $\sigma_{j,t}(h_{j,t}) \in \mathcal{A}_j$. We denote by $\Sigma_j$


the set of pure strategies of player $j$. The average payoff at time $T$ is $F_{j,T}(\sigma) = \frac{1}{T+1} \sum_{t=0}^{T} r_{j,t}$. The dynamic game is then represented by $G_{p,T} := (W, \nu, \mathcal{N}, (\Sigma_j, F_{j,T})_{j \in \mathcal{N}})$. We seek an equilibrium of the inferior limit of $G_{p,T}$.

C. Learning in Random Matrix Games

Let $x_{1,t+1}(r)$ be the probability that the row player chooses row $r$ at iteration $t+1$, and $x_{2,t+1}(c)$ the probability that the column player chooses column $c$ at iteration $t+1$. The parameter $\lambda_{j,t}$ is a positive real number representing the learning rate for the strategy dynamics of player $j$, and $\mu_{j,t}$ is a positive real number representing the learning rate for the payoff dynamics. A convenient class of learning schemes for RMGs is combined fully distributed payoff and strategy learning (CODIPAS), in which each player tries to learn her payoff functions as well as the associated optimal strategies. The CODIPAS scheme is well adapted to RMGs for multiple reasons:

• CODIPAS is designed for random environments.
• It is based only on numerical measurements and accommodates noisy observations, measurement errors, and outdated measurements.
• A player does not need to know the other players' actions or payoffs.
• In contrast to classical schemes, there is no complex optimization problem to solve from one iteration to the next.
• CODIPAS is flexible enough to incorporate asynchronous and random updates.

The key idea can be summarized in two lines:

(i) The behaviors influence the outcomes.
(ii) The consequences influence the behaviors of the players.

Due to the randomness, each player estimates her payoff; $\hat{m}_{j,t}$ is the estimated payoff vector of player $j$ at time $t$. Based on these estimates, player $j$ constructs a strategy for the next iteration. Below we provide one example of CODIPAS that is well adapted to RMGs, as the numerical investigation in Section VI will show.

Initialization:
$\hat{m}_{1,0} = (\hat{m}_{1,0}(1), \ldots, \hat{m}_{1,0}(l_1))$, $\quad x_{1,0} = (x_{1,0}(1), \ldots, x_{1,0}(l_1))$,
$\hat{m}_{2,0} = (\hat{m}_{2,0}(1), \ldots, \hat{m}_{2,0}(l_2))$, $\quad x_{2,0} = (x_{2,0}(1), \ldots, x_{2,0}(l_2))$.
Define the learning-rate sequences up to $T$: $\lambda_{j,t}, \mu_{j,t}$.

For $t \in \{1, 2, \ldots, T\}$:

Learning pattern of the row player:
$$x_{1,t+1}(r) = \frac{x_{1,t}(r)\,(1+\lambda_{1,t})^{\hat{m}_{1,t}(r)}}{\sum_{r'} x_{1,t}(r')\,(1+\lambda_{1,t})^{\hat{m}_{1,t}(r')}}, \qquad \hat{m}_{1,t+1}(r) = \hat{m}_{1,t}(r) + \mu_{1,t}\, \mathbb{1}_{\{a_{1,t}=r\}}\, \big(m_{1,t} - \hat{m}_{1,t}(r)\big).$$

Learning pattern of the column player:
$$x_{2,t+1}(c) = \frac{x_{2,t}(c)\,(1+\lambda_{2,t})^{\hat{m}_{2,t}(c)}}{\sum_{c'} x_{2,t}(c')\,(1+\lambda_{2,t})^{\hat{m}_{2,t}(c')}}, \qquad \hat{m}_{2,t+1}(c) = \hat{m}_{2,t}(c) + \mu_{2,t}\, \mathbb{1}_{\{a_{2,t}=c\}}\, \big(m_{2,t} - \hat{m}_{2,t}(c)\big).$$

Interpretation of the above learning scheme: the payoff learning given by $\hat{m}_t$ estimates the expected payoff of each action. When $T$ is sufficiently large and the actions have been sufficiently explored, each user is able to learn the expected payoffs via these estimates. However, a user does not need to wait until the end of the exploration phase: each user adapts her strategy simultaneously, using the most recent experience $m_{j,t}$. The strategy learning $x_t$ is then used to learn the optimal strategies. The strategies are imitative in the sense that the next probabilities are proportional to the previous ones: the probability of strategies that result in a higher average payoff increases, while the other probabilities decrease by normalization.
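For concreteness, here is a minimal runnable sketch of the imitative CODIPAS iteration above for a two-player RMG. The expected anticoordination game, the uniform noise level, and the constant learning rates are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
EM1 = np.array([[0.2, 1.0], [1.0, 0.2]])   # expected payoff matrices
EM2 = np.array([[0.2, 1.0], [1.0, 0.2]])   # (illustrative anticoordination game)
lam, mu = 0.05, 0.05                        # constant learning rates

x1, x2 = np.full(2, 0.5), np.full(2, 0.5)   # initial mixed strategies
mh1, mh2 = np.zeros(2), np.zeros(2)         # estimated payoff vectors

for t in range(5000):
    a1 = rng.choice(2, p=x1)                # sampled actions
    a2 = rng.choice(2, p=x2)
    noise = rng.uniform(-1/8, 1/8, size=2)  # measurement noise on payoffs
    m1 = EM1[a1, a2] + noise[0]             # noisy realized payoffs
    m2 = EM2[a1, a2] + noise[1]
    # Imitative strategy learning: x(s) proportional to x(s)(1+lam)^mh(s).
    x1 = x1 * (1 + lam) ** mh1; x1 /= x1.sum()
    x2 = x2 * (1 + lam) ** mh2; x2 /= x2.sum()
    # Payoff learning: update only the estimate of the played action.
    mh1[a1] += mu * (m1 - mh1[a1])
    mh2[a2] += mu * (m2 - mh2[a2])

print(x1, mh1)   # strategies typically concentrate on a pure equilibrium
```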

Theorem 1. For small learning rates, the strategy learning of the algorithm approximates the replicator equation (2).

Proof: The proof of the statement is simple. We first observe that $\lim_{\lambda \to 0^+} \frac{(1+\lambda)^{\hat{r}} - 1}{\lambda} = \hat{r}$. This allows one to compute the drift, i.e., the expected change in one time slot: $D_{1,t}(r) = \frac{1}{\lambda_{1,t}} \mathbb{E}\left( x_{1,t+1}(r) - x_{1,t}(r) \mid x_t \right)$. It is not difficult to see that the asymptotics of $D_{1,t}(r)$ as $\lambda_{1,t}$ vanishes is given by the right-hand side of the replicator equation (2). Using classical stochastic approximation techniques, the announced statement follows.

Theorem 2 (Asymptotic pseudotrajectory, constant learning rates). For similar rates, i.e., $\frac{\lambda_{i,t}}{\mu_{j,t}} \to k_{ij} > 0$, the asymptotic pseudotrajectory of the imitative CODIPAS is given by

$$\begin{cases}
\frac{d}{dt} x_{1,t}(r) = k_{11}\, x_{1,t}(r) \left[ \hat{m}_{1,t}(r) - \sum_{r'} x_{1,t}(r')\, \hat{m}_{1,t}(r') \right] \\
\frac{d}{dt} x_{2,t}(c) = k_{21}\, x_{2,t}(c) \left[ \hat{m}_{2,t}(c) - \sum_{c'} x_{2,t}(c')\, \hat{m}_{2,t}(c') \right] \\
\frac{d}{dt} \hat{m}_{1,t}(r) = k_{12}\, x_{1,t}(r) \left[ (\mathbb{E} M_1(w)\, x_{2,t})_r - \hat{m}_{1,t}(r) \right] \\
\frac{d}{dt} \hat{m}_{2,t}(c) = k_{22}\, x_{2,t}(c) \left[ (\mathbb{E}\, x'_{1,t} M_2(w))_c - \hat{m}_{2,t}(c) \right]
\end{cases}$$

The proof of this theorem is obtained by rewriting the scheme in Robbins-Monro form and combining it with the result of Theorem 1; we therefore omit the details.

Remark 1. The scheme proposed here is different from stochastic fictitious play, logit, and Boltzmann-Gibbs learning [8]. The main difference is that the set of stationary points of the limiting imitative CODIPAS contains both the set of equilibria and the set of global optima, which is not the case for stochastic fictitious play, logit/Glauber dynamics, and Boltzmann-Gibbs learning.

D. Mean-Variance Response

Consider an RMG with two players. The row player has a state-dependent payoff function given by $x'_1 M(w) x_2$. We introduce two key performance metrics: the mean of the payoff and the variance of the payoff. The objective here is not only to consider the expected payoff as a performance measure but also to include the variance, which captures the notion of risk.

We define the mean-variance response (MVR) of player 1 to a strategy $x_2$ as follows: $x_1 \in \mathrm{MVR}_1(x_2)$ if there is no strategy $y_1$ for which the following inequalities are simultaneously true:

$$\mathbb{E}_{w \sim \nu}\, y'_1 M(w) x_2 \ge \mathbb{E}_{w \sim \nu}\, x'_1 M(w) x_2, \qquad (4)$$

$$\mathbb{E}_w \left[ y'_1 M(w) x_2 - y'_1\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2 \le \mathbb{E}_w \left[ x'_1 M(w) x_2 - x'_1\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2, \qquad (5)$$

where at least one inequality is strict.

Similarly, we define the MVR of the column player: $x_2 \in \mathrm{MVR}_2(x_1)$ if there is no strategy $y_2$ for which the following inequalities are simultaneously true:

$$\mathbb{E}_{w \sim \nu}\, x'_1 M(w) y_2 \le \mathbb{E}_{w \sim \nu}\, x'_1 M(w) x_2, \qquad (6)$$

$$\mathbb{E}_w \left[ x'_1 M(w) y_2 - x'_1\, \mathbb{E}_{w \sim \nu} M(w)\, y_2 \right]^2 \le \mathbb{E}_w \left[ x'_1 M(w) x_2 - x'_1\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2, \qquad (7)$$

where at least one inequality is strict (the payoff is a cost to the column player, so she prefers a lower mean and a lower variance).

This formulation can be seen as a Pareto-optimal response to $x_2$, in the sense that one cannot improve one payoff component without degrading the other. It is a vector optimization criterion in which the first objective (the mean) is maximized and the second objective (the variance) is minimized. In the context of games, one looks for a fixed point of the MVR correspondence; in the zero-sum case, the mean-variance solution is a saddle point of the MVR correspondence.

We now formulate the nonzero-sum case. Mean-variance solution: we define a mean-variance solution as a fixed point of the MVR correspondence, i.e., a strategy profile $x$ such that $x_1 \in \mathrm{MVR}_1(x_2)$ and $x_2 \in \mathrm{MVR}_2(x_1)$.
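A minimal Monte Carlo sketch of the mean-variance comparison in (4)-(5): it checks whether a candidate deviation $y_1$ mean-variance dominates $x_1$ against a fixed $x_2$. The sampled matrix distribution and the tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Realizations of M(w): fixed mean plus uniform noise (illustrative).
samples = np.array([[[1.0, 0.0], [0.0, 1.0]]]) + \
          rng.uniform(-0.5, 0.5, size=(20000, 2, 2))

def mean_var(x1, x2):
    """Mean and variance of the random payoff x1' M(w) x2."""
    pays = np.einsum('i,sij,j->s', x1, samples, x2)
    return pays.mean(), pays.var()

def mv_dominates(y1, x1, x2, tol=1e-9):
    """True if y1 mean-variance dominates x1 against x2, as in (4)-(5):
    mean at least as high, variance at least as low, one strictly better."""
    my, vy = mean_var(y1, x2)
    mx, vx = mean_var(x1, x2)
    return (my >= mx - tol and vy <= vx + tol
            and (my > mx + tol or vy < vx - tol))

x2 = np.array([0.5, 0.5])
print(mv_dominates(np.array([1.0, 0.0]), np.array([0.5, 0.5]), x2))
```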

E. Satisfactory solution

Consider the two-player zero-sum case. Under the satisfaction criterion of optimality, the row player maximizes her probability of winning at least a certain amount, no matter what strategy the other player uses. Formally, this is captured by the following maxmin problem:

$$\sup_{x_1} \inf_{x_2} \mathbb{P}\left( x'_1 M(w)\, x_2 \ge \beta \right), \qquad \beta \in \mathbb{R}.$$
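A short sketch estimating the satisfaction probability $\mathbb{P}(x'_1 M(w) x_2 \ge \beta)$ by Monte Carlo, with a heuristic grid search standing in for the inner infimum (the matrix distribution and the threshold $\beta$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
samples = np.array([[[1.0, 0.0], [0.0, 1.0]]]) + \
          rng.uniform(-0.5, 0.5, size=(20000, 2, 2))   # M(w) realizations

def satisfaction_prob(x1, x2, beta):
    """Monte Carlo estimate of P(x1' M(w) x2 >= beta)."""
    pays = np.einsum('i,sij,j->s', x1, samples, x2)
    return (pays >= beta).mean()

def worst_case_prob(x1, beta, grid=101):
    """Heuristic inner infimum: grid search over the column player's
    mixed strategies (the probability is not linear in x2, so checking
    pure actions alone would not suffice in general)."""
    qs = np.linspace(0.0, 1.0, grid)
    return min(satisfaction_prob(x1, np.array([q, 1 - q]), beta)
               for q in qs)

print(worst_case_prob(np.array([0.5, 0.5]), beta=0.4))
```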

V. RANDOM POLY-MATRIX GAMES

We extend the above framework to finitely many interacting players and random payoffs. The model is similar to what is called a finite robust game. Let $X = \prod_{j \in \mathcal{N}} X_j$, where $X_j$ is the $(l_j - 1)$-dimensional simplex of $\mathbb{R}^{l_j}$, and denote $d := \sum_{j \in \mathcal{N}} l_j$. Let $R_j(w, x)$ be the payoff function of player $j$; the expected payoff is $r_j(x) = \mathbb{E}_w R_j(w, x)$. We say that $x$ is a state-independent equilibrium of the expected RMG if for every player $j \in \mathcal{N}$, $\mathbb{E}_w R_j(w, x_j, x_{-j}) \ge \mathbb{E}_w R_j(w, y_j, x_{-j})$ for all $y_j \in X_j$.

A. Variational inequality

The existence of state-independent equilibria is equivalent to the existence of a solution of the following variational inequality (VI) problem: find $x$ such that

$$\langle x - y, V(x) \rangle \ge 0, \qquad \forall y \in \prod_j X_j,$$

where $\langle \cdot, \cdot \rangle$ is the inner product, $V(x) = [V_1(x), \ldots, V_n(x)]$,

$$V_j(x) = \left[ \mathbb{E}_w R_j(w, e_{s_j}, x_{-j}) \right]_{s_j \in \mathcal{A}_j},$$

and $e_{s_j}$ is the unit vector with $1$ at the position of $s_j$ and $0$ elsewhere, $s_j \in \mathcal{A}_j$.

Note that an equilibrium of the expected RMG need not be an equilibrium at each time slot: $x$ being an equilibrium of the expected RMG does not imply that $x$ is an equilibrium of the game $G(w)$ for a given state $w \in W$. We aim to approximate a solution to the variational inequality. Let

$$\epsilon_{RNE}(x) = \sum_{j \in \mathcal{N}} \left[ \max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) \right]$$

be the sum of the improvements that can be obtained by unilateral deviations.

Lemma 4. (i) $\epsilon_{RNE}(x) \ge 0$ for any $x \in X$. (ii) If $\epsilon_{RNE}(x) = 0$ for a certain $x \in X$, then $x$ is an equilibrium.

Proof: Suppose $x_j \in X_j$. Clearly $\max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) \ge 0$; hence $\epsilon_{RNE}(x) \ge 0$ for any $x \in X$. It follows directly from the definition that if $\epsilon_{RNE}(x) = 0$ for a certain $x \in X$, then $\max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) = 0$ for all $j$, which means that $x$ is an equilibrium. Hence a minimizer of $\epsilon_{RNE}(x)$ is a solution to the VI.
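Since $r_j$ is linear in the deviation $y_j$, the inner maximum over the simplex is attained at a pure action, which makes $\epsilon_{RNE}$ straightforward to evaluate for an expected bi-matrix game. A minimal sketch (the matrices are illustrative):

```python
import numpy as np

def eps_rne(EM1, EM2, x1, x2):
    """epsilon_RNE(x) for the expected bimatrix game (EM1, EM2).
    The max of a linear function over the simplex is a pure action."""
    r1, r2 = x1 @ EM1 @ x2, x1 @ EM2 @ x2   # current expected payoffs
    best1 = (EM1 @ x2).max()                 # best unilateral deviations
    best2 = (x1 @ EM2).max()
    return (best1 - r1) + (best2 - r2)

EM1 = np.array([[0.2, 1.0], [1.0, 0.2]])
EM2 = np.array([[0.2, 1.0], [1.0, 0.2]])
# Pure anticoordination profile: no improving deviation, so eps is 0.
print(eps_rne(EM1, EM2, np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```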

B. Expected robust potential in RMGs

We say that an RMG has an expected robust potential if the game with expected payoffs (the expectation taken with respect to the random variable $w$) is a potential game in the sense of Monderer and Shapley (1996, [7]). Note that the state-dependent games need not be potential games. The main feature of an expected robust potential is that both pure and mixed equilibrium seeking are well suited to imitative CODIPAS.

VI. NUMERICAL INVESTIGATION

We consider a class of network selection problems under uncertainty and investigate the behavior of the players using distributed strategic learning. The basic setting is a three-by-two RMG: three players and two actions per player in each state. The main difference with the classical approaches in the literature is the uncertainty of the environment: here the players interact for resource access in a random environment. The action set of the row player is $\{r_1, r_2\}$, the action set of the column player is $\{c_1, c_2\}$, and the third player chooses one of the two tables (arrays) $\{a_1, a_2\}$. Each payoff entry has two parts: a deterministic part and a noisy part given by real-valued zero-mean random variables $n^j_{abc}$. In all the numerical examples below, these random variables are uniformly distributed in $[-1/8, 1/8]$. We fix the parameter $\alpha = 0.995$. The initial strategies are $x_{1,0} = (0.52, 0.48)$, $x_{2,0} = (0.5, 0.5)$, $x_{3,0} = (0.49, 0.51)$, and the estimated payoffs are initialized at $\hat{m}_{1,0} = (0.1, 0.1)$, $\hat{m}_{2,0} = (0.1, 0.1)$, $\hat{m}_{3,0} = (0.1, 0.1)$.
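The payoff structure of Table II below can be encoded as deterministic tensors plus uniform noise and plugged into a three-player version of the CODIPAS sketch from Section IV-C. In the following sketch, the indexing convention `payoff[j, r, c, a]` is an illustrative choice; the entries and the noise bounds follow the text:

```python
import numpy as np

alpha = 0.995
# Deterministic parts of Table II: payoff[j, r, c, a] for player j+1.
payoff = np.zeros((3, 2, 2, 2))
payoff[1, 0, 1, 0] = alpha   # (r1, c2, a1): player 2 gets alpha
payoff[0, 1, 0, 0] = alpha   # (r2, c1, a1): player 1 gets alpha
payoff[2, 1, 1, 0] = alpha   # (r2, c2, a1): player 3 gets alpha
payoff[2, 0, 0, 1] = alpha   # (r1, c1, a2): player 3 gets alpha
payoff[0, 0, 1, 1] = alpha   # (r1, c2, a2): player 1 gets alpha
payoff[1, 1, 0, 1] = alpha   # (r2, c1, a2): player 2 gets alpha

rng = np.random.default_rng(5)

def realized_payoffs(r, c, a):
    """Noisy realized payoffs (m1, m2, m3) for the profile (r, c, a)."""
    return payoff[:, r, c, a] + rng.uniform(-1/8, 1/8, size=3)

print(realized_payoffs(1, 0, 0))   # profile (r2, c1, a1)
```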

TABLE II
STRATEGIC FORM REPRESENTATION FOR 3 PLAYERS, 2 ACTIONS

Array $a_1$ (chosen by player 3):

        $c_1$                                                $c_2$
$r_1$   $(0,0,0)+(n^1_{111},n^2_{111},n^3_{111})$            $(0,\alpha,0)+(n^1_{121},n^2_{121},n^3_{121})$
$r_2$   $(\alpha,0,0)+(n^1_{211},n^2_{211},n^3_{211})$       $(0,0,\alpha)+(n^1_{221},n^2_{221},n^3_{221})$

Array $a_2$ (chosen by player 3):

        $c_1$                                                $c_2$
$r_1$   $(0,0,\alpha)+(n^1_{112},n^2_{112},n^3_{112})$       $(\alpha,0,0)+(n^1_{122},n^2_{122},n^3_{122})$
$r_2$   $(0,\alpha,0)+(n^1_{212},n^2_{212},n^3_{212})$       $(0,0,0)+(n^1_{222},n^2_{222},n^3_{222})$

[Fig. 1. Convergence of the strategies to a global optimum of the RMG: probability of playing the first action, $x_{1,t}(r_1)$, $x_{2,t}(c_1)$, $x_{3,t}(a_1)$, vs. number of iterations.]

In Figure 1 we plot the evolution of the probability of choosing the first action for each user. We observe that CODIPAS

converges to a pure global optimum $(0, 1, 1)$ even under measurement noise, which is an interesting robustness property. We plot the lines $0.05$ and $0.995$ in order to check the convergence time within an error range of $5\%$. As can be seen, the strategies of all three users converge within a window of length $500$. In Figure 2 we plot the evolution of the payoff estimate of the first action for each user. We observe that user 1's payoff estimate converges to $1$, while the estimates of the two other users go to zero for the first action.

[Fig. 2. Evolution of the average payoff estimations of the RMG: average perceived payoffs of the three users vs. number of iterations.]

In Figure 3 we plot, in three dimensions, the evolution of the probability of choosing the first action for each user as a function of the others'. CODIPAS converges to the corner $(0, 1, 1)$ of the cube, which is a pure global optimum of the expected RMG.

[Fig. 3. 3D plot: convergence of the strategies to a pure global optimum of the RMG by imitative CODIPAS.]

Other variations: we have simulated other configurations and observed that the other global optima can also be reached by changing the initial condition. The measurement noise can sometimes have a positive effect, in the sense that the convergence


time can be faster. The convergence can also be accelerated by simply adding a scaling growth parameter to the learning scheme. Heterogeneous scaling likewise helps accelerate the convergence of both learning schemes (payoff dynamics and strategy dynamics).

VII. CONCLUDING REMARKS

In this paper we have presented preliminary results on the emerging area of random matrix games in wireless networks. Both zero-sum and nonzero-sum games were examined. The framework is flexible enough to extend to evolutionary RMGs and to distributed strategic learning, which allows one to learn expected payoffs, variances, and optimal strategies in a wide range of random bi-matrix games.

Acknowledgement: This work was supported in part by the CoDECoM Project, through HIRP-YJCB2010003RE.

REFERENCES

[1] M. Khan, H. Tembine, and T. Vasilakos, "Evolutionary coalitional games: design and challenges in wireless networks," IEEE Wireless Communications Magazine, Special Issue on User Cooperation, vol. 19, no. 2, pp. 50-56, 2012.

[2] V. Srivastava, J. Neel, A. B. MacKenzie, R. Menon, L. A. DaSilva, J. E. Hicks, J. H. Reed, and R. P. Gilles, "Using game theory to analyze wireless ad hoc networks," IEEE Communications Surveys and Tutorials, vol. 7, no. 4, pp. 46-56, 2005.

[3] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter, "A survey on networking games in telecommunications," Computers and Operations Research, 2006.

[4] S. H. Low, L. Chen, T. Cui, and J. C. Doyle, "A game-theoretic model for medium access control," IEEE Journal on Selected Areas in Communications, vol. 26, no. 7, 2008.

[5] G. Kasbekar and A. Proutiere, "Opportunistic medium access in multichannel wireless systems: a learning approach," in Proc. Allerton Conference on Communications, Control, and Computing, 2010.

[6] H. Tembine, "Dynamic robust games in MIMO systems," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 41, no. 4, pp. 990-1002, Aug. 2011.

[7] D. Monderer and L. S. Shapley, "Potential games," Games and Economic Behavior, vol. 14, pp. 124-143, 1996.

[8] H. Tembine, Distributed Strategic Learning for Wireless Engineers, CRC Press, Taylor & Francis, 2012.
