Random Matrix Games in Wireless Networks
Manzoor Ahmed Khan
DAI-Labor, Technical University (TU) Berlin, Germany
E-mail: [email protected]
Hamidou Tembine
Ecole Superieure d'Electricite (SUPELEC), Gif-sur-Yvette, France
E-mail: [email protected]
Abstract—In this paper we propose an interactive decision-making framework for wireless networks where the outcome is influenced not only by the decisions of the users but also by a random variable. We examine especially the finite-player case, which we call random matrix games (RMGs). We present different approaches and solution concepts for such games, as well as distributed strategic learning in which each player adjusts her strategy in response to recent information and signals. The applicability of the proposed framework is illustrated on user-centric network selection under measurement noise.
I. INTRODUCTION
As the complexity of existing networks grows and the environment cannot be assumed to be constant in many wireless networks, we need to study and explore the behavior and performance of systems which involve not only decision-making strategies and time dependencies but also the randomness of the state of the environment, the variability of the demands, the uncertainty of the system parameters, the random activity of the users, errors and noise in the measurements, etc. Exploiting this potential necessitates the development of a rigorous mathematical framework that allows a comparative analysis of various schemes under random variables. Randomness (channel, fading, shadowing, errors, measurement noise, delayed measurements, etc.) is one of the key drivers of performance in wireless networks.
However, the classical game-theoretic approaches in wireless networks [2], [3], [4], [5] do not consider these issues. We consider random matrix games (RMGs), in which the realized payoffs are influenced not only by the actions of the users but also by a random variable. Given random payoff matrices, the question arises of what it means to play the random matrix game (RMG) in an optimal way, because the actual payoff of the game now depends not only on the action profile picked by the users but also on the realized state of nature. Therefore the users cannot guarantee themselves a certain payoff or satisfaction level; they are forced to gamble on a certain random variable, and how one gambles in an optimal way needs to be defined. One approach to this type of game is to replace the random matrices by their expectations, leading to an expected game, and then to solve the resulting deterministic matrix game. Such an approach has the advantage of being simple, and it provides many interesting results in terms of existence of solutions. However, the expectation approach may not always capture the behavior of the system in the presence of risk-sensitive users.
A second important approach considers a satisfaction criterion: the probability that the payoff is above a certain threshold. A third approach incorporates the variance or higher moments of the payoffs into the responses of the users; a simple criterion is to maximize the mean payoff and minimize the variance of the random payoff simultaneously. The latter leads to a multiobjective criterion whose solutions are related to Pareto boundaries, depending on the behavior of the other users.
Random matrix game theory is not well investigated in wireless networks. To any given game matrix, a variety of formulas and algorithms may be applied to yield limit cycles, equilibrium strategies, equilibrium payoffs, etc. But if the game matrix is random, these become random variables. Their distribution is not easy to determine analytically for any fixed matrix size, still less their asymptotics as the matrix size tends to infinity, as is usually done in random matrix theory. The results of random matrix theory do not apply directly to RMGs: one of the main reasons is that RMGs are not only random matrices but also interactive and controlled random matrices (controlled by the interactive behavior of the users). RMGs are present in many networking and communication problems.
An RMG is a particularly useful tool for modeling and studying interactions between cognitive radios envisioned to operate in future communication systems. Such terminals will have the capability to adapt to the context and the uncertain environment they operate in, through, for example, power and rate control as well as channel selection. User performance metrics, such as successful-reception conditions and throughput/connectivity, are generally random quantities because they are influenced by random channel and environment states. Most game-theoretic analyses in communication networks are done only in the context of static games (by fixing the parameters), which may not capture the stochastic nature of wireless communication systems. RMGs turn out to be a more appropriate tool for modeling interaction in communication networks: they allow one to model interactive decision-making problems of the nodes and to evaluate network performance under random channel states, fading, shadowing, measurement noise, background noise, path loss, etc., in situations where a finite number of choices is available to the players (users, operators, mobile devices, base stations, protocols, etc.). In [6], we studied games under uncertainty for MIMO systems, which can be seen as RMGs. There are many interesting applications of
zero-sum random matrix games in wireless networks. The randomness allows one to study the robustness of the network with respect to malicious attacks, failures, noisy measurements, and environment perturbations.
Our contributions can be summarized as follows. We introduce random matrix games (zero-sum and nonzero-sum) and present their relevance in various applications. We distinguish two approaches to RMGs: the expectation (risk-neutral) approach and the risk-sensitive approach. We formally define three solution concepts: state-robust equilibrium, state-independent equilibrium, and state-dependent equilibrium, and we discuss their existence and non-existence. The framework is extended to a dynamic scenario. We propose distributed strategic learning based on imitative combined learning and evolutionary game dynamics. The learning scheme is applied to a resource-selection problem with uncertainties. Interestingly, we observe that the proposed scheme approaches a global optimum of the network-selection scenario even under measurement noise.
The paper is structured as follows. We first present zero-sum
random matrix games. Then, we focus on nonzero-sum RMGs.
Dynamic RMGs and random poly-matrix games are discussed.
We conclude the paper with numerical investigations. We
summarize some of the notations in Table I.
TABLE I: SUMMARY OF NOTATIONS

$W \subseteq \mathbb{R}^k$ — wait, no em-dash: $W \subseteq \mathbb{R}^k$: state space
$\mathcal{N}$: set of potential players
$r, r'$: actions of the row player
$c, c'$: actions of the column player
$A_j$: set of actions of player $j$, $|A_j| \ge 2$
$s_j \in A_j$: a generic element of $A_j$
$X_j := \Delta(A_j)$: set of probability distributions over $A_j$
$a_{j,t} \in A_j$: action of player $j$ at time $t$
$x_{j,t} \in X_j$: strategy of player $j$ at time $t$
$M_j(w)$: random matrix of player $j$
$(\lambda_{j,t}, \mu_{j,t})$: learning rates of player $j$ at time $t$
$\hat m_{j,t} \in \mathbb{R}^{|A_j|}$: estimated payoff vector of player $j$ at time $t$
II. ZERO-SUM RANDOM MATRIX GAMES
We consider the most basic class of games. There are two players, the row player and the column player. A game between them is determined by an $l_1 \times l_2$ matrix $M = (m_{ij})_{i,j}$ of real numbers. The row player has $l_1$ pure strategies, corresponding to the $l_1$ rows of $M$. The column player has $l_2$ pure strategies, corresponding to the $l_2$ columns of $M$. If the row player plays her $i$-th strategy and the column player plays his $j$-th strategy, then the column player pays the row player the real number $m_{ij}$. The players are allowed to randomize their actions, i.e., they can choose mixed strategies as well. This means the row player's move is to choose a row vector¹ $x_1 = (x_{11}, \ldots, x_{1 l_1})$ with the $x_{1k}$ nonnegative and summing to one. The column player's move is to choose a column vector $x_2 = (x_{21}, \ldots, x_{2 l_2})$ with the $x_{2k'}$ likewise nonnegative and summing to one. The players make their moves independently, and the game is concluded by the column player paying the expected amount $x_1 M x_2$ to the row player.

¹It should be understood as an already transposed vector.
Von Neumann minmax solution concept: We now focus on solution concepts of such games. A triple $(x_1, x_2, v)$, consisting of a probability $l_1$-vector $x_1$ (row), a probability $l_2$-vector $x_2$ (column), and a number $v$, is called a solution of the zero-sum game² if it satisfies the following conditions:

$$\forall y_1 \in X_1, \; \forall y_2 \in X_2: \quad y_1 M x_2 \;\le\; x_1 M x_2 = v \;\le\; x_1 M y_2, \tag{1}$$

where $X_j = \{x_j \mid x_j(k) \ge 0, \; \sum_{k=1}^{l_j} x_j(k) = 1\}$. In that case the pair $(x_1, x_2)$ is called a saddle point. Thus $(x_1, x_2, v)$ is a classical solution (in the sense of von Neumann) and $(x_1, x_2)$ is a saddle point if and only if a unilateral deviation from $(x_1, x_2)$ by either player never results in an improved outcome for that player. It is known from the von Neumann minmax theorem that every zero-sum game of this type with matrix $M \in \mathbb{R}^{l_1 \times l_2}$ has at least one saddle point. Moreover, for all these saddle points (if there are many), the number $v$ is the same; this number is called the value of the game. In addition, the saddle points are interchangeable, i.e., if $(x_1, x_2)$ and $(y_1, y_2)$ are two saddle points, then $(x_1, y_2)$ and $(y_1, x_2)$ are also saddle points.
We now introduce randomness into the coefficients of the matrix $M$. To consider random games, we fix an entry measure for each entry $m_{ij}(w) \in \mathbb{R}$ of the matrix, where the state $w$ is driven by a probability measure $\nu$ on a finite-dimensional real space $W$ equipped with the canonical $\sigma$-algebra. The collection

$$(\{1, 2\}, \, W, \, \nu, \, \{1, \ldots, l_1\}, \, \{1, \ldots, l_2\}, \, M(w), \, -M(w))$$

is a zero-sum random matrix game (RMG), where $w \sim \nu$. The triplet $(x_1(w), x_2(w), v(w))_{w \in W}$ is a state-dependent saddle point of the RMG if, for each $w \in W$ and all $y_1(w), y_2(w)$,

$$y_1(w)\, M(w)\, x_2(w) \;\le\; x_1(w)\, M(w)\, x_2(w) = v(w) \;\le\; x_1(w)\, M(w)\, y_2(w).$$
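To make the expectation approach concrete, the following minimal sketch (ours, not from the paper) estimates $\mathbb{E}M(w)$ from Monte Carlo samples and solves the resulting deterministic zero-sum game by linear programming. NumPy and SciPy are assumed, and the helper name `expected_saddle_point` is our own.

```python
import numpy as np
from scipy.optimize import linprog

def expected_saddle_point(samples):
    """State-independent saddle point (x1, x2, v) of the expected zero-sum RMG,
    where `samples` stacks Monte Carlo draws of the random matrix M(w)."""
    M = samples.mean(axis=0)                 # expected game E[M(w)]
    l1, l2 = M.shape
    shift = 1.0 - M.min()                    # shift entries to be positive;
    A = M + shift                            # the value shifts by the same amount
    # Row player LP: minimize sum(u) s.t. A^T u >= 1, u >= 0; then v = 1/sum(u).
    r1 = linprog(np.ones(l1), A_ub=-A.T, b_ub=-np.ones(l2),
                 bounds=[(0, None)] * l1)
    v = 1.0 / r1.x.sum()
    x1 = v * r1.x
    # Column player LP: maximize sum(z) s.t. A z <= 1, z >= 0; then x2 = v z.
    r2 = linprog(-np.ones(l2), A_ub=A, b_ub=np.ones(l1),
                 bounds=[(0, None)] * l2)
    x2 = v * r2.x
    return x1, x2, v - shift                 # undo the shift to recover the value

# Usage: matching pennies with small uniform measurement noise on each entry.
rng = np.random.default_rng(0)
base = np.array([[1.0, -1.0], [-1.0, 1.0]])
samples = base + rng.uniform(-0.125, 0.125, size=(1000, 2, 2))
print(expected_saddle_point(samples))        # close to (0.5,0.5), (0.5,0.5), 0
```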
III. NONZERO SUM RANDOM MATRIX GAMES
We consider a two-player nonzero-sum game. As above, there is a row player and a column player. A game between them is determined by a random state variable $w$ and two $l_1 \times l_2$ matrices of real numbers, $M_1(w) = (m_{1,ij}(w))$ and $M_2(w) = (m_{2,ij}(w))$. The row player has $l_1$ pure strategies, corresponding to the $l_1$ rows of $M_1(w)$. The column player has $l_2$ pure strategies, corresponding to the $l_2$ columns of $M_2(w)$. If the row player plays her $i$-th strategy and the column player plays her $j$-th strategy, then the row player receives the real number $m_{1,ij}(w)$ and the column player receives the real number $m_{2,ij}(w)$.
Next we define basic solution concepts for RMGs.
Definition 1 (State-robust equilibrium). A strategy profile $(x_1, x_2)$ is a state-robust equilibrium if it is an equilibrium for any state $w$.
²The results extend to two-player constant-sum games.
These equilibrium strategies are state-independent and distribution-independent. However, this solution concept requires a strong condition, as stated by the following lemma.

Lemma 1. A state-robust equilibrium may not exist in an RMG.

To prove this lemma, consider two games $G_1$ and $G_2$ such that in the first game the first action is dominant and in the second game the second action is dominant. Now consider a random variable $w$ taking values in $\{O_1, O_2\}$, and suppose that the support of $w$ is $\{O_1, O_2\}$. For $w = O_i$ the corresponding game is $G_i$. It is clear that the resulting RMG has no state-robust pure equilibrium, as the numerical check below illustrates.
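A minimal sketch of this counterexample (ours, assuming NumPy); the payoff matrices below are illustrative choices for $G_1$ and $G_2$, not values from the paper:

```python
import numpy as np

def pure_equilibria(M1, M2):
    """All pure Nash equilibria (i, j) of the bimatrix game (M1, M2)."""
    eqs = []
    for i in range(M1.shape[0]):
        for j in range(M1.shape[1]):
            # (i, j) is an equilibrium iff neither player can improve unilaterally.
            if M1[i, j] >= M1[:, j].max() and M2[i, j] >= M2[i, :].max():
                eqs.append((i, j))
    return eqs

# Two states: in G1 action 0 is dominant for both players, in G2 action 1 is.
G1 = np.array([[2.0, 2.0], [0.0, 0.0]]), np.array([[2.0, 0.0], [2.0, 0.0]])
G2 = np.array([[0.0, 0.0], [2.0, 2.0]]), np.array([[0.0, 2.0], [0.0, 2.0]])
common = set(pure_equilibria(*G1)) & set(pure_equilibria(*G2))
print(common)   # empty set: no pure profile is an equilibrium in every state
```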
Definition 2 (State-dependent equilibrium). A state-dependent equilibrium $(x_1(w), x_2(w))_w$ is a collection of equilibrium profiles, one per state.

Lemma 2. There exists at least one state-dependent (mixed) equilibrium of the RMG.

The proof of this lemma is obtained by concatenating the equilibria of the realized games, each of which has at least one equilibrium in mixed strategies (Nash's theorem).

Definition 3 (State-independent equilibrium). A strategy profile $(x_1, x_2)$ is a state-independent equilibrium if $x_1$ and $x_2$ are independent of the realized value of the state and they form an equilibrium of the expected game.

Lemma 3. There exists at least one state-independent (mixed) equilibrium in the expected RMG.

To prove this lemma, we apply Nash's theorem to the expected game, which is a bi-matrix game.
Example 1 (Ergodic Rate and Channel Uncertainty). Consider two mobile stations (MSs) and two small base stations (sBSs). Each mobile station can transmit to one of the base stations. If mobile station MS1 chooses a different base station than mobile station MS2, then MS1 gets the payoff $\log_2\big(1 + \frac{p_1 |h_1|^2}{N_0}\big)$ and MS2 gets $\log_2\big(1 + \frac{p_2 |h_2|^2}{N_0}\big)$, where $p_1$ and $p_2$ are (strictly) positive transmit powers, $h_1, h_2$ are (random) channel states, and $N_0$ is a background noise parameter. If both MSs transmit to the same base station, there is interference: MS1 gets $\tilde u_1 = \log_2\big(1 + \frac{p_1 |h_1|^2}{N_0 + p_2 |h_2|^2}\big)$ and MS2 gets $\tilde u_2 = \log_2\big(1 + \frac{p_2 |h_2|^2}{N_0 + p_1 |h_1|^2}\big)$. The following table summarizes the different configurations, writing $u_j = \log_2\big(1 + \frac{p_j |h_j|^2}{N_0}\big)$ for the interference-free payoffs. MS1 chooses a row (row player) and MS2 chooses a column (column player); the first payoff component is for MS1 and the second for MS2.

MS1\MS2: $sBS_1$ | $sBS_2$
$sBS_1$: $(\tilde u_1, \tilde u_2)$ | $(u_1, u_2)$
$sBS_2$: $(u_1, u_2)$ | $(\tilde u_1, \tilde u_2)$
We distinguish four configurations: (i) $(h_1, h_2)$ with $h_1 \neq 0$ and $h_2 \neq 0$; (ii) $(0, h_2)$ with $h_2 \neq 0$; (iii) $(h_1, 0)$ with $h_1 \neq 0$; (iv) $(0, 0)$.
Next we give a detailed analysis of the game in each of the configurations $\{1, 2, 3, 4\}$.
• Configuration one: all the parameters are strictly positive. This is an anticoordination game. The game has two pure equilibria, $(sBS_1, sBS_2)$ and $(sBS_2, sBS_1)$, and one fully mixed equilibrium.
• Configuration two: the state has the form $(0, h_2)$, $h_2 \neq 0$. These games have a continuum of equilibria: any strategy of user 1 is an equilibrium strategy.
• Configuration three: the state has the form $(h_1, 0)$, $h_1 \neq 0$. These games also have a continuum of equilibria.
• Configuration four: any strategy profile is an equilibrium and a global optimum in this game.
Now what about the expected game? It has been shown in [8] that, in the expected game of this RMG example, a large class of distributed strategic learning algorithms converges to one of the global optima, which is an interesting property. Moreover, the estimated payoffs converge to the global optimum payoffs.
IV. DYNAMIC RANDOM MATRIX GAMES
A. Evolutionary Random Matrix Games
We consider evolutionary games described by many local pairwise interactions. In each pairwise interaction, the players are involved in an RMG. The resulting evolutionary game dynamics are stochastic dynamics. Using stochastic approximations, the asymptotic pseudotrajectories of these dynamics can be related to ordinary differential equations involving the expected payoffs. An example is the replicator dynamics given by
$$\frac{d}{dt} x_{1,t}(r) = x_{1,t}(r) \left[ (\mathbb{E} M_1(w)\, x_{2,t})_r - \sum_{r'} x_{1,t}(r')\, (\mathbb{E} M_1(w)\, x_{2,t})_{r'} \right] \tag{2}$$

$$\frac{d}{dt} x_{2,t}(c) = x_{2,t}(c) \left[ (\mathbb{E}\, x_{1,t}' M_2(w))_c - \sum_{c'} x_{2,t}(c')\, (\mathbb{E}\, x_{1,t}' M_2(w))_{c'} \right] \tag{3}$$
The set of stationary points of these dynamics contains the set of state-independent Nash equilibria. It also contains the set of pure global optima of the game.
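A minimal numerical sketch (ours) of the dynamics (2)-(3), discretized with a forward-Euler step; NumPy is assumed and the function name `replicator_step` is our own:

```python
import numpy as np

def replicator_step(x1, x2, EM1, EM2, dt=0.01):
    """One Euler step of the replicator dynamics (2)-(3) for the expected game,
    where EM1 = E[M1(w)] and EM2 = E[M2(w)] are the expected payoff matrices."""
    f1 = EM1 @ x2                   # (E M1(w) x_{2,t})_r for each row r
    f2 = x1 @ EM2                   # (E x_{1,t}' M2(w))_c for each column c
    dx1 = x1 * (f1 - x1 @ f1)       # growth proportional to relative fitness
    dx2 = x2 * (f2 - x2 @ f2)
    return x1 + dt * dx1, x2 + dt * dx2

# Usage: the anticoordination structure of Example 1 (expected payoffs).
EM1 = np.array([[0.0, 1.0], [1.0, 0.0]])
EM2 = EM1.copy()
x1, x2 = np.array([0.6, 0.4]), np.array([0.5, 0.5])
for _ in range(5000):
    x1, x2 = replicator_step(x1, x2, EM1, EM2)
print(x1, x2)   # approaches a stationary point of the dynamics
```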
B. Generic dynamic RMG without state information
We now focus on a long-run RMG without private information on the realized state (on either side).
The dynamic game is described as follows.
• Initialization: At time $t = 0$, Nature generates a state $w_0$ from some distribution (unknown to the players). Each player $j$ chooses an action $a_{j,0}$. The action profile $a_0$ and the realized state $w_0$ determine a realized payoff vector $(r_{j,0})_j$. Each player $j$ observes a noisy value of her realized payoff $r_{j,0}$.
• At time $t$, the private history of player $j$ is the collection of measurements and actions chosen by herself up to $t-1$. A new state is generated. Each player $j$ chooses an action based on her history and past observations $(a_{j,t'}, r_{j,t'})_{t' \le t-1}$. Each player measures a noisy value of her payoff $r_{j,t}$.
• The game moves to $t + 1$.
Each player aims to maximize her average payoff.
A strategy of player $j$ in the dynamic game is a collection of mappings. A pure strategy $\sigma_{j,t}$ of player $j$ at time $t$ is a map from the private history set $H_{j,t} = (A_j \times \mathbb{R})^{t-1}$ to the action space $A_j$. For $h_{j,t} \in H_{j,t}$, $\sigma_{j,t}(h_{j,t}) \in A_j$. We denote by $\Sigma_j$ the set of pure strategies of player $j$. The average payoff at time $T$ is $F_{j,T}(\sigma) = \frac{1}{T+1} \sum_{t=0}^{T} r_{j,t}$. The dynamic game is then represented by $G_{p,T} := (W, \nu, \mathcal{N}, (\Sigma_j, F_{j,T})_{j \in \mathcal{N}})$. We seek an equilibrium of the inferior limit of $G_{p,T}$.
C. Learning in Random Matrix Games
Let $x_{1,t+1}(r)$ be the probability that the row player chooses row $r$ at iteration $t+1$, and $x_{2,t+1}(c)$ the probability that the column player chooses column $c$ at iteration $t+1$. The parameter $\lambda_{j,t}$ is a positive real number representing the learning rate for the strategy dynamics of player $j$; $\mu_{j,t}$ is a positive real number representing the learning rate for the payoff dynamics. A convenient class of learning schemes for RMGs is the class of combined fully distributed payoff and strategy learning (CODIPAS). In the CODIPAS scheme, each player tries to learn her payoff functions as well as the associated optimal strategies. The CODIPAS scheme is well-adapted to RMGs for multiple reasons:
• CODIPAS is designed for random environments.
• It is based only on numerical measurements and accommodates noisy observations, measurement errors, and outdated measurements.
• A player does not need to know the other players' actions or payoffs.
• In contrast to classical schemes, there is no complex optimization problem to solve from one iteration to the next.
• CODIPAS is flexible enough to incorporate asynchronous and random updates.
The key idea can be summarized in two lines:
(i) The behaviors influence the outcomes.
(ii) The consequences influence the behaviors of the players.
Due to the randomness, each player estimates her payoff; $\hat m_{j,t}$ is the estimated payoff vector of player $j$ at time $t$. Based on these estimates, player $j$ constructs a strategy for the next iteration. Below we provide one example of CODIPAS that is well-adapted to RMGs, as the numerical investigation in Section VI will show.
Initialization:
$\hat m_{1,0} = (\hat m_{1,0}(1), \ldots, \hat m_{1,0}(l_1))$, $x_{1,0} = (x_{1,0}(1), \ldots, x_{1,0}(l_1))$,
$\hat m_{2,0} = (\hat m_{2,0}(1), \ldots, \hat m_{2,0}(l_2))$, $x_{2,0} = (x_{2,0}(1), \ldots, x_{2,0}(l_2))$.
Define the sequences up to $T$: $\lambda_{j,t}, \mu_{j,t}$.
For $t \in \{1, 2, \ldots, T\}$:
Learning pattern of the row player:

$$x_{1,t+1}(r) = \frac{x_{1,t}(r)\, (1+\lambda_{1,t})^{\hat m_{1,t}(r)}}{\sum_{r'} x_{1,t}(r')\, (1+\lambda_{1,t})^{\hat m_{1,t}(r')}}, \qquad \hat m_{1,t+1}(r) = \hat m_{1,t}(r) + \mu_{1,t}\, \mathbb{1}_{\{a_{1,t} = r\}} \big( m_{1,t} - \hat m_{1,t}(r) \big).$$

Learning pattern of the column player:

$$x_{2,t+1}(c) = \frac{x_{2,t}(c)\, (1+\lambda_{2,t})^{\hat m_{2,t}(c)}}{\sum_{c'} x_{2,t}(c')\, (1+\lambda_{2,t})^{\hat m_{2,t}(c')}}, \qquad \hat m_{2,t+1}(c) = \hat m_{2,t}(c) + \mu_{2,t}\, \mathbb{1}_{\{a_{2,t} = c\}} \big( m_{2,t} - \hat m_{2,t}(c) \big).$$
Interpretation of the above learning scheme: the payoff learning given by $\hat m_t$ estimates the expected payoff of each action. When $T$ is sufficiently large and the actions are sufficiently explored, each user is able to learn her expected payoffs via these estimates. However, a user does not need to wait until the end of the exploration phase: each user adapts her strategy simultaneously, using the most recent experience $m_{j,t}$. The strategy learning $x_t$ is then used to learn the optimal strategies. The strategies are imitative in the sense that the next probabilities are proportional to the previous ones: the probabilities of strategies that result in a higher average payoff increase, while the other probabilities decrease by normalization.
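The scheme translates almost line-for-line into code. The sketch below is ours and makes several simplifying assumptions: NumPy, constant learning rates $\lambda$ and $\mu$, and a user-supplied callback `sample_payoffs` standing in for the noisy payoff measurements of the realized game.

```python
import numpy as np

def imitative_codipas(sample_payoffs, l1, l2, T=4000, lam=0.01, mu=0.1, seed=0):
    """Imitative CODIPAS for a two-player RMG. Each player measures only a
    noisy realization of her own payoff; sample_payoffs(i, j) returns the
    realized pair (m1, m2) for the drawn state."""
    rng = np.random.default_rng(seed)
    x1, x2 = np.full(l1, 1.0 / l1), np.full(l2, 1.0 / l2)  # initial strategies
    m1_hat, m2_hat = np.zeros(l1), np.zeros(l2)            # payoff estimates
    for _ in range(T):
        i = rng.choice(l1, p=x1)              # actions sampled from x_{j,t}
        j = rng.choice(l2, p=x2)
        m1, m2 = sample_payoffs(i, j)         # noisy realized payoffs
        # Payoff learning: update only the estimate of the action just played.
        m1_hat[i] += mu * (m1 - m1_hat[i])
        m2_hat[j] += mu * (m2 - m2_hat[j])
        # Imitative strategy learning: multiplicative weights (1+lam)^estimate.
        w1 = x1 * (1.0 + lam) ** m1_hat
        w2 = x2 * (1.0 + lam) ** m2_hat
        x1, x2 = w1 / w1.sum(), w2 / w2.sum()
    return x1, x2, m1_hat, m2_hat

# Usage: the anticoordination game of Example 1 with uniform measurement noise.
rng = np.random.default_rng(1)
def noisy_payoffs(i, j):
    base = 1.0 if i != j else 0.0
    return base + rng.uniform(-0.125, 0.125), base + rng.uniform(-0.125, 0.125)

print(imitative_codipas(noisy_payoffs, 2, 2))
```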
Theorem 1. For small learning rates, the strategy learning of the algorithm approximates the replicator equation (2).

Proof: The proof of the statement is simple. We first observe that $\lim_{\lambda \to 0^+} \frac{(1+\lambda)^{\hat r} - 1}{\lambda} = \hat r$. This allows us to compute the drift, i.e., the expected change in one time slot: $D_{1,t}(r) = \frac{1}{\lambda_{1,t}} \mathbb{E}\big( x_{1,t+1}(r) - x_{1,t}(r) \mid x_t \big)$. It is not difficult to see that the limit of $D_{1,t}(r)$ as $\lambda_{1,t}$ vanishes is given by the right-hand side of the replicator equation (2). Using classical stochastic approximation techniques, the announced statement follows.
Theorem 2 (Asymptotic pseudotrajectory, constant learning rates). For similar rates, i.e., $\frac{\lambda_{i,t}}{\mu_{j,t}} \longrightarrow k_{ij} > 0$, the asymptotic pseudotrajectory of the imitative CODIPAS is given by

$$\begin{cases}
\frac{d}{dt} x_{1,t}(r) = k_{11}\, x_{1,t}(r) \left[ \hat m_{1,t}(r) - \sum_{r'} x_{1,t}(r')\, \hat m_{1,t}(r') \right] \\
\frac{d}{dt} x_{2,t}(c) = k_{21}\, x_{2,t}(c) \left[ \hat m_{2,t}(c) - \sum_{c'} x_{2,t}(c')\, \hat m_{2,t}(c') \right] \\
\frac{d}{dt} \hat m_{1,t}(r) = k_{12}\, x_{1,t}(r) \left[ (\mathbb{E} M_1(w)\, x_{2,t})_r - \hat m_{1,t}(r) \right] \\
\frac{d}{dt} \hat m_{2,t}(c) = k_{22}\, x_{2,t}(c) \left[ (\mathbb{E}\, x_{1,t}' M_2(w))_c - \hat m_{2,t}(c) \right]
\end{cases}$$
The proof of this theorem is obtained by rewriting the scheme in Robbins-Monro form and combining it with the result of Theorem 1; we therefore omit the details.
Remark 1. The scheme proposed here is different from stochastic fictitious play, logit, and Boltzmann-Gibbs learning [8]. The main difference is that the set of stationary points of the limiting imitative CODIPAS contains the set of equilibria and the set of global optima, which is not the case for stochastic fictitious play, logit/Glauber dynamics, and Boltzmann-Gibbs learning.
D. Mean-Variance Response
Consider an RMG with two players. The row player has a state-dependent payoff function given by $x_1' M(w) x_2$. We introduce two key performance metrics: the mean of the payoff and the variance of the payoff. The objective here is not to take the expected payoff as the only performance measure, but to include also the variance, which captures the notion of risk.
We define the mean-variance response (MVR) of player 1 to a strategy $x_2$ as follows: $x_1 \in \mathrm{MVR}_1(x_2)$ if there is no strategy $y_1$ for which the following inequalities are simultaneously true:

$$\mathbb{E}_{w \sim \nu}\, y_1' M(w)\, x_2 \;\ge\; \mathbb{E}_{w \sim \nu}\, x_1' M(w)\, x_2, \tag{4}$$

$$\mathbb{E}_w \left[ y_1' M(w)\, x_2 - y_1'\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2 \;\le\; \mathbb{E}_w \left[ x_1' M(w)\, x_2 - x_1'\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2, \tag{5}$$

where at least one inequality is strict.
Similarly, we define the MVR of the column player as $x_2 \in \mathrm{MVR}_2(x_1)$ if there is no strategy $y_2$ for which the following inequalities are simultaneously true:

$$\mathbb{E}_{w \sim \nu}\, x_1' M(w)\, y_2 \;\le\; \mathbb{E}_{w \sim \nu}\, x_1' M(w)\, x_2, \tag{6}$$

$$\mathbb{E}_w \left[ x_1' M(w)\, y_2 - x_1'\, \mathbb{E}_{w \sim \nu} M(w)\, y_2 \right]^2 \;\le\; \mathbb{E}_w \left[ x_1' M(w)\, x_2 - x_1'\, \mathbb{E}_{w \sim \nu} M(w)\, x_2 \right]^2, \tag{7}$$

where at least one inequality is strict.
This formulation can be seen as a Pareto-optimal response to $x_2$, in the sense that one cannot improve one payoff component without degrading the other. It is a vector optimization criterion in which the first objective (the mean) is to be maximized and the second objective (the variance) is to be minimized. In the context of games, one looks for a fixed point of the MVR correspondence; in the zero-sum case, a mean-variance solution is a saddle point of the MVR correspondence.
We now formulate the nonzero-sum case. Mean-variance solution: We define a mean-variance solution as a fixed point of the MVR correspondence, i.e., a strategy profile $x$ such that $x_1 \in \mathrm{MVR}_1(x_2)$ and $x_2 \in \mathrm{MVR}_2(x_1)$.
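Given Monte Carlo samples of $M(w)$, both criteria entering the MVR are straightforward to estimate; a small sketch of ours, assuming NumPy:

```python
import numpy as np

def mean_and_variance(samples, x1, x2):
    """Monte Carlo estimates of E[x1' M(w) x2] and Var[x1' M(w) x2],
    where `samples` stacks draws of M(w) along the first axis."""
    payoffs = np.einsum('i,kij,j->k', x1, samples, x2)   # one payoff per draw
    return payoffs.mean(), payoffs.var()

# Checking whether a candidate y1 mean-variance dominates x1 against a fixed
# x2 then amounts to comparing the two pairs returned by this function.
```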
E. Satisfactory solution
Consider the two-player zero-sum case. Under the satisfactory criterion of optimality, the row player maximizes her probability of winning a certain amount no matter what strategy the other player uses. Formally, this is captured by the following maxmin problem:

$$\sup_{x_1} \inf_{x_2}\; \mathbb{P}\big( x_1' M(w)\, x_2 \ge \beta \big), \quad \beta \in \mathbb{R}.$$
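For fixed strategy profiles, the inner probability is again a one-line Monte Carlo estimate (a sketch of ours, assuming NumPy); approximating the outer sup-inf would require a search over strategy profiles on top of it:

```python
import numpy as np

def satisfaction_probability(samples, x1, x2, beta):
    """Monte Carlo estimate of P(x1' M(w) x2 >= beta) from draws of M(w)."""
    payoffs = np.einsum('i,kij,j->k', x1, samples, x2)
    return (payoffs >= beta).mean()
```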
V. RANDOM POLY-MATRIX GAMES
We extend the above framework to finitely many interacting players with random payoffs. The model is similar to what is called finite robust games. Let $X = \prod_{j \in \mathcal{N}} X_j$, where $X_j$ is the $(l_j - 1)$-dimensional simplex of $\mathbb{R}^{l_j}$. Denote $d := \sum_{j \in \mathcal{N}} l_j$. Let $R_j(w, x)$ be the payoff function of player $j$. The expected payoff is $r_j(x) = \mathbb{E}_w R_j(w, x)$. We say that $x$ is a state-independent equilibrium of the expected RMG if for every player $j \in \mathcal{N}$,

$$\mathbb{E}_w R_j(w, x_j, x_{-j}) \;\ge\; \mathbb{E}_w R_j(w, y_j, x_{-j}), \quad \forall y_j \in X_j.$$
A. Variational inequality
The existence of state-independent equilibria is equivalent to the existence of a solution of the following variational inequality problem: find $x$ such that

$$\langle x - y, \, V(x) \rangle \;\ge\; 0, \quad \forall y \in \prod_j X_j,$$

where $\langle \cdot, \cdot \rangle$ is the inner product, $V(x) = [V_1(x), \ldots, V_n(x)]$, $V_j(x) = [\mathbb{E}_w R_j(w, e_{s_j}, x_{-j})]_{s_j \in A_j}$, and $e_{s_j}$ is the unit vector with 1 at the position of $s_j$ and 0 otherwise, $s_j \in A_j$.
Note that an equilibrium of the expected RMG may not be an equilibrium at each time slot. This is because $x$ being an equilibrium for the expected RMG does not imply that $x$ is an equilibrium of the game $G(w)$ for every state $w \in W$. We aim to approximate a solution to the variational inequality. Let

$$\epsilon_{\mathrm{RNE}}(x) = \sum_{j \in \mathcal{N}} \left[ \max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) \right]$$

be the sum of improvements that can be obtained by unilateral deviation.

Lemma 4. (i) $\epsilon_{\mathrm{RNE}}(x) \ge 0$ for any $x \in X$. (ii) If $\epsilon_{\mathrm{RNE}}(x) = 0$ for a certain $x \in X$, then $x$ is an equilibrium.

Proof: Suppose $x_j \in X_j$. It is clear that $\max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) \ge 0$, hence $\epsilon_{\mathrm{RNE}}(x) \ge 0$ for any $x \in X$. It is not difficult to see from the definition that if $\epsilon_{\mathrm{RNE}}(x) = 0$ for a certain $x \in X$, then $\max_{y_j \in X_j} r_j(y_j, x_{-j}) - r_j(x) = 0$ for every $j$. This means that $x$ is an equilibrium. Hence a minimizer of $\epsilon_{\mathrm{RNE}}(x)$ achieving the value zero is a solution to the variational inequality.
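For a two-player expected game, the maximum over mixed deviations $y_j$ is attained at a pure action, so $\epsilon_{\mathrm{RNE}}$ reduces to best pure-action improvements. A minimal sketch (ours, assuming NumPy):

```python
import numpy as np

def eps_rne(EM1, EM2, x1, x2):
    """Equilibrium gap eps_RNE(x) for a two-player game with expected payoff
    matrices EM1, EM2: sum of the best unilateral improvements."""
    r1 = EM1 @ x2       # expected payoff of each pure action of player 1
    r2 = x1 @ EM2       # expected payoff of each pure action of player 2
    return (r1.max() - x1 @ r1) + (r2.max() - x2 @ r2)

# eps_rne(...) vanishes exactly at a state-independent equilibrium
# of the expected game.
```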
B. Expected robust potential in RMGs
We say that an RMG has an expected robust potential if the game with expected payoffs (expectation with respect to the random variable $w$) is a potential game in the sense of Monderer and Shapley (1996) [7]. Note that the state-dependent games need not be potential games. The main feature of an expected robust potential is that both pure and mixed equilibrium seeking can be carried out using imitative CODIPAS.
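Whether an expected bimatrix game admits an exact potential in the Monderer-Shapley sense can be checked directly by integrating unilateral payoff differences; the following sketch (ours, assuming NumPy) builds a candidate potential and verifies the defining identities:

```python
import numpy as np

def is_exact_potential(EM1, EM2, tol=1e-9):
    """Check whether the expected bimatrix game (EM1, EM2) admits an exact
    potential phi with phi(i,j)-phi(i',j) = EM1(i,j)-EM1(i',j) and
    phi(i,j)-phi(i,j') = EM2(i,j)-EM2(i,j')."""
    l1, l2 = EM1.shape
    phi = np.zeros((l1, l2))
    # Integrate player 1's differences down the first column ...
    for i in range(1, l1):
        phi[i, 0] = phi[i - 1, 0] + EM1[i, 0] - EM1[i - 1, 0]
    # ... then player 2's differences across the columns.
    for j in range(1, l2):
        phi[:, j] = phi[:, j - 1] + EM2[:, j] - EM2[:, j - 1]
    # The construction enforces player 2's identity; check player 1's.
    return np.allclose(phi - phi[0:1, :], EM1 - EM1[0:1, :], atol=tol)

# A coordination game is an exact potential game; matching pennies is not.
print(is_exact_potential(np.eye(2), np.eye(2)))                       # True
MP = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(is_exact_potential(MP, -MP))                                    # False
```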
VI. NUMERICAL INVESTIGATION
We consider a class of network-selection problems under uncertainty and investigate the behavior of the players using distributed strategic learning. The basic setting is a three-by-two RMG: three players and two actions per player in each state. The main difference with the classical approaches in the literature is the uncertainty of the environment: here the players interact for resource access in a random environment. The action set of the row player is $\{r_1, r_2\}$, the action set of the column player is $\{c_1, c_2\}$, and the third player chooses one of the two tables (arrays) $\{a_1, a_2\}$. Each payoff entry has two parts: a deterministic part and a noisy part given by real-valued zero-mean random variables $n^j_{abc}$. In all the numerical examples below, these random variables are uniformly distributed in $[-1/8, 1/8]$. We fix the parameter $\alpha = 0.995$. The initial strategies are $x_{1,0} = (0.52, 0.48)$, $x_{2,0} = (0.5, 0.5)$, $x_{3,0} = (0.49, 0.51)$, and the estimated payoffs are initialized at $\hat m_{1,0} = (0.1, 0.1)$, $\hat m_{2,0} = (0.1, 0.1)$, $\hat m_{3,0} = (0.1, 0.1)$.
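For reproducibility, here is a compact sketch (ours) of this experiment, applying the imitative CODIPAS updates of Section IV-C to the payoffs of Table II; NumPy is assumed, and the constant learning rates are illustrative choices (the paper does not report its exact rate sequences):

```python
import numpy as np

alpha = 0.995
# Deterministic payoff tensor from Table II: U[player, r, c, a].
U = np.zeros((3, 2, 2, 2))
U[:, 0, 1, 0] = [0, alpha, 0]     # (r1, c2, a1)
U[:, 1, 0, 0] = [alpha, 0, 0]     # (r2, c1, a1)
U[:, 1, 1, 0] = [0, 0, alpha]     # (r2, c2, a1)
U[:, 0, 0, 1] = [0, 0, alpha]     # (r1, c1, a2)
U[:, 0, 1, 1] = [alpha, 0, 0]     # (r1, c2, a2)
U[:, 1, 0, 1] = [0, alpha, 0]     # (r2, c1, a2)

rng = np.random.default_rng(2)
x = [np.array([0.52, 0.48]), np.array([0.5, 0.5]), np.array([0.49, 0.51])]
m_hat = [np.full(2, 0.1) for _ in range(3)]
lam, mu = 0.01, 0.05              # illustrative constant learning rates
for t in range(4000):
    a = [rng.choice(2, p=xj) for xj in x]          # sampled action profile
    for j in range(3):
        m = U[j, a[0], a[1], a[2]] + rng.uniform(-0.125, 0.125)  # noisy payoff
        m_hat[j][a[j]] += mu * (m - m_hat[j][a[j]])              # payoff learning
        w = x[j] * (1.0 + lam) ** m_hat[j]                       # imitative update
        x[j] = w / w.sum()

# The strategy profile should concentrate near a pure global optimum (cf. Fig. 1).
print([np.round(xj, 2) for xj in x])
```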
In Figure 1 we plot the evolution of the probability of choosing the first action for each user. We observe that the CODIPAS
Array $a_1$ (player 1 chooses the row, player 2 the column):
$r_1, c_1$: $(0, 0, 0) + (n^1_{111}, n^2_{111}, n^3_{111})$;  $r_1, c_2$: $(0, \alpha, 0) + (n^1_{121}, n^2_{121}, n^3_{121})$
$r_2, c_1$: $(\alpha, 0, 0) + (n^1_{211}, n^2_{211}, n^3_{211})$;  $r_2, c_2$: $(0, 0, \alpha) + (n^1_{221}, n^2_{221}, n^3_{221})$

Array $a_2$:
$r_1, c_1$: $(0, 0, \alpha) + (n^1_{112}, n^2_{112}, n^3_{112})$;  $r_1, c_2$: $(\alpha, 0, 0) + (n^1_{122}, n^2_{122}, n^3_{122})$
$r_2, c_1$: $(0, \alpha, 0) + (n^1_{212}, n^2_{212}, n^3_{212})$;  $r_2, c_2$: $(0, 0, 0) + (n^1_{222}, n^2_{222}, n^3_{222})$

TABLE II: STRATEGIC FORM REPRESENTATION FOR 3 PLAYERS, 2 ACTIONS
[Figure: probability of playing the first action, $x_{1,t}(r_1)$, $x_{2,t}(c_1)$, $x_{3,t}(a_1)$, versus the number of iterations (0 to 4000).]
Fig. 1. Convergence of the strategies to a global optimum of RMG
converges to a pure global optimum $(0, 1, 1)$ even under measurement noise, which is an interesting robustness property. We plot the lines 0.05 and 0.995 in order to check the convergence time within a 5% error range. As can be seen, the strategies of all three users have converged within a window of length 500. In Figure 2 we plot the evolution of the payoff estimates for the first action of each user. We observe that user 1's payoff estimate converges to 1 while the other two estimates go to zero for the first action. In Figure 3 we plot, in three dimensions, the evolution of the probability of choosing the first action by each user as a function of the others. The CODIPAS converges to the corner $(0, 1, 1)$ of the cube, which is a pure global optimum of the expected RMG.
Other variations: We have simulated other configurations and observed that the other global optima can also be reached by changing the initial condition. The measurement noise can sometimes have a positive effect, in the sense that the convergence time can be faster.
[Figure: average of perceived payoffs, avg. $m_{1,t}$, avg. $m_{2,t}$, avg. $m_{3,t}$, versus the number of iterations (0 to 4000).]
Fig. 2. Evolution of the average payoff estimations of RMG
[Figure: 3D trajectory of $(x_{1,t}(r_1), x_{2,t}(c_1), x_{3,t}(a_1))$, titled "Convergence to a pure global optimum by imitative CODIPAS".]
Fig. 3. 3D plot: Convergence of the strategies to a global optimum of RMG
We can further accelerate the convergence by simply adding a scaling growth parameter to the learning scheme. Heterogeneous scaling also helps to reduce the convergence time of both learning dynamics (payoff dynamics and strategy dynamics).
VII. CONCLUDING REMARKS
In this paper we have presented preliminary results in the emerging area of random matrix games in wireless networks. Both zero-sum and nonzero-sum games were examined. The framework is flexible enough to extend to evolutionary RMGs and to distributed strategic learning, which makes it possible to learn expected payoffs, variances, and optimal strategies in a wide range of random bi-matrix games.
Acknowledgement: This work was supported in part by the CoDECoM Project, through HIRP-YJCB2010003RE.
REFERENCES
[1] M. Khan, H. Tembine, and T. Vasilakos, "Evolutionary coalitional games: design and challenges in wireless networks," IEEE Wireless Communications Magazine, Special Issue on User Cooperation, vol. 19, no. 2, pp. 50-56, 2012.
[2] V. Srivastava, J. Neel, A. B. MacKenzie, R. Menon, L. A. DaSilva, J. E. Hicks, J. H. Reed, and R. P. Gilles, "Using game theory to analyze wireless ad hoc networks," IEEE Communications Surveys and Tutorials, vol. 7, no. 4, pp. 46-56, 2005.
[3] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter, "A survey on networking games in telecommunications," Computers and Operations Research, 2006.
[4] S. H. Low, L. Chen, T. Cui, and J. C. Doyle, "A game-theoretic model for medium access control," IEEE Journal on Selected Areas in Communications, vol. 26, no. 7, 2008.
[5] G. Kasbekar and A. Proutiere, "Opportunistic medium access in multichannel wireless systems: a learning approach," in Proc. Allerton Conference on Communications, Control, and Computing, 2010.
[6] H. Tembine, "Dynamic robust games in MIMO systems," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 41, no. 4, pp. 990-1002, Aug. 2011.
[7] D. Monderer and L. S. Shapley, "Potential games," Games and Economic Behavior, vol. 14, pp. 124-143, 1996.
[8] H. Tembine, Distributed Strategic Learning for Wireless Engineers, CRC Press, Taylor & Francis, 2012.