
Multi-agent learning

Equilibria

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Wednesday 18th February, 2015

Equilibria: motivation

■ If players form strategies through learning, their strategies generally do not converge to Nash equilibria.

■ The concept of Nash equilibrium seems too narrow and too demanding.

For many strategic form games, the set of Nash equilibria indeed consists of a finite set of points.

■ If players form strategies through learning, however, their strategies generally do converge to more general types of equilibria: correlated equilibria and coarse correlated equilibria.

Plan for today


1. Some preparation

■ Rehearse terminology.

■ Redefine Nash equilibrium.

■ Probability distributions over the strategy space.

2. Correlated equilibrium

■ Intuition.

■ Definition.

■ Examples. (Many.)

3. Hierarchy of equilibria: NE ⇒ CE ⇒ CCE.

4. Summary

Recap of terminology

■ Players are denoted by numbers: I = {1, . . . , n}.

■ The set of actions available to player i is denoted by Xi.

Example: X1 = {left, right, up, down}.

■ X = X1 × · · · × Xn is the set of all action profiles. (Typical: x, x′, . . . .)

■ X−i = X1 × · · · × Xi−1 × Xi+1 × · · · × Xn is the set of all counterprofiles. (Typical elements: x−i, . . . .)

■ ui : X → R is the utility function of player i.

■ Si = ∆(Xi) is the set of all strategies available to player i. (Typical elements: si, . . . .)

■ S = S1 × · · · × Sn is the set of all possible strategy profiles.

■ Profile s is sometimes written as s = (si, s−i), where s−i is si’s counterprofile.
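
As a concrete illustration of this notation, here is a minimal Python sketch (not part of the original slides; the action names and probabilities are invented): it encodes two action sets Xi, mixed strategies si ∈ ∆(Xi) as probability dictionaries, and a strategy profile s = (s1, s2).

    # Illustrative encoding of the notation above (names and numbers are invented).
    X1 = ['left', 'right', 'up', 'down']      # X_1: actions of player 1
    X2 = ['L', 'R']                           # X_2: actions of player 2

    s1 = {'left': 0.5, 'right': 0.5, 'up': 0.0, 'down': 0.0}   # s_1 in Delta(X_1)
    s2 = {'L': 0.25, 'R': 0.75}                                # s_2 in Delta(X_2)
    s = (s1, s2)                               # a strategy profile in S = S_1 x S_2

    # Each strategy is a probability distribution over the player's own actions.
    assert abs(sum(s1.values()) - 1.0) < 1e-9
    assert abs(sum(s2.values()) - 1.0) < 1e-9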

■ Define s(x) as the probability that action profile x is played when players follow strategy profile s:

s(x) =Def s1(x1) × · · · × sn(xn).

■ Define ui(s) as player i’s utility when players follow strategy profile s:

ui(s) =Def ∑x s(x) ui(x).

■ Summary: the expected utility of a strategy profile s for player i can be expressed as

ui : S → R : s ↦ ∑x [s1(x1) × · · · × sn(xn)] ui(x) = ∑x s(x) ui(x) = ui(s).

Example

Battle of the Sexes:

            L (0.2)    R (0.8)
U (0.6)     (2, 1)     (0, 0)
D (0.4)     (0, 0)     (1, 2)

Then:

■ All action profiles: X = {(U, L), (U, R), (D, L), (D, R)}.

■ The current strategy profile: s = (s1, s2) ∈ S, with s1 = (0.6, 0.4) on {U, D} and s2 = (0.2, 0.8) on {L, R}.

■ u1(s) = ∑x [s1(x1) s2(x2)] u1(x)

        = s1(U)s2(L)u1(U, L) + s1(U)s2(R)u1(U, R) + s1(D)s2(L)u1(D, L) + s1(D)s2(R)u1(D, R)

        = 0.6 × 0.2 × 2 + 0.6 × 0.8 × 0 + 0.4 × 0.2 × 0 + 0.4 × 0.8 × 1

        = 0.6 × 0.2 × 2 + 0.4 × 0.8 × 1

        = 0.56.
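
The same computation can be written out in a few lines of Python; a minimal sketch (not from the slides; the dictionaries u1, s1, s2 are illustrative encodings of the table above):

    from itertools import product

    # Row player's payoffs u1(x) from the Battle of the Sexes table above.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0,
          ('D', 'L'): 0, ('D', 'R'): 1}

    # Mixed strategies: s1 over {U, D}, s2 over {L, R}.
    s1 = {'U': 0.6, 'D': 0.4}
    s2 = {'L': 0.2, 'R': 0.8}

    def expected_utility(u, s1, s2):
        """Return the sum over all profiles x of s1(x1) * s2(x2) * u(x)."""
        return sum(s1[a] * s2[b] * u[(a, b)] for a, b in product(s1, s2))

    print(expected_utility(u1, s1, s2))   # ~0.56, as computed above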

Nash equilibria defined in terms of pure strategies

Best response

Definition (Best response). Strategy si is a best response to the counterprofile s−i if

for all s′i ∈ Si : ui(s′i, s−i) ≤ ui(si, s−i).

A best response is not necessarily unique. Let B(s−i) be the set of best responses to s−i.

■ If two or more pure actions are best responses, any mix of them also is a best response.

■ When the support (or carrier) of a best response includes two or more actions, the agent must be indifferent among them. (If not, then put all weight on the best action.)

■ Therefore, any mix of these actions must also be a best response.

■ Mix, e.g., (0, 0, 0, 1, 0, 0) ⇒ there is always a pure best response.
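
To make this concrete, a small Python sketch (not from the slides; the payoff dictionary reuses the Battle of the Sexes example) computes the set of pure best responses of the row player against a mixed column strategy. Any mix of the returned actions is again a best response.

    # Pure best responses B(s_{-i}) of the row player in a two-player game.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 1}
    row_actions = ['U', 'D']

    def pure_best_responses(u, s_other, eps=1e-12):
        """Row actions that maximize the expected payoff against s_other."""
        value = {a: sum(s_other[b] * u[(a, b)] for b in s_other) for a in row_actions}
        best = max(value.values())
        return [a for a, v in value.items() if v >= best - eps]

    print(pure_best_responses(u1, {'L': 0.5, 'R': 0.5}))   # ['U']
    print(pure_best_responses(u1, {'L': 1/3, 'R': 2/3}))   # ['U', 'D']: indifferent,
                                                           # so every mix is a best response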

Nash equilibrium

All i maintain some strategy si. The strategy profile s is a Nash equilibrium if no one can profit by changing si unilaterally.

Definition (Nash equilibrium). A strategy profile s is a Nash equilibrium if all strategies in it are best responses:

for all i : si ∈ B(s−i).

A “pure action way” to define a NE: no alternative action x′i ≠ xi can do better than a pure best response xi:

For all players i and alternative actions x′i :

∑x−i s−i(x−i) ui(x′i, x−i) ≤ ∑x−i s−i(x−i) ui(xi, x−i).
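
This condition translates directly into code for a two-player game; a sketch under the assumption that payoffs are stored as dictionaries on action profiles (the numbers reuse the Battle of the Sexes example, and is_nash is an illustrative name):

    # Nash check via pure deviations: no player may gain by a unilateral pure deviation.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 1}
    u2 = {('U', 'L'): 1, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 2}

    def is_nash(s1, s2, eps=1e-9):
        pay1 = sum(s1[a] * s2[b] * u1[(a, b)] for a in s1 for b in s2)
        pay2 = sum(s1[a] * s2[b] * u2[(a, b)] for a in s1 for b in s2)
        best_dev1 = max(sum(s2[b] * u1[(a, b)] for b in s2) for a in s1)
        best_dev2 = max(sum(s1[a] * u2[(a, b)] for a in s1) for b in s2)
        return best_dev1 <= pay1 + eps and best_dev2 <= pay2 + eps

    print(is_nash({'U': 1.0, 'D': 0.0}, {'L': 1.0, 'R': 0.0}))   # True: (U, L) is a pure NE
    print(is_nash({'U': 0.6, 'D': 0.4}, {'L': 0.2, 'R': 0.8}))   # False: the example profile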

Probability distributions over the strategy space

Strategies ⇒ strategy profile ⇒ same strategies

■ Suppose n players, strategies s1, . . . , sn are given:

                       y−i_1    y−i_2   · · ·   y−i_n
          s−i:          q1       q2     · · ·    qn
   si:  xi_1   p1      p1q1     p1q2    · · ·   p1qn
        xi_2   p2      p2q1     p2q2    · · ·   p2qn
         ⋮      ⋮        ⋮        ⋮       ⋱       ⋮
        xi_m   pm      pmq1     pmq2    · · ·   pmqn

where n is the number of different counterprofiles.

■ Players act independently.

■ The strategy si = (p1, . . . , pm) and the counter-strategy profile s−i = (q1, . . . , qn) together define a product distribution s ∈ ∆(X):

s(x1, . . . , xn) =Def s1(x1) × · · · × sn(xn).
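
A tiny Python sketch of this outer product (illustrative numbers, matching the Battle of the Sexes strategies used earlier):

    # Product distribution from a row strategy p = s_i and a counterprofile
    # distribution q = s_{-i}: entry (k, l) of the table equals p_k * q_l.
    p = [0.6, 0.4]   # s_i
    q = [0.2, 0.8]   # s_{-i}

    joint = [[pk * ql for ql in q] for pk in p]
    print(joint)     # ~[[0.12, 0.48], [0.08, 0.32]]; row sums give p, column sums give q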

Distribution on X ⇒ strategies ⇒ strategy profile

Suppose a (possibly non-product) distribution q ∈ ∆(X) is given:

              y−i_1    y−i_2   · · ·   y−i_n
   xi_1        q11      q12    · · ·    q1n
   xi_2        q21      q22    · · ·    q2n
    ⋮           ⋮        ⋮       ⋱       ⋮
   xi_m        qm1      qm2    · · ·    qmn

with row marginals qi(xi_k) = qk1 + · · · + qkn and column marginals q−i(y−i_l) = q1l + · · · + qml.

■ If players follow q, they need not act independently. (Example: off-diagonal is zero.)

■ The marginals form strategies: si = qi, s−i = q−i.

■ But now, in general, q(xi, x−i) ≠ si(xi) s−i(x−i).

Joint distribution vs. joint strategy profile

Example. So this is possible:

            L (0.2)    R (0.8)
U (0.6)      0.12       0.48
D (0.4)      0.08       0.32

We now have a joint distribution q = (0.12, 0.48, 0.08, 0.32).

But this is not:

            L (??)     R (??)
U (??)       0.13       0.47
D (??)       0.07       0.33

No strategy profile generates it: this joint distribution is not the product of its marginals (for instance, the marginals give 0.6 × 0.2 = 0.12 ≠ 0.13).
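
A short Python sketch (not from the slides) that tests exactly this: it computes the marginals of a joint distribution and checks whether the joint factors as their product.

    def is_product(q, tol=1e-9):
        """q: dict mapping (row_action, col_action) -> probability."""
        rows = sorted({r for r, _ in q})
        cols = sorted({c for _, c in q})
        p_row = {r: sum(q[(r, c)] for c in cols) for r in rows}   # row player's marginal
        p_col = {c: sum(q[(r, c)] for r in rows) for c in cols}   # column player's marginal
        return all(abs(q[(r, c)] - p_row[r] * p_col[c]) <= tol for r in rows for c in cols)

    q_ok  = {('U', 'L'): 0.12, ('U', 'R'): 0.48, ('D', 'L'): 0.08, ('D', 'R'): 0.32}
    q_bad = {('U', 'L'): 0.13, ('U', 'R'): 0.47, ('D', 'L'): 0.07, ('D', 'R'): 0.33}
    print(is_product(q_ok), is_product(q_bad))   # True False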

Correlated equilibrium


Correlated equilibrium (Intuition)

Chicken game:

                  Other:
You:          Straight      Swerve
Straight     (−10, −10)     (5, 0)
Swerve         (0, 5)      (−1, −1)

Three Nash equilibria:

■ ((1, 0), (0, 1))

■ ((0, 1), (1, 0))

■ ((3/8, 5/8), (3/8, 5/8))

Expected payoff −5/8 for both in the last equilibrium.

Correlated equilibrium (Idea). Let a probability distribution

q : X → [0, 1]

be given. This q can be seen as a coordinating device. Think of a traffic light:

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

Each time, the system is in one of these four states.

Correlated equilibrium (Definition)

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

■ With probability q(x), the system is in state x, for each of these four states x ∈ X.

■ Players know q.

■ At each realisation of q, every party i comes to know only its own coordinate, xi, of the system state x.

Definition. A distribution q ∈ ∆(X) is called a correlated equilibrium if no party ever has an incentive to deviate from its own coordinate xi, assuming that the others do not deviate from x−i either.

Correlated equilibrium (Formula)

Idea:

Suppose q ∈ ∆(X) is given. Suppose everyone knows q. Let x be a realisation of q. Inform every i about xi, but not about x−i.

Now, in a CE, no one wants to change:

For all i, xi and x′i :

∑x−i q(x−i | xi) ui(x′i, x−i) ≤ ∑x−i q(x−i | xi) ui(xi, x−i).

Multiplying by q(xi) gives, for all i, xi and x′i :

∑x−i q(xi, x−i) ui(x′i, x−i) ≤ ∑x−i q(xi, x−i) ui(xi, x−i).

The latter is often used as the formula to verify a CE.
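
The verification formula translates directly into code; a sketch for two players (not from the slides), assuming q and the payoff functions are stored as dictionaries on action profiles:

    def is_correlated_equilibrium(q, u1, u2, rows, cols, eps=1e-9):
        """For every player, every signalled action xi and every deviation dev, check
           sum_{x_-i} q(xi, x_-i) u_i(dev, x_-i) <= sum_{x_-i} q(xi, x_-i) u_i(xi, x_-i)."""
        for xi in rows:                      # row player
            for dev in rows:
                stay = sum(q[(xi, c)] * u1[(xi, c)] for c in cols)
                move = sum(q[(xi, c)] * u1[(dev, c)] for c in cols)
                if move > stay + eps:
                    return False
        for xi in cols:                      # column player, symmetrically
            for dev in cols:
                stay = sum(q[(r, xi)] * u2[(r, xi)] for r in rows)
                move = sum(q[(r, xi)] * u2[(r, dev)] for r in rows)
                if move > stay + eps:
                    return False
        return True

The section below applies exactly this check to the traffic-light distribution in the Chicken game.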

How to verify a correlated equilibrium

We will show that

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

is a correlated equilibrium of

                  Other:
You:           Green         Red
Green       (−10, −10)      (5, 0)
Red           (0, 5)       (−1, −1)

■ Suppose Player 1 sees Green. Would it be better for him to act as if he sees Red?

Green :  (0.00/0.55) × (−10) + (0.55/0.55) × 5 = 5
Red :    (0.00/0.55) × 0 + (0.55/0.55) × (−1) = −1

■ Suppose Player 1 sees Red. Would it be better for him to act as if he sees Green?

Red :    (0.40/0.45) × 0 + (0.05/0.45) × (−1) ≈ −0.11
Green :  (0.40/0.45) × (−10) + (0.05/0.45) × 5 ≈ −8.33

■ So deviating never pays. Player 1’s expected payoff under q is 0.55 × 5 + 0.40 × 0 + 0.05 × (−1) = 2.70, which exceeds his payoff in two out of the three Nash equilibria.
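
The same conditional payoffs can be recomputed in a few lines of Python; a sketch (Green/Red abbreviated to 'G'/'R'; the payoff dictionary encodes the Chicken game above for the row player):

    u1 = {('G', 'G'): -10, ('G', 'R'): 5, ('R', 'G'): 0, ('R', 'R'): -1}
    q  = {('G', 'G'): 0.00, ('G', 'R'): 0.55, ('R', 'G'): 0.40, ('R', 'R'): 0.05}

    for signal in ('G', 'R'):
        p_signal = q[(signal, 'G')] + q[(signal, 'R')]       # probability of the signal
        for action in ('G', 'R'):
            payoff = sum(q[(signal, other)] / p_signal * u1[(action, other)]
                         for other in ('G', 'R'))
            print(f"signal {signal}, act {action}: {payoff:.2f}")
    # signal G: act G -> 5.00,  act R -> -1.00
    # signal R: act G -> -8.33, act R -> -0.11
    print(sum(q[x] * u1[x] for x in q))                      # ~2.70: expected payoff under q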

The problem to find all correlated equilibria

Problem: find all correlated equilibria for

                  Other:
You:           Green         Red
Green       (−10, −10)      (5, 0)
Red           (0, 5)       (−1, −1)

Solution: set

q =
                Other:
You:          Green     Red
Green           α        β
Red             γ        δ

Of course, first:

■ α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0

■ α + β + γ + δ = 1

But also:

■ u1(act like G | signal G) ≥ u1(act like R | signal G).

■ u1(act like R | signal R) ≥ u1(act like G | signal R).

■ Similarly for u2 (the column player).

u1(act like G | signal G) ≥ u1(act like R | signal G)

α

α + β(−10) +

β

α + β5 ≥

α

α + β0 +

β

α + β− 1

−10α + 5β ≥ 0α +−1β

−5α + 3β ≥ 0.

Further,

u1(act like R | signal R) ≥ u1(act like G | signal R)

γ

γ + δ0 +

δ

γ + δ− 1 ≥

γ

γ + δ(−10) +

δ

γ + δ5

0γ +−1δ ≥ −10γ + 5δ

5γ − 3δ ≥ 0.

Similarly for u2 (the column player).
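The algebra for the row player is easy to machine-check. A minimal sketch, assuming SymPy is available; the payoff values −10, 5, 0 and −1 are those appearing in the inequalities above:

    import sympy as sp

    alpha, beta, gamma, delta = sp.symbols('alpha beta gamma delta', positive=True)

    # Row player, signal G: payoff of acting G minus payoff of acting R,
    # after multiplying through by the positive factor (alpha + beta).
    diff_G = (alpha*(-10) + beta*5) - (alpha*0 + beta*(-1))
    print(sp.expand(diff_G))    # -10*alpha + 6*beta, i.e. 2*(-5*alpha + 3*beta)

    # Row player, signal R: payoff of acting R minus payoff of acting G,
    # after multiplying through by (gamma + delta).
    diff_R = (gamma*0 + delta*(-1)) - (gamma*(-10) + delta*5)
    print(sp.expand(diff_R))    # 10*gamma - 6*delta, i.e. 2*(5*gamma - 3*delta)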

Find all correlated equilibria

We end up with:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

This is a solid convex polyhedron in R3. Eliminating δ = 1 − α − β − γ:

α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
5γ − 3(1 − α − β − γ) ≥ 0
−5α + 3γ ≥ 0
5β − 3(1 − α − β − γ) ≥ 0

which simplifies to

α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
3α + 3β + 8γ ≥ 3
−5α + 3γ ≥ 0
3α + 8β + 3γ ≥ 3.
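For concreteness, the system can be checked mechanically. A minimal sketch in plain Python (the function name is mine, not from the slides):

    def is_correlated_eq(alpha, beta, gamma, delta, eps=1e-9):
        # True iff (alpha, beta, gamma, delta) satisfies the system above.
        return (
            min(alpha, beta, gamma, delta) >= -eps             # probabilities
            and abs(alpha + beta + gamma + delta - 1) <= eps   # sum to one
            and -5*alpha + 3*beta  >= -eps                     # row player, signal G
            and  5*gamma - 3*delta >= -eps                     # row player, signal R
            and -5*alpha + 3*gamma >= -eps                     # column player, signal G
            and  5*beta  - 3*delta >= -eps                     # column player, signal R
        )

    print(is_correlated_eq(0, 3/11, 3/11, 5/11))   # True  (a correlated equilibrium)
    print(is_correlated_eq(1/4, 1/4, 1/4, 1/4))    # False (-5/4 + 3/4 < 0)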

Correlated equilibria

Admissible values for α, β and γ in the traffic light problem:

[3-D plot of the solid convex polyhedron of admissible (α, β, γ) values.]

Find specific correlated equilibria

What is the longest proportion of time both traffic lights can be red simultaneously before drivers start to ignore them?

Maximize: δ

Subject to:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (0, 3/11, 3/11, 5/11).

Answer: at most 5/11 ≈ 45% of the time.
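This is a small linear programme, so the answer is easy to reproduce numerically. A minimal sketch, assuming SciPy's linprog is available (the variable names are mine):

    from scipy.optimize import linprog

    # Variables x = (alpha, beta, gamma, delta); maximizing delta = minimizing -delta.
    c = [0, 0, 0, -1]
    # The four incentive constraints, rewritten as A_ub @ x <= 0.
    A_ub = [
        [ 5, -3,  0,  0],   # -5*alpha + 3*beta  >= 0
        [ 0,  0, -5,  3],   #  5*gamma - 3*delta >= 0
        [ 5,  0, -3,  0],   # -5*alpha + 3*gamma >= 0
        [ 0, -5,  0,  3],   #  5*beta  - 3*delta >= 0
    ]
    b_ub = [0, 0, 0, 0]
    A_eq, b_eq = [[1, 1, 1, 1]], [1]        # alpha + beta + gamma + delta = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * 4, method="highs")
    print(res.x)   # approximately (0, 3/11, 3/11, 5/11)

Replacing the objective vector c (or tightening the bounds on a variable) gives the other optimisation problems on the following slides.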

Find specific correlated equilibria

Is it possible to let the row driver wait all the time without compromising a correlated equilibrium?

Minimize: β

Subject to:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (0, 0, 1, 0).

Answer: yes, but the column driver then has to be given the right of way all of the time. (Note that −5α + 3β ≥ 0 forces α = 0 whenever β = 0, so the row driver indeed never receives a green signal.)
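As a quick check, (α, β, γ, δ) = (0, 0, 1, 0) satisfies every constraint: −5·0 + 3·0 = 0, 5·1 − 3·0 = 5, −5·0 + 3·1 = 3 and 5·0 − 3·0 = 0 are all ≥ 0, and the probabilities sum to 1. All probability mass sits on the profile in which the row driver waits and the column driver passes.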

Find specific correlated equilibria

Is it possible to let the row driver wait all the time while letting the column driver pass no more than 50% of the time, without compromising a correlated equilibrium?

Minimize: β

Subject to:

α ≥ 0, β ≥ 0, 0 ≤ γ ≤ 1/2, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (9/98, 15/98, 1/2, 25/98).

Answer: no. To maintain an equilibrium, the row driver has to be given the right of way at least 15/98 ≈ 15% of the time.
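As a check on this optimum: the binding constraints are −5·(9/98) + 3·(15/98) = 0 and 5·(15/98) − 3·(25/98) = 0, the remaining ones are slack (5·(1/2) − 3·(25/98) = 170/98 and −5·(9/98) + 3·(1/2) = 102/98), and 9/98 + 15/98 + 49/98 + 25/98 = 1.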

Hierarchy of equilibria


Nash equilibrium ⇒ correlated equilibrium

If strategies are independent, we have

s−i(x−i | xi) = s−i(x−i)

Immediately,

for all i and x′i : ∑x−i s−i(x−i) ui(x′i, x−i) ≤ ui(s)   (Nash)

⇒ for all xi, i and x′i : ∑x−i s(x−i | xi) ui(x′i, x−i) ≤ ui(s)   (CE)

The latter is the conditional formulation of a correlated equilibrium. Therefore, every Nash equilibrium is a correlated equilibrium.

Summary

NE: for all i and x′i : ∑x−i s−i(x−i) ui(x′i, x−i) ≤ ui(s)

CE: for all xi, i and x′i : ∑x−i q(xi, x−i) ui(x′i, x−i) ≤ ∑x−i q(xi, x−i) ui(xi, x−i)

CCE: for all i and x′i : ∑x−i q−i(x−i) ui(x′i, x−i) ≤ ui(q)

■ With CE and CCE there are no individual strategies.

■ CCE ⇒ exact conditions for the empirical distribution of action profiles in no-regret matching!

■ The formulas for Nash and CCE are identical, but for Nash the joint strategy s is the product of its marginal strategies. Therefore, NE ⇒ CCE.

■ The LHS of the CCE condition is the xi-sum over all LHS's of the CE condition. Therefore, CE ⇒ CCE.

■ We already derived NE ⇒ CE. Therefore, NE ⇒ CE ⇒ CCE.
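To make the three conditions concrete, here is a minimal sketch, assuming NumPy is available. The payoff matrices encode the traffic-light payoffs used in these slides (−10 for a crash, 5 for passing while the other waits, 0 for waiting while the other passes, −1 when both wait), and q is the correlated equilibrium (0, 3/11, 3/11, 5/11) found earlier:

    import numpy as np

    # Row player's and column player's payoffs; action 0 = go, action 1 = wait.
    u1 = np.array([[-10, 5], [0, -1]])
    u2 = np.array([[-10, 0], [5, -1]])
    # Joint distribution q over action profiles (row action, column action).
    q = np.array([[0.0, 3/11], [3/11, 5/11]])

    def is_ce(u1, u2, q, eps=1e-9):
        # CE: conditional on each signalled action, no deviation pays.
        for x in range(2):
            for xp in range(2):
                if q[x, :] @ u1[xp, :] > q[x, :] @ u1[x, :] + eps:
                    return False
        for y in range(2):
            for yp in range(2):
                if q[:, y] @ u2[:, yp] > q[:, y] @ u2[:, y] + eps:
                    return False
        return True

    def is_cce(u1, u2, q, eps=1e-9):
        # CCE: no fixed (unconditional) deviation beats following the device.
        v1, v2 = np.sum(q * u1), np.sum(q * u2)
        q_col = q.sum(axis=0)   # marginal over the column player's action
        q_row = q.sum(axis=1)   # marginal over the row player's action
        return (all(q_col @ u1[xp, :] <= v1 + eps for xp in range(2))
                and all(q_row @ u2[:, yp] <= v2 + eps for yp in range(2)))

    print(is_ce(u1, u2, q), is_cce(u1, u2, q))   # True True

Because the CCE inequalities are obtained by summing the CE inequalities over the signalled action, any q accepted by is_ce is also accepted by is_cce, mirroring CE ⇒ CCE.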