
Multi-agent learning

Equilibria

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Wednesday 18th February, 2015

Equilibria: motivation

■ If players form strategies through learning, their strategies generally do not converge to Nash equilibria.

■ The concept of Nash equilibrium seems too narrow and too demanding.

For many strategic form games, the set of Nash equilibria indeed consists of a finite set of points.

■ If players form strategies through learning, however, their strategies generally do converge to more general types of equilibria: correlated equilibria and coarse correlated equilibria.

Plan for today


1. Some preparation

■ Rehearse terminology.

■ Redefine Nash equilibrium.

■ Probability distributions over the strategy space.

2. Correlated equilibrium

■ Intuition.

■ Definition.

■ Examples. (Many.)

3. Hierarchy of equilibria: NE ⇒ CE ⇒ CCE.

4. Summary

Recap of terminology

■ Players are denoted by numbers: I = {1, . . . , n}.

■ The set of actions available to player i is denoted by Xi.

Example: X1 = {left, right, up, down}.

■ X = X1 × · · · × Xn is the set of all action profiles. (Typical: x, x′, . . . .)

■ X−i = X1 × · · · × Xi−1 × Xi+1 × · · · × Xn is the set of all counterprofiles. (Typical elements: x−i, . . . .)

■ ui : X → R is the utility function of player i.

■ Si = ∆(Xi) is the set of all strategies available to player i. (Typical elements: si, . . . .)

■ S = S1 × · · · × Sn is the set of all possible strategy profiles.

■ Profile s is sometimes written as s = (si, s−i), where s−i is si’s counterprofile.
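
As a concrete illustration of this notation, here is a minimal Python sketch (not part of the original slides; the action names and probabilities are invented): it encodes two action sets Xi, mixed strategies si ∈ ∆(Xi) as probability dictionaries, and a strategy profile s = (s1, s2).

    # Illustrative encoding of the notation above (names and numbers are invented).
    X1 = ['left', 'right', 'up', 'down']      # X_1: actions of player 1
    X2 = ['L', 'R']                           # X_2: actions of player 2

    s1 = {'left': 0.5, 'right': 0.5, 'up': 0.0, 'down': 0.0}   # s_1 in Delta(X_1)
    s2 = {'L': 0.25, 'R': 0.75}                                # s_2 in Delta(X_2)
    s = (s1, s2)                               # a strategy profile in S = S_1 x S_2

    # Each strategy is a probability distribution over the player's own actions.
    assert abs(sum(s1.values()) - 1.0) < 1e-9
    assert abs(sum(s2.values()) - 1.0) < 1e-9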

■ Define s(x) as the probability that action profile x is played when players follow strategy profile s:

s(x) =Def s1(x1) × · · · × sn(xn).

■ Define ui(s) as player i’s utility when players follow strategy profile s:

ui(s) =Def ∑x s(x) ui(x).

■ Summary: the expected utility of a strategy profile s for player i can be expressed as

ui : S → R : s ↦ ∑x [s1(x1) × · · · × sn(xn)] ui(x) = ∑x s(x) ui(x) = ui(s).

Example

Battle of the Sexes:

            L (0.2)    R (0.8)
U (0.6)     (2, 1)     (0, 0)
D (0.4)     (0, 0)     (1, 2)

Then:

■ All action profiles: X = {(U, L), (U, R), (D, L), (D, R)}.

■ The current strategy profile: s = (s1, s2) ∈ S, with s1 = (0.6, 0.4) on {U, D} and s2 = (0.2, 0.8) on {L, R}.

■ u1(s) = ∑x [s1(x1) s2(x2)] u1(x)

        = s1(U)s2(L)u1(U, L) + s1(U)s2(R)u1(U, R) + s1(D)s2(L)u1(D, L) + s1(D)s2(R)u1(D, R)

        = 0.6 × 0.2 × 2 + 0.6 × 0.8 × 0 + 0.4 × 0.2 × 0 + 0.4 × 0.8 × 1

        = 0.6 × 0.2 × 2 + 0.4 × 0.8 × 1

        = 0.56.
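
The same computation can be written out in a few lines of Python; a minimal sketch (not from the slides; the dictionaries u1, s1, s2 are illustrative encodings of the table above):

    from itertools import product

    # Row player's payoffs u1(x) from the Battle of the Sexes table above.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0,
          ('D', 'L'): 0, ('D', 'R'): 1}

    # Mixed strategies: s1 over {U, D}, s2 over {L, R}.
    s1 = {'U': 0.6, 'D': 0.4}
    s2 = {'L': 0.2, 'R': 0.8}

    def expected_utility(u, s1, s2):
        """Return the sum over all profiles x of s1(x1) * s2(x2) * u(x)."""
        return sum(s1[a] * s2[b] * u[(a, b)] for a, b in product(s1, s2))

    print(expected_utility(u1, s1, s2))   # ~0.56, as computed above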

Nash equilibria defined in terms of pure strategies

Best response

Definition (Best response). Strategy si is a best response to the counterprofile s−i if

for all s′i ∈ Si : ui(s′i, s−i) ≤ ui(si, s−i).

A best response is not necessarily unique. Let B(s−i) be the set of best responses to s−i.

■ If two or more pure actions are best responses, any mix of them also is a best response.

■ When the support (or carrier) of a best response includes two or more actions, the agent must be indifferent among them. (If not, then put all weight on the best action.)

■ Therefore, any mix of these actions must also be a best response.

■ Mix, e.g., (0, 0, 0, 1, 0, 0) ⇒ there is always a pure best response.
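
To make this concrete, a small Python sketch (not from the slides; the payoff dictionary reuses the Battle of the Sexes example) computes the set of pure best responses of the row player against a mixed column strategy. Any mix of the returned actions is again a best response.

    # Pure best responses B(s_{-i}) of the row player in a two-player game.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 1}
    row_actions = ['U', 'D']

    def pure_best_responses(u, s_other, eps=1e-12):
        """Row actions that maximize the expected payoff against s_other."""
        value = {a: sum(s_other[b] * u[(a, b)] for b in s_other) for a in row_actions}
        best = max(value.values())
        return [a for a, v in value.items() if v >= best - eps]

    print(pure_best_responses(u1, {'L': 0.5, 'R': 0.5}))   # ['U']
    print(pure_best_responses(u1, {'L': 1/3, 'R': 2/3}))   # ['U', 'D']: indifferent,
                                                           # so every mix is a best response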

Nash equilibrium

All i maintain some strategy si. The strategy profile s is a Nash equilibrium if no one can profit by changing si unilaterally.

Definition (Nash equilibrium). A strategy profile s is a Nash equilibrium if all strategies in it are best responses:

for all i : si ∈ B(s−i).

A “pure action way” to define a NE: no alternative action x′i ≠ xi can do better than a pure best response xi:

For all players i and alternative actions x′i :

∑x−i s−i(x−i) ui(x′i, x−i) ≤ ∑x−i s−i(x−i) ui(xi, x−i).
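
This condition translates directly into code for a two-player game; a sketch under the assumption that payoffs are stored as dictionaries on action profiles (the numbers reuse the Battle of the Sexes example, and is_nash is an illustrative name):

    # Nash check via pure deviations: no player may gain by a unilateral pure deviation.
    u1 = {('U', 'L'): 2, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 1}
    u2 = {('U', 'L'): 1, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 2}

    def is_nash(s1, s2, eps=1e-9):
        pay1 = sum(s1[a] * s2[b] * u1[(a, b)] for a in s1 for b in s2)
        pay2 = sum(s1[a] * s2[b] * u2[(a, b)] for a in s1 for b in s2)
        best_dev1 = max(sum(s2[b] * u1[(a, b)] for b in s2) for a in s1)
        best_dev2 = max(sum(s1[a] * u2[(a, b)] for a in s1) for b in s2)
        return best_dev1 <= pay1 + eps and best_dev2 <= pay2 + eps

    print(is_nash({'U': 1.0, 'D': 0.0}, {'L': 1.0, 'R': 0.0}))   # True: (U, L) is a pure NE
    print(is_nash({'U': 0.6, 'D': 0.4}, {'L': 0.2, 'R': 0.8}))   # False: the example profile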

Probability distributions over the strategy space

Strategies ⇒ strategy profile ⇒ same strategies

■ Suppose n players, strategies s1, . . . , sn are given:

                       y−i_1    y−i_2   · · ·   y−i_n
          s−i:          q1       q2     · · ·    qn
   si:  xi_1   p1      p1q1     p1q2    · · ·   p1qn
        xi_2   p2      p2q1     p2q2    · · ·   p2qn
         ⋮      ⋮        ⋮        ⋮       ⋱       ⋮
        xi_m   pm      pmq1     pmq2    · · ·   pmqn

where n is the number of different counterprofiles.

■ Players act independently.

■ The strategy si = (p1, . . . , pm) and the counter-strategy profile s−i = (q1, . . . , qn) together define a product distribution s ∈ ∆(X):

s(x1, . . . , xn) =Def s1(x1) × · · · × sn(xn).
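
A tiny Python sketch of this outer product (illustrative numbers, matching the Battle of the Sexes strategies used earlier):

    # Product distribution from a row strategy p = s_i and a counterprofile
    # distribution q = s_{-i}: entry (k, l) of the table equals p_k * q_l.
    p = [0.6, 0.4]   # s_i
    q = [0.2, 0.8]   # s_{-i}

    joint = [[pk * ql for ql in q] for pk in p]
    print(joint)     # ~[[0.12, 0.48], [0.08, 0.32]]; row sums give p, column sums give q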

Distribution on X ⇒ strategies ⇒ strategy profile

Suppose a (possibly non-product) distribution q ∈ ∆(X) is given:

              y−i_1    y−i_2   · · ·   y−i_n
   xi_1        q11      q12    · · ·    q1n
   xi_2        q21      q22    · · ·    q2n
    ⋮           ⋮        ⋮       ⋱       ⋮
   xi_m        qm1      qm2    · · ·    qmn

with row marginals qi(xi_k) = qk1 + · · · + qkn and column marginals q−i(y−i_l) = q1l + · · · + qml.

■ If players follow q, they need not act independently. (Example: off-diagonal is zero.)

■ The marginals form strategies: si = qi, s−i = q−i.

■ But now, in general, q(xi, x−i) ≠ si(xi) s−i(x−i).

Joint distribution vs. joint strategy profile

Example. So this is possible:

            L (0.2)    R (0.8)
U (0.6)      0.12       0.48
D (0.4)      0.08       0.32

We now have a joint distribution q = (0.12, 0.48, 0.08, 0.32).

But this is not:

            L (??)     R (??)
U (??)       0.13       0.47
D (??)       0.07       0.33

No strategy profile generates it: this joint distribution is not the product of its marginals (for instance, the marginals give 0.6 × 0.2 = 0.12 ≠ 0.13).
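
A short Python sketch (not from the slides) that tests exactly this: it computes the marginals of a joint distribution and checks whether the joint factors as their product.

    def is_product(q, tol=1e-9):
        """q: dict mapping (row_action, col_action) -> probability."""
        rows = sorted({r for r, _ in q})
        cols = sorted({c for _, c in q})
        p_row = {r: sum(q[(r, c)] for c in cols) for r in rows}   # row player's marginal
        p_col = {c: sum(q[(r, c)] for r in rows) for c in cols}   # column player's marginal
        return all(abs(q[(r, c)] - p_row[r] * p_col[c]) <= tol for r in rows for c in cols)

    q_ok  = {('U', 'L'): 0.12, ('U', 'R'): 0.48, ('D', 'L'): 0.08, ('D', 'R'): 0.32}
    q_bad = {('U', 'L'): 0.13, ('U', 'R'): 0.47, ('D', 'L'): 0.07, ('D', 'R'): 0.33}
    print(is_product(q_ok), is_product(q_bad))   # True False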

Correlated equilibrium


Correlated equilibrium (Intuition)

Chicken game:

                  Other:
You:          Straight      Swerve
Straight     (−10, −10)     (5, 0)
Swerve         (0, 5)      (−1, −1)

Three Nash equilibria:

■ ((1, 0), (0, 1))

■ ((0, 1), (1, 0))

■ ((3/8, 5/8), (3/8, 5/8))

Expected payoff −5/8 for both in the last equilibrium.

Correlated equilibrium (Idea). Let a probability distribution

q : X → [0, 1]

be given. This q can be seen as a coordinating device. Think of a traffic light:

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

Each time, the system is in one of these four states.

Correlated equilibrium (Definition)

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

■ With probability q(x), the system is in state x, for each of these four states x ∈ X.

■ Players know q.

■ At each realisation of q, every party i comes to know only its own coordinate, xi, of the system state x.

Definition. A distribution q ∈ ∆(X) is called a correlated equilibrium if no party ever has an incentive to deviate from its own coordinate xi, assuming that the others do not deviate from x−i either.

Correlated equilibrium (Formula)

Idea:

Suppose q ∈ ∆(X) is given. Suppose everyone knows q. Let x be a realisation of q. Inform every i about xi, but not about x−i.

Now, in a CE, no one wants to change:

For all i, xi and x′i :

∑x−i q(x−i | xi) ui(x′i, x−i) ≤ ∑x−i q(x−i | xi) ui(xi, x−i).

Multiplying by q(xi) gives, for all i, xi and x′i :

∑x−i q(xi, x−i) ui(x′i, x−i) ≤ ∑x−i q(xi, x−i) ui(xi, x−i).

The latter is often used as the formula to verify a CE.
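
The verification formula translates directly into code; a sketch for two players (not from the slides), assuming q and the payoff functions are stored as dictionaries on action profiles:

    def is_correlated_equilibrium(q, u1, u2, rows, cols, eps=1e-9):
        """For every player, every signalled action xi and every deviation dev, check
           sum_{x_-i} q(xi, x_-i) u_i(dev, x_-i) <= sum_{x_-i} q(xi, x_-i) u_i(xi, x_-i)."""
        for xi in rows:                      # row player
            for dev in rows:
                stay = sum(q[(xi, c)] * u1[(xi, c)] for c in cols)
                move = sum(q[(xi, c)] * u1[(dev, c)] for c in cols)
                if move > stay + eps:
                    return False
        for xi in cols:                      # column player, symmetrically
            for dev in cols:
                stay = sum(q[(r, xi)] * u2[(r, xi)] for r in rows)
                move = sum(q[(r, xi)] * u2[(r, dev)] for r in rows)
                if move > stay + eps:
                    return False
        return True

The section below applies exactly this check to the traffic-light distribution in the Chicken game.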

How to verify a correlated equilibrium

We will show that

q =
                Other:
You:          Green     Red
Green          0.00     0.55
Red            0.40     0.05

is a correlated equilibrium of

                  Other:
You:           Green         Red
Green       (−10, −10)      (5, 0)
Red           (0, 5)       (−1, −1)

■ Suppose Player 1 sees Green. Would it be better for him to act as if he sees Red?

Green :  (0.00/0.55) × (−10) + (0.55/0.55) × 5 = 5
Red :    (0.00/0.55) × 0 + (0.55/0.55) × (−1) = −1

■ Suppose Player 1 sees Red. Would it be better for him to act as if he sees Green?

Red :    (0.40/0.45) × 0 + (0.05/0.45) × (−1) ≈ −0.11
Green :  (0.40/0.45) × (−10) + (0.05/0.45) × 5 ≈ −8.33

■ So deviating never pays. Player 1’s expected payoff under q is 0.55 × 5 + 0.40 × 0 + 0.05 × (−1) = 2.70, which exceeds his payoff in two out of the three Nash equilibria.
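
The same conditional payoffs can be recomputed in a few lines of Python; a sketch (Green/Red abbreviated to 'G'/'R'; the payoff dictionary encodes the Chicken game above for the row player):

    u1 = {('G', 'G'): -10, ('G', 'R'): 5, ('R', 'G'): 0, ('R', 'R'): -1}
    q  = {('G', 'G'): 0.00, ('G', 'R'): 0.55, ('R', 'G'): 0.40, ('R', 'R'): 0.05}

    for signal in ('G', 'R'):
        p_signal = q[(signal, 'G')] + q[(signal, 'R')]       # probability of the signal
        for action in ('G', 'R'):
            payoff = sum(q[(signal, other)] / p_signal * u1[(action, other)]
                         for other in ('G', 'R'))
            print(f"signal {signal}, act {action}: {payoff:.2f}")
    # signal G: act G -> 5.00,  act R -> -1.00
    # signal R: act G -> -8.33, act R -> -0.11
    print(sum(q[x] * u1[x] for x in q))                      # ~2.70: expected payoff under q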

The problem to find all correlated equilibria

Problem: find all correlated equilibria for

                  Other:
You:           Green         Red
Green       (−10, −10)      (5, 0)
Red           (0, 5)       (−1, −1)

Solution: set

q =
                Other:
You:          Green     Red
Green           α        β
Red             γ        δ

Of course, first:

■ α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0

■ α + β + γ + δ = 1

But also:

■ u1(act like G | signal G) ≥ u1(act like R | signal G).

■ u1(act like R | signal R) ≥ u1(act like G | signal R).

■ Similarly for u2 (the column player).

u1(act like G | signal G) ≥ u1(act like R | signal G)

α

α + β(−10) +

β

α + β5 ≥

α

α + β0 +

β

α + β− 1

−10α + 5β ≥ 0α +−1β

−5α + 3β ≥ 0.

Further,

u1(act like R | signal R) ≥ u1(act like G | signal R)

γ

γ + δ0 +

δ

γ + δ− 1 ≥

γ

γ + δ(−10) +

δ

γ + δ5

0γ +−1δ ≥ −10γ + 5δ

5γ − 3δ ≥ 0.

Similarly for u2 (the column player).
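The algebra for the row player is easy to machine-check. A minimal sketch, assuming SymPy is available; the payoff values −10, 5, 0 and −1 are those appearing in the inequalities above:

    import sympy as sp

    alpha, beta, gamma, delta = sp.symbols('alpha beta gamma delta', positive=True)

    # Row player, signal G: payoff of acting G minus payoff of acting R,
    # after multiplying through by the positive factor (alpha + beta).
    diff_G = (alpha*(-10) + beta*5) - (alpha*0 + beta*(-1))
    print(sp.expand(diff_G))    # -10*alpha + 6*beta, i.e. 2*(-5*alpha + 3*beta)

    # Row player, signal R: payoff of acting R minus payoff of acting G,
    # after multiplying through by (gamma + delta).
    diff_R = (gamma*0 + delta*(-1)) - (gamma*(-10) + delta*5)
    print(sp.expand(diff_R))    # 10*gamma - 6*delta, i.e. 2*(5*gamma - 3*delta)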

Find all correlated equilibria

We end up with:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

This is a solid convex polyhedron in R3. Eliminating δ = 1 − α − β − γ:

α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
5γ − 3(1 − α − β − γ) ≥ 0
−5α + 3γ ≥ 0
5β − 3(1 − α − β − γ) ≥ 0

which simplifies to

α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
3α + 3β + 8γ ≥ 3
−5α + 3γ ≥ 0
3α + 8β + 3γ ≥ 3.
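For concreteness, the system can be checked mechanically. A minimal sketch in plain Python (the function name is mine, not from the slides):

    def is_correlated_eq(alpha, beta, gamma, delta, eps=1e-9):
        # True iff (alpha, beta, gamma, delta) satisfies the system above.
        return (
            min(alpha, beta, gamma, delta) >= -eps             # probabilities
            and abs(alpha + beta + gamma + delta - 1) <= eps   # sum to one
            and -5*alpha + 3*beta  >= -eps                     # row player, signal G
            and  5*gamma - 3*delta >= -eps                     # row player, signal R
            and -5*alpha + 3*gamma >= -eps                     # column player, signal G
            and  5*beta  - 3*delta >= -eps                     # column player, signal R
        )

    print(is_correlated_eq(0, 3/11, 3/11, 5/11))   # True  (a correlated equilibrium)
    print(is_correlated_eq(1/4, 1/4, 1/4, 1/4))    # False (-5/4 + 3/4 < 0)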

Correlated equilibria

Admissible values for α, β and γ in the traffic light problem:

[3-D plot of the solid convex polyhedron of admissible (α, β, γ) values.]

Find specific correlated equilibria

What is the longest proportion of time both traffic lights can be red simultaneously before drivers start to ignore them?

Maximize: δ

Subject to:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (0, 3/11, 3/11, 5/11).

Answer: at most 5/11 ≈ 45% of the time.
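This is a small linear programme, so the answer is easy to reproduce numerically. A minimal sketch, assuming SciPy's linprog is available (the variable names are mine):

    from scipy.optimize import linprog

    # Variables x = (alpha, beta, gamma, delta); maximizing delta = minimizing -delta.
    c = [0, 0, 0, -1]
    # The four incentive constraints, rewritten as A_ub @ x <= 0.
    A_ub = [
        [ 5, -3,  0,  0],   # -5*alpha + 3*beta  >= 0
        [ 0,  0, -5,  3],   #  5*gamma - 3*delta >= 0
        [ 5,  0, -3,  0],   # -5*alpha + 3*gamma >= 0
        [ 0, -5,  0,  3],   #  5*beta  - 3*delta >= 0
    ]
    b_ub = [0, 0, 0, 0]
    A_eq, b_eq = [[1, 1, 1, 1]], [1]        # alpha + beta + gamma + delta = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * 4, method="highs")
    print(res.x)   # approximately (0, 3/11, 3/11, 5/11)

Replacing the objective vector c (or tightening the bounds on a variable) gives the other optimisation problems on the following slides.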

Find specific correlated equilibria

Is it possible to let the row driver wait all the time without compromising a correlated equilibrium?

Minimize: β

Subject to:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (0, 0, 1, 0).

Answer: yes, but the column driver then has to be given the right of way all of the time. (Note that −5α + 3β ≥ 0 forces α = 0 whenever β = 0, so the row driver indeed never receives a green signal.)
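As a quick check, (α, β, γ, δ) = (0, 0, 1, 0) satisfies every constraint: −5·0 + 3·0 = 0, 5·1 − 3·0 = 5, −5·0 + 3·1 = 3 and 5·0 − 3·0 = 0 are all ≥ 0, and the probabilities sum to 1. All probability mass sits on the profile in which the row driver waits and the column driver passes.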

Find specific correlated equilibria

Is it possible to let the row driver wait all the time while letting the column driver pass no more than 50% of the time, without compromising a correlated equilibrium?

Minimize: β

Subject to:

α ≥ 0, β ≥ 0, 0 ≤ γ ≤ 1/2, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0
5γ − 3δ ≥ 0
−5α + 3γ ≥ 0
5β − 3δ ≥ 0

Gives:

(α, β, γ, δ) = (9/98, 15/98, 1/2, 25/98).

Answer: no. To maintain an equilibrium, the row driver has to be given the right of way at least 15/98 ≈ 15% of the time.
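As a check on this optimum: the binding constraints are −5·(9/98) + 3·(15/98) = 0 and 5·(15/98) − 3·(25/98) = 0, the remaining ones are slack (5·(1/2) − 3·(25/98) = 170/98 and −5·(9/98) + 3·(1/2) = 102/98), and 9/98 + 15/98 + 49/98 + 25/98 = 1.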

Hierarchy of equilibria


Nash equilibrium ⇒ correlated equilibrium

If strategies are independent, we have

s−i(x−i | xi) = s−i(x−i)

Immediately,

for all i and x′i : ∑x−i s−i(x−i) ui(x′i, x−i) ≤ ui(s)   (Nash)

⇒ for all xi, i and x′i : ∑x−i s(x−i | xi) ui(x′i, x−i) ≤ ui(s)   (CE)

The latter is the conditional formulation of a correlated equilibrium. Therefore, every Nash equilibrium is a correlated equilibrium.

Summary

NE: for all i and x′i : ∑x−i s−i(x−i) ui(x′i, x−i) ≤ ui(s)

CE: for all xi, i and x′i : ∑x−i q(xi, x−i) ui(x′i, x−i) ≤ ∑x−i q(xi, x−i) ui(xi, x−i)

CCE: for all i and x′i : ∑x−i q−i(x−i) ui(x′i, x−i) ≤ ui(q)

■ With CE and CCE there are no individual strategies.

■ CCE ⇒ exact conditions for the empirical distribution of action profiles in no-regret matching!

■ The formulas for Nash and CCE are identical, but for Nash the joint strategy s is the product of its marginal strategies. Therefore, NE ⇒ CCE.

■ The LHS of the CCE condition is the xi-sum over all LHS's of the CE condition. Therefore, CE ⇒ CCE.

■ We already derived NE ⇒ CE. Therefore, NE ⇒ CE ⇒ CCE.
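To make the three conditions concrete, here is a minimal sketch, assuming NumPy is available. The payoff matrices encode the traffic-light payoffs used in these slides (−10 for a crash, 5 for passing while the other waits, 0 for waiting while the other passes, −1 when both wait), and q is the correlated equilibrium (0, 3/11, 3/11, 5/11) found earlier:

    import numpy as np

    # Row player's and column player's payoffs; action 0 = go, action 1 = wait.
    u1 = np.array([[-10, 5], [0, -1]])
    u2 = np.array([[-10, 0], [5, -1]])
    # Joint distribution q over action profiles (row action, column action).
    q = np.array([[0.0, 3/11], [3/11, 5/11]])

    def is_ce(u1, u2, q, eps=1e-9):
        # CE: conditional on each signalled action, no deviation pays.
        for x in range(2):
            for xp in range(2):
                if q[x, :] @ u1[xp, :] > q[x, :] @ u1[x, :] + eps:
                    return False
        for y in range(2):
            for yp in range(2):
                if q[:, y] @ u2[:, yp] > q[:, y] @ u2[:, y] + eps:
                    return False
        return True

    def is_cce(u1, u2, q, eps=1e-9):
        # CCE: no fixed (unconditional) deviation beats following the device.
        v1, v2 = np.sum(q * u1), np.sum(q * u2)
        q_col = q.sum(axis=0)   # marginal over the column player's action
        q_row = q.sum(axis=1)   # marginal over the row player's action
        return (all(q_col @ u1[xp, :] <= v1 + eps for xp in range(2))
                and all(q_row @ u2[:, yp] <= v2 + eps for yp in range(2)))

    print(is_ce(u1, u2, q), is_cce(u1, u2, q))   # True True

Because the CCE inequalities are obtained by summing the CE inequalities over the signalled action, any q accepted by is_ce is also accepted by is_cce, mirroring CE ⇒ CCE.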