Multi-agent learning
Equilibria
Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
Wednesday 18th February, 2015
Equilibria: motivation
■ If players form strategies through learning, their strategies generally do not converge to Nash equilibria.
■ The concept of Nash equilibrium seems too narrow and too demanding. For many strategic form games, the set of Nash equilibria indeed consists of a finite set of points.
■ If players form strategies through learning, their strategies generally do, however, converge to more general types of equilibria: correlated equilibria and coarse correlated equilibria.
Plan for today
1. Some preparation
■ Rehearse terminology.
■ Redefine Nash equilibrium.
■ Probability distributions over the strategy space.
2. Correlated equilibrium
■ Intuition.
■ Definition.
■ Examples. (Many.)
3. Hierarchy of equilibria: NE ⇒ CE ⇒ CCE.
4. Summary
Recap of terminology
Terminology
■ Players are denoted by numbers: I = {1, . . . , n}.
■ The set of actions available to player i is denoted by Xi.
Example: X1 = {left, right, up, down}.
■ X = X1 × · · · × Xn is the set of all action profiles. (Typical elements: x, x′, . . . .)
■ X−i = X1 × · · · × Xi−1 × Xi+1 × · · · × Xn is the set of all counter-profiles. (Typical elements: x−i, . . . .)
■ ui : X → R is the utility function of player i.
■ Si = ∆(Xi) is the set of all strategies available to player i. (Typical elements: si, . . . .)
■ S = S1 × · · · × Sn is the set of all possible strategy profiles.
■ Profile s is sometimes written as s = (si, s−i), where s−i is si’s counter-profile.
Terminology
■ Define s(x) as the probability that action profile x is played when players follow strategy profile s:

    s(x) =Def s1(x1) × · · · × sn(xn).

■ Define ui(s) as player i’s utility when players follow strategy profile s:

    ui(s) =Def ∑x s(x) ui(x).

■ Summary: the expected utility ui of a strategy profile s for player i can be expressed as

    ui : S → R : s ↦ ∑x [ s1(x1) × · · · × sn(xn) ] ui(x) = ∑x s(x) ui(x) = ui(s).
Example
Battle of the sexes:

                L (0.2)    R (0.8)
    U (0.6)     (2, 1)     (0, 0)
    D (0.4)     (0, 0)     (1, 2)

Then:

■ All action profiles: X = {(U, L), (U, R), (D, L), (D, R)}.
■ The current strategy profile: s = ((0.6, 0.4), (0.2, 0.8)) ∈ S, where s1 is the row player’s strategy.
■ u1(s) = ∑x [ s1(x1) s2(x2) ] u1(x)
        = s1(U)s2(L)u1(U, L) + s1(U)s2(R)u1(U, R) + s1(D)s2(L)u1(D, L) + s1(D)s2(R)u1(D, R)
        = 0.6 × 0.2 × 2 + 0.6 × 0.8 × 0 + 0.4 × 0.2 × 0 + 0.4 × 0.8 × 1
        = 0.6 × 0.2 × 2 + 0.4 × 0.8 × 1
        = 0.56.
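
As a sanity check, the computation can be replayed in a few lines of Python (a minimal sketch using NumPy, with the payoff matrix and strategies from the table above):

    import numpy as np

    # Player 1's payoffs (rows: U, D; columns: L, R).
    u1 = np.array([[2.0, 0.0],
                   [0.0, 1.0]])

    s1 = np.array([0.6, 0.4])  # row player's strategy over {U, D}
    s2 = np.array([0.2, 0.8])  # column player's strategy over {L, R}

    # u1(s) = sum over all action profiles x of s1(x1) * s2(x2) * u1(x),
    # which for two players is just the bilinear form s1' U s2.
    print(s1 @ u1 @ s2)        # 0.56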
Nash equilibria defined in terms of pure strategies
Best response
Definition (Best response). Strategy si is a best response to the counter-profile s−i if

    for all s′i ∈ Si : ui(s′i, s−i) ≤ ui(si, s−i).

A best response is not necessarily unique. Let B(s−i) be the set of best responses to s−i.

■ If two or more pure actions are best responses, any mix of them is also a best response.
■ When the support (or carrier) of a best response includes two or more actions, the agent must be indifferent among them. (If not, then put all weight on the best action.)
■ Therefore, any mix of these actions must also be a best response.
■ A pure action is a degenerate mix, e.g., (0, 0, 0, 1, 0, 0), so there is always a pure best response.
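
A minimal sketch of computing B(s−i) for one player of a 2 × 2 game (Player 1's payoffs are taken from the example above; the opponent strategy (1/3, 2/3) is chosen here so that Player 1 is exactly indifferent):

    import numpy as np

    u1 = np.array([[2.0, 0.0],   # Player 1's payoffs (rows: own actions)
                   [0.0, 1.0]])

    s2 = np.array([1/3, 2/3])    # counter-strategy s_{-1}

    payoffs = u1 @ s2            # expected payoff of each pure action: [2/3, 2/3]

    # B(s_{-1}): the pure actions attaining the maximum.
    best = np.flatnonzero(np.isclose(payoffs, payoffs.max()))
    print(best)                  # [0 1]: both actions, so any mix of them is a best response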
Nash equilibrium
All players i maintain some strategy si. The strategy profile s is a Nash equilibrium if no one can profit by changing si unilaterally.

Definition (Nash equilibrium). A strategy profile s is a Nash equilibrium if all strategies in it are best responses:

    for all i : si ∈ B(s−i).

A “pure action way” to define a NE: no alternative action x′i ≠ xi can do better than a pure best response xi:

For all players i and alternative actions x′i :

    ∑x−i s−i(x−i) ui(x′i, x−i) ≤ ∑x−i s−i(x−i) ui(xi, x−i).
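
This pure-action test translates directly into code. A sketch for two players, assuming the Battle-of-the-Sexes payoffs from the earlier example:

    import numpy as np

    u1 = np.array([[2.0, 0.0], [0.0, 1.0]])  # row player's payoffs
    u2 = np.array([[1.0, 0.0], [0.0, 2.0]])  # column player's payoffs

    def is_nash(s1, s2, tol=1e-12):
        # No pure deviation may beat a player's current expected payoff.
        p1 = u1 @ s2    # Player 1's expected payoff per pure action against s2
        p2 = s1 @ u2    # Player 2's expected payoff per pure action against s1
        return s1 @ p1 >= p1.max() - tol and p2 @ s2 >= p2.max() - tol

    print(is_nash(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # True:  (U, L)
    print(is_nash(np.array([0.6, 0.4]), np.array([0.2, 0.8])))  # False: the example profile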
Probability distributions over the strategy space
Strategies ⇒ strategy profile ⇒ same strategies
■ Suppose n players with strategies s1, . . . , sn are given:

                s−i     y−i_1   y−i_2   . . .   y−i_n
        si              q1      q2      . . .   qn
        xi_1    p1      p1q1    p1q2    . . .   p1qn
        xi_2    p2      p2q1    p2q2    . . .   p2qn
        ...
        xi_m    pm      pmq1    pmq2    . . .   pmqn

where n here denotes the number of different counter-profiles.

■ Players act independently.
■ The strategy si = (p1, . . . , pm) and the counter-strategy profile s−i = (q1, . . . , qn) together define a product distribution s ∈ ∆(X):

    s(x1, . . . , xn) =Def s1(x1) × · · · × sn(xn).
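
A sketch of this construction for the two-player case, with the strategies from the earlier example:

    import numpy as np

    si  = np.array([0.6, 0.4])   # s_i = (p1, ..., pm)
    smi = np.array([0.2, 0.8])   # s_{-i} = (q1, ..., qn)

    # Product distribution on X: each cell is p_j * q_k.
    s = np.outer(si, smi)
    print(s)                     # [[0.12 0.48] [0.08 0.32]]

    # The marginals recover the original strategies.
    print(s.sum(axis=1))         # [0.6 0.4] = s_i
    print(s.sum(axis=0))         # [0.2 0.8] = s_{-i}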
Distribution on X ⇒ strategies ⇒ strategy profile
Suppose a (possibly non-product) distribution q ∈ ∆(X) is given.

                q−i                  y−i_1           y−i_2           . . .   y−i_n
        qi                           q11 +···+ qm1   q12 +···+ qm2   . . .   q1n +···+ qmn
        xi_1    q11 +···+ q1n        q11             q12             . . .   q1n
        xi_2    q21 +···+ q2n        q21             q22             . . .   q2n
        ...
        xi_m    qm1 +···+ qmn        qm1             qm2             . . .   qmn

■ If players follow q, they need not act independently. (Example: all off-diagonal entries are zero.)
■ The marginals form strategies: si = qi, s−i = q−i.
■ But now, in general, q(xi, x−i) ≠ si(xi) s−i(x−i).
Joint distribution vs. joint strategy profile
Example. So this is possible:

                L (0.2)   R (0.8)
    U (0.6)     0.12      0.48
    D (0.4)     0.08      0.32

We now have a joint distribution q = (0.12, 0.48, 0.08, 0.32).

But this is not:

                L (??)    R (??)
    U (??)      0.13      0.47
    D (??)      0.07      0.33

There are no marginal distributions whose product gives these joint probabilities: the marginals are again (0.6, 0.4) and (0.2, 0.8), but, e.g., 0.6 × 0.2 = 0.12 ≠ 0.13.
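
Equivalently, a distribution is a product distribution exactly when it equals the outer product of its own marginals, which gives a one-line test (a sketch applied to the two tables above):

    import numpy as np

    def is_product(q, tol=1e-9):
        # Compare q with the outer product of its row and column marginals.
        return np.allclose(q, np.outer(q.sum(axis=1), q.sum(axis=0)), atol=tol)

    q1 = np.array([[0.12, 0.48], [0.08, 0.32]])
    q2 = np.array([[0.13, 0.47], [0.07, 0.33]])

    print(is_product(q1))  # True:  (0.6, 0.4) x (0.2, 0.8)
    print(is_product(q2))  # False: same marginals, but 0.6 * 0.2 = 0.12 != 0.13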
Correlated equilibrium
Correlated equilibrium (Intuition)
Chicken game:

                    Other:
    You:            Straight        Swerve
    Straight        (−10, −10)      (5, 0)
    Swerve          (0, 5)          (−1, −1)

Three Nash equilibria:

■ ((1, 0), (0, 1))
■ ((0, 1), (1, 0))
■ ((3/8, 5/8), (3/8, 5/8))

Expected payoff −5/8 for both in the last equilibrium.

Correlated equilibrium (Idea). Let a probability distribution

    q : X → [0, 1]

be given. This q can be seen as a coordinating device. Think of a traffic light:

    q =
                    Other:
    You:            Green    Red
    Green           0.00     0.55
    Red             0.40     0.05

Each time, the system is in one of these four states.
Correlated equilibrium (Definition)
    q =
                    Other:
    You:            Green    Red
    Green           0.00     0.55
    Red             0.40     0.05

■ With probability q(x), the system is in one of these four states x ∈ X.
■ Players know q.
■ At each realisation of q, every party i comes to know only its coordinate, xi, of the system state x.

Definition. A distribution q ∈ ∆(X) is called a correlated equilibrium if no party ever has an incentive to deviate from its own coordinate xi, assuming the others do not deviate from x−i either.
Correlated equilibrium (Formula)
Correlated equilibrium
Idea:

Suppose q ∈ ∆(X) is given. Suppose everyone knows q. Let x be a realisation of q. Inform every i about xi, but not about x−i.

Now, in a CE, no one wants to change:

For all i, xi and x′i :

    ∑x−i q(x−i | xi) ui(x′i, x−i) ≤ ∑x−i q(x−i | xi) ui(xi, x−i).

Multiplying by q(xi) gives, for all i, xi and x′i :

    ∑x−i q(xi, x−i) ui(x′i, x−i) ≤ ∑x−i q(xi, x−i) ui(xi, x−i).

The latter is often used as the formula to verify a CE.
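
The multiplied form lends itself to a mechanical check. A sketch for two players, with q stored as a matrix over the row and column players' actions (the traffic-light example of the next slides is used as input):

    import numpy as np

    def is_correlated_eq(q, u1, u2, tol=1e-9):
        # For each player, each signal x_i and each deviation x_i', require
        #   sum_{x_-i} q(x_i, x_-i) * [u_i(x_i, x_-i) - u_i(x_i', x_-i)] >= 0.
        for xi in range(q.shape[0]):            # row player's signals
            for alt in range(q.shape[0]):       # candidate deviations
                if (q[xi] * (u1[xi] - u1[alt])).sum() < -tol:
                    return False
        for xi in range(q.shape[1]):            # column player's signals
            for alt in range(q.shape[1]):
                if (q[:, xi] * (u2[:, xi] - u2[:, alt])).sum() < -tol:
                    return False
        return True

    # The traffic-light device for the chicken game:
    u1 = np.array([[-10.0, 5.0], [0.0, -1.0]])
    u2 = np.array([[-10.0, 0.0], [5.0, -1.0]])
    q  = np.array([[0.00, 0.55], [0.40, 0.05]])
    print(is_correlated_eq(q, u1, u2))          # True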
How to verify a correlated equilibrium
To verify a correlated equilibrium
We will show that

    q =
                    Other:
    You:            Green    Red
    Green           0.00     0.55
    Red             0.40     0.05

is a correlated equilibrium of

                    Other:
    You:            Green         Red
    Green           (−10, −10)    (5, 0)
    Red             (0, 5)        (−1, −1)

■ Suppose Player 1 sees Green. Would it be better for him to act as if he sees Red?

    Green: (0/0.55) × (−10) + (0.55/0.55) × 5 = 5
    Red:   (0/0.55) × 0 + (0.55/0.55) × (−1) = −1

■ Suppose Player 1 sees Red. Would it be better for him to act as if he sees Green?

    Red:   (0.40/0.45) × 0 + (0.05/0.45) × (−1) ≈ −0.11
    Green: (0.40/0.45) × (−10) + (0.05/0.45) × 5 ≈ −8.33

■ A player's payoff is then roughly (5 + (−0.11))/2 ≈ 2.45, which exceeds the payoffs from two out of the three NE.
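
The same conditional expected payoffs can be reproduced numerically (a sketch; the helper name payoff_given_signal is illustrative):

    # Joint distribution q and Player 1's payoffs, indexed (you, other).
    q  = {("G", "G"): 0.00, ("G", "R"): 0.55,
          ("R", "G"): 0.40, ("R", "R"): 0.05}
    u1 = {("G", "G"): -10, ("G", "R"): 5,
          ("R", "G"): 0,   ("R", "R"): -1}

    def payoff_given_signal(signal, act_as):
        # Expected payoff of playing act_as when the opponent's action
        # is distributed as q(. | signal).
        total = q[(signal, "G")] + q[(signal, "R")]
        return sum(q[(signal, other)] / total * u1[(act_as, other)]
                   for other in ("G", "R"))

    print(payoff_given_signal("G", "G"), payoff_given_signal("G", "R"))  # 5.0, -1.0
    print(payoff_given_signal("R", "R"), payoff_given_signal("R", "G"))  # approx. -0.11, -8.33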
The problem to find all correlated equilibria
Find all correlated equilibria
Problem: find all correlated equilibria for

                    Other:
    You:            Green         Red
    Green           (−10, −10)    (5, 0)
    Red             (0, 5)        (−1, −1)

Solution: set

    q =
                    Other:
    You:            Green    Red
    Green           α        β
    Red             γ        δ

Of course, first:

■ α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
■ α + β + γ + δ = 1

But also:

■ u1(act like G | signal G) ≥ u1(act like R | signal G).
■ u1(act like R | signal R) ≥ u1(act like G | signal R).
■ Similarly for u2 (the column player).
Find all correlated equilibria
u1(act like G | signal G) ≥ u1(act like R | signal G)
α
α + β(−10) +
β
α + β5 ≥
α
α + β0 +
β
α + β− 1
−10α + 5β ≥ 0α +−1β
−5α + 3β ≥ 0.
Further,
u1(act like R | signal R) ≥ u1(act like G | signal R)
γ
γ + δ0 +
δ
γ + δ− 1 ≥
γ
γ + δ(−10) +
δ
γ + δ5
0γ +−1δ ≥ −10γ + 5δ
5γ − 3δ ≥ 0.
Similarly for u2 (the column player).
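These incentive constraints can be sanity-checked numerically. A minimal sketch: the payoff entries u1(G, G) = −10, u1(G, R) = 5, u1(R, G) = 0, u1(R, R) = −1 are the ones implicit in the derivation above; the function name and the candidate distribution are illustrative only.

# Row player's payoffs, read off from the derivation above.
u1 = {('G', 'G'): -10, ('G', 'R'): 5, ('R', 'G'): 0, ('R', 'R'): -1}

def row_constraint(signal, deviation, q):
    """LHS minus RHS of u1(act like signal | signal) >= u1(act like deviation | signal),
    scaled by the positive normaliser q(signal, G) + q(signal, R); the constraint
    holds iff the result is >= 0."""
    follow  = sum(q[(signal, c)] * u1[(signal, c)]    for c in 'GR')
    deviate = sum(q[(signal, c)] * u1[(deviation, c)] for c in 'GR')
    return follow - deviate

# Check a candidate distribution, e.g. the delta-maximising one found below.
q = {('G', 'G'): 0, ('G', 'R'): 3/11, ('R', 'G'): 3/11, ('R', 'R'): 5/11}
print(row_constraint('G', 'R', q))   # -10*alpha + 6*beta, i.e. 2*(-5a + 3b) >= 0
print(row_constraint('R', 'G', q))   #  10*gamma - 6*delta, i.e. 2*(5g - 3d) >= 0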
Find all correlated equilibria
We end up with:

α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0        5γ − 3δ ≥ 0
−5α + 3γ ≥ 0        5β − 3δ ≥ 0

This is a solid convex polyhedron in R³. Eliminating δ = 1 − α − β − γ:

⇔
α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
−5α + 3γ ≥ 0
5γ − 3(1 − α − β − γ) ≥ 0
5β − 3(1 − α − β − γ) ≥ 0

⇔
α ≥ 0, β ≥ 0, γ ≥ 0
α + β + γ ≤ 1
−5α + 3β ≥ 0
−5α + 3γ ≥ 0
3α + 3β + 8γ ≥ 3
3α + 8β + 3γ ≥ 3.
Correlated equilibria
Admissible values for α, β and γ in the traffic light problem:

[Figure: the solid convex polyhedron of admissible (α, β, γ); in the original plot one axis runs from 0.0 to 0.2 and the other two from 0.0 to 1.0.]
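The region can be reproduced approximately by scattering the grid points that satisfy the reduced system. A rough sketch, assuming NumPy and matplotlib are available (not the original plotting code):

import numpy as np
import matplotlib.pyplot as plt

# Collect grid points of [0,1]^3 that satisfy the reduced system above.
pts = np.array([(a, b, g)
                for a in np.linspace(0, 1, 60)
                for b in np.linspace(0, 1, 60)
                for g in np.linspace(0, 1, 60)
                if a + b + g <= 1
                and -5*a + 3*b >= 0 and -5*a + 3*g >= 0
                and 3*a + 3*b + 8*g >= 3 and 3*a + 8*b + 3*g >= 3])

ax = plt.figure().add_subplot(projection='3d')
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=1)
ax.set_xlabel('alpha'); ax.set_ylabel('beta'); ax.set_zlabel('gamma')
plt.show()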
Find specific correlated equilibria
What is the largest proportion of time that both traffic lights can be red simultaneously before drivers start to ignore them?

Maximize: δ

Subject to:
α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0        5γ − 3δ ≥ 0
−5α + 3γ ≥ 0        5β − 3δ ≥ 0

Gives:
(α, β, γ, δ) = (0, 3/11, 3/11, 5/11).

Answer: at most 5/11 ≈ 45% of the time.
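Such questions are ordinary linear programs, so any LP solver will do. A minimal sketch, assuming SciPy is available (the variable order (α, β, γ, δ) and all names are ours); linprog minimises, so we maximise δ by minimising −δ:

from scipy.optimize import linprog

c = [0, 0, 0, -1]            # minimise -delta, i.e. maximise delta
A_ub = [[ 5, -3,  0,  0],    # -5a + 3b >= 0  rewritten as  5a - 3b <= 0
        [ 0,  0, -5,  3],    #  5g - 3d >= 0  rewritten as -5g + 3d <= 0
        [ 5,  0, -3,  0],    # -5a + 3g >= 0  rewritten as  5a - 3g <= 0
        [ 0, -5,  0,  3]]    #  5b - 3d >= 0  rewritten as -5b + 3d <= 0
b_ub = [0, 0, 0, 0]
A_eq = [[1, 1, 1, 1]]        # a + b + g + d = 1
b_eq = [1]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x)                 # approx. (0, 3/11, 3/11, 5/11)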
Find specific correlated equilibria
Is it possible to let the row driver wait all the time without compromising a correlated equilibrium?

Minimize: β

Subject to:
α ≥ 0, β ≥ 0, γ ≥ 0, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0        5γ − 3δ ≥ 0
−5α + 3γ ≥ 0        5β − 3δ ≥ 0

(Note that β = 0 forces α = 0 through −5α + 3β ≥ 0, so minimising β alone suffices to make the row driver wait all the time.)

Gives:
(α, β, γ, δ) = (0, 0, 1, 0).

Answer: yes, but the column driver then has to be given free way all of the time.
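The same sketch answers this query; only the objective vector changes (A_ub, b_ub, A_eq and b_eq are reused from the sketch above):

c = [0, 1, 0, 0]             # minimise beta
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x)                 # (0, 0, 1, 0)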
Find specific correlated equilibria
Is it possible to let the row driver wait all the time while letting the column driver pass no more than 50% of the time, without compromising a correlated equilibrium?

Minimize: β

Subject to:
α ≥ 0, β ≥ 0, 0 ≤ γ ≤ 1/2, δ ≥ 0
α + β + γ + δ = 1
−5α + 3β ≥ 0        5γ − 3δ ≥ 0
−5α + 3γ ≥ 0        5β − 3δ ≥ 0

Gives:
(α, β, γ, δ) = (9/98, 15/98, 1/2, 25/98).

Answer: no. To maintain an equilibrium, the row driver has to be given free way 15/98 ≈ 15% of the time.
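Again only one line changes relative to the first sketch: the 50% cap on γ goes into its bounds.

c = [0, 1, 0, 0]             # minimise beta
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None), (0, 0.5), (0, None)])
print(res.x)                 # approx. (9/98, 15/98, 1/2, 25/98)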
Hierarchy of equilibria
Nash equilibrium ⇒ correlated equilibrium
If strategies are independent, we have

s−i(x−i | xi) = s−i(x−i).

Immediately,

for all i and x′i:  ∑_{x−i} s−i(x−i) ui(x′i, x−i) ≤ ui(s)    (Nash)

⇒ for all xi, i and x′i:  ∑_{x−i} s−i(x−i | xi) ui(x′i, x−i) ≤ ui(s)    (CE)

The latter is the conditional formulation of a correlated equilibrium: under independence, conditioning on one's own action xi does not change the distribution over the opponents' actions, so the Nash condition yields the CE condition for every xi. Therefore, every Nash equilibrium is a correlated equilibrium.
Summary
NE:  for all i and x′i:      ∑_{x−i} s−i(x−i) ui(x′i, x−i) ≤ ui(s)

CE:  for all xi, i and x′i:  ∑_{x−i} q(xi, x−i) ui(x′i, x−i) ≤ ∑_{x−i} q(xi, x−i) ui(xi, x−i)

CCE: for all i and x′i:      ∑_{x−i} q−i(x−i) ui(x′i, x−i) ≤ ui(q)

■ With CE and CCE there are no individual strategies.
■ The CCE conditions are exactly the conditions that the empirical distribution of action profiles comes to satisfy in no-regret matching!
■ The formulas for NE and CCE are identical, but for NE the joint distribution s is the product of its marginal strategies. Therefore, NE ⇒ CCE.
■ The LHS of the CCE condition is the xi-sum over all LHSs of the CE conditions (and likewise for the RHSs), so summing the CE inequalities over xi gives the CCE inequality. Therefore, CE ⇒ CCE.
■ We already derived NE ⇒ CE. Therefore, NE ⇒ CE ⇒ CCE.
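To make the hierarchy concrete, here is a minimal sketch that checks the CE and CCE conditions above for a two-player game and a joint distribution q. All names are ours; the column player's payoffs are taken to be the mirror image of the row player's, as in the traffic light game.

from itertools import product

ACTS = ['G', 'R']
u = {1: {('G','G'): -10, ('G','R'): 5, ('R','G'): 0, ('R','R'): -1},
     2: {('G','G'): -10, ('G','R'): 0, ('R','G'): 5, ('R','R'): -1}}
q = {('G','G'): 0, ('G','R'): 3/11, ('R','G'): 3/11, ('R','R'): 5/11}

def own(i, xi, x_other):
    # Assemble the joint action profile with player i playing xi.
    return (xi, x_other) if i == 1 else (x_other, xi)

def is_ce(q):
    # CE: for all i, xi, xi':
    #   sum_{x-i} q(xi, x-i) u_i(xi', x-i) <= sum_{x-i} q(xi, x-i) u_i(xi, x-i)
    for i, xi, xj in product([1, 2], ACTS, ACTS):
        gain = sum(q[own(i, xi, x)] * (u[i][own(i, xj, x)] - u[i][own(i, xi, x)])
                   for x in ACTS)
        if gain > 1e-12:
            return False
    return True

def is_cce(q):
    # CCE: for all i, xi': sum_x q(x) u_i(xi', x-i) <= sum_x q(x) u_i(x)
    for i, xj in product([1, 2], ACTS):
        gain = sum(q[own(i, xi, x)] * (u[i][own(i, xj, x)] - u[i][own(i, xi, x)])
                   for xi in ACTS for x in ACTS)
        if gain > 1e-12:
            return False
    return True

print(is_ce(q), is_cce(q))   # True True: CE implies CCE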