Belief Learning in an Unstable Infinite Game
Paul J. Healy
CMU
Belief Learning in an Unstable Infinite Game
Issue #1
Issue #2
Issue #3
Issue #1: Infinite Games
• Typical Learning Model:
– Finite set of strategies
– Strategies get weight based on ‘fitness’
– Bells & whistles: experimentation, spillovers…
• Many important games have infinite strategy spaces
– Duopoly, PG, bargaining, auctions, war of attrition…
• Is quality of fit sensitive to grid size?
• Models don’t use the structure of the strategy space
Previous Work
• Grid size on fit quality:
– Arifovic & Ledyard:
• Groves-Ledyard mechanisms
• Convergence failure of RL with |S| = 51
• Strategy space structure:
– Roth & Erev AER ’99
• Quality-of-fit / error measures:
– What’s the right metric space?
– Closeness in probabilities or closeness in strategies?
Issue #2: Unstable Game
• Usually predicting convergence rates
– Example: p-beauty contests
• Instability:
– Toughest test for learning models
– Most statistical power
Previous Work
• Chen & Tang ’98
– Walker mechanism & unstable Groves-Ledyard
– Reinforcement > Fictitious Play > Equilibrium
• Healy ’06
– 5 PG mechanisms, predicting convergence or not
• Feltovich ’00
– Unstable finite Bayesian game
– Fit varies by game and error measure
Issue #3: Belief Learning
• If subjects are forming beliefs, measure them!
• Method 1: Direct elicitation
– Incentivized guesses about s_{-i}
• Method 2: Inferred from payoff table usage
– Tracking payoff ‘lookups’ may inform our models
Previous Work
• Nyarko & Schotter ’02
– Subjects best respond to stated beliefs
– Stated beliefs not too accurate
• Costa-Gomes, Crawford & Broseta ’01
– Mouselab to identify types
– How players solve games, not learning
This Paper
• Pick an unstable infinite game
• Give subjects a calculator tool & track usage
• Elicit beliefs in some sessions
• Fit models to data in the standard way
• Study formation of “beliefs”
– “Beliefs” from the calculator tool
– “Beliefs” from elicited guesses
The Game
• Walker’s PG mechanism for 3 players
• Added a ‘punishment’ parameter

N = {1, 2, 3},  S_i = [-10, 10] ⊂ R¹

u_i(s_i, s_{-i}) = v_i(y(s)) - t_i(s)·y(s)

y(s) = Σ_j s_j

v_i(y) = b_i·y - a_i·y²

t_i(s) = s_{i+1} - s_{i-1}   (indices mod 3)
Parameters & Equilibrium
• v_i(y) = b_i·y - a_i·y² + c_i
• Pareto optimum: y = 7.5
• Unique PSNE: s_i* = 2.5
• Punishment γ = 0.1
– Purpose: not too wild; payoffs rarely negative
• Guessing payoff: 10 - |g_L - s_L|/4 - |g_R - s_R|/4
• Game payoffs: Pr(< 50) = 8.9%, Pr(> 100) = 71%
i    a_i   b_i   c_i
1    0.1   1.5   110
2    0.2   3.0   125
3    0.3   4.5   140
Choice of Grid Size
Grid Width 5 2 1 1/2 1/4 1/8
# Grid Points 5 11 21 41 81 161
% on Grid 59.7 61.6 88.7 91.6 91.9 91.9
S = [-10,10]
Properties of the Game
• Best response:

s_i^BR(s_{-i}) = b_i/(2a_i) - (1 + 1/(2a_i))·s_{i+1} - (1 - 1/(2a_i))·s_{i-1}   (indices mod 3)

stacked across i = 1, 2, 3, a linear map in s with zeros on the diagonal

• BR dynamics: unstable
– One eigenvalue is +2
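The instability claim can be sanity-checked numerically. The sketch below is an assumption-laden reconstruction: it uses the Walker tax t_i(s) = (s_{i+1} - s_{i-1})·y(s), omits the punishment parameter γ, and takes the a_i from the parameter table; the linearized BR map it produces has a real eigenvalue of modulus 2 plus a complex pair of modulus √26, so BR dynamics diverge.

```python
import numpy as np

# Reconstructed linear part of the best-response map (gamma omitted):
# s_i^BR = b_i/(2 a_i) - (1 + 1/(2 a_i)) s_{i+1} - (1 - 1/(2 a_i)) s_{i-1}
a = [0.1, 0.2, 0.3]

M = np.zeros((3, 3))
for i in range(3):
    c = 1.0 / (2.0 * a[i])
    M[i, (i + 1) % 3] = -(1.0 + c)   # coefficient on s_{i+1}
    M[i, (i - 1) % 3] = -(1.0 - c)   # coefficient on s_{i-1}

eigs = np.linalg.eigvals(M)
spectral_radius = max(abs(eigs))     # > 1: BR dynamics are unstable
```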
Interface
Design
• PEEL Lab, U. Pittsburgh
• All sessions:
– 3-player groups, 50 periods
– Same group & ID #s for all periods
– Payoffs etc. common information
– No explicit public good framing
– Calculator always available
– 5-minute ‘warm-up’ with calculator
• Sessions 1-6:
– Guess s_L and s_R
• Sessions 7-13:
– Baseline: no guesses
Does Elicitation Affect Choice?

• Total variation, TV = Σ_t |x_t - x_{t-1}|:
– No significant difference (p = 0.745)
• No. of strategy switches:
– No significant difference (p = 0.405)
• Autocorrelation (predictability):
– Slightly more without elicitation
• Total earnings per session:
– No significant difference (p = 1)
• Missed periods:
– Elicited: 9/300 (3%) vs. not: 3/350 (0.8%)
Does Play Converge?
[Figure: Average distance from equilibrium, periods 0-50; two panels: average |s_i - s_i*| per period and average |y - y_o| per period]
Does Play Converge, Part 2
[Figure: play by period, periods 0-50, strategy space -10 to 10]
Accuracy of Beliefs
• Guesses get better in time

[Figure: average ||s_{-i} - s_{-i}(t)|| per period; two panels: elicited guesses and calculator inputs]
Model 1: Parametric EWA
• δ: weight on foregone payoffs of strategies not played
• φ: decay rate of past attractions
• ρ: decay rate of past experience
• A(0): initial attractions
• N(0): initial experience
• λ: response sensitivity to attractions

A_i^t(s_i) = [ φ·N(t-1)·A_i^{t-1}(s_i) + (δ + (1-δ)·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N(t)

N(t) = ρ·N(t-1) + 1

P_i^t(s_i) = e^{λ·A_i^t(s_i)} / Σ_{x∈S_i} e^{λ·A_i^t(x)}
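As a concrete reference, here is a minimal sketch of one parametric EWA step over a discretized strategy grid, matching the update and logit-response equations above (function and variable names are mine, not the paper's):

```python
import numpy as np

def ewa_step(A, N, payoffs, played, delta, phi, rho):
    """One parametric EWA update.
    A: attractions over the strategy grid; N: experience weight;
    payoffs[k]: u_i(s_k, s_{-i}^t), the (hypothetical) payoff of grid
    strategy s_k against this period's opponent play;
    played: index of the strategy actually chosen."""
    N_next = rho * N + 1.0
    I = np.zeros_like(A)
    I[played] = 1.0                       # indicator I(s_i, s_i^t)
    w = delta + (1.0 - delta) * I         # chosen strategy gets weight 1, others delta
    A_next = (phi * N * A + w * payoffs) / N_next
    return A_next, N_next

def logit_response(A, lam):
    """P(s_k) proportional to exp(lambda * A(s_k))."""
    z = lam * A - np.max(lam * A)         # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

With δ = 1 every strategy is updated by its hypothetical payoff (belief-learning flavor); with δ = 0 only the chosen strategy is reinforced.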
Model 1’: Self-Tuning EWA
• N(0) = 1
• Replace δ and φ with deterministic functions:

δ_i^t(s_i) = 1 if u_i(s_i, s_{-i}^t) ≥ u_i(s^t), 0 otherwise

φ_{i,t} = 1 - (1/2)·Σ_{x∈S_{-i}} ( (1/t)·Σ_{τ=1}^t I(x, s_{-i}^τ) - I(x, s_{-i}^t) )²

A_i^t(s_i) = [ φ_{i,t}·N(t-1)·A_i^{t-1}(s_i) + (δ_i^t(s_i) + (1-δ_i^t(s_i))·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N(t)

N(t) = φ_{i,t}·N(t-1) + 1
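A sketch of the two self-tuning functions, assuming a finite grid for opponents' strategies (the surprise-index form of φ and the attention form of δ; all names are mine):

```python
import numpy as np

def phi_surprise(cum_counts, t, last):
    """phi_{i,t} = 1 - 0.5 * sum_x (h(x,t) - I(x, s_{-i}^t))^2,
    where h(x,t) = cum_counts/t is the cumulative frequency of
    opponent play x over the first t periods, and `last` indexes
    the opponents' most recent play."""
    h = cum_counts / t
    r = np.zeros_like(h)
    r[last] = 1.0
    return 1.0 - 0.5 * float(np.sum((h - r) ** 2))

def delta_attention(payoffs, realized):
    """delta_i^t(s_i) = 1 if s_i would have earned at least the
    realized payoff u_i(s^t), else 0."""
    return (payoffs >= realized).astype(float)
```

If opponents have played the same profile every period, φ = 1 (no decay of old attractions); a complete surprise pushes φ to 0.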
STEWA: Setup
• Only remaining parameters: λ and A0
– λ will be estimated
– 5 minutes of ‘Calculator Time’ gives A0
• Average payoff from calculator trials:

A^0(s_i) = [ Σ_{t=1}^T I(s_i, s_i^t)·u_i(s^t) ] / [ Σ_{t=1}^T I(s_i, s_i^t) ]   if Σ_{t=1}^T I(s_i, s_i^t) ≥ 1

A^0(s_i) = (1/T)·Σ_{t=1}^T u_i(s^t)   otherwise
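In code, the initial-attraction rule above might look like this (a sketch; representing the warm-up as (strategy, payoff) pairs is my assumption):

```python
import numpy as np

def initial_attractions(grid, trials):
    """A^0(s): mean calculator payoff at s if s was tried at least once
    during warm-up, else the overall mean calculator payoff.
    trials: list of (strategy, payoff) pairs from the warm-up phase."""
    overall = float(np.mean([u for _, u in trials]))
    A0 = np.full(len(grid), overall)
    for k, s in enumerate(grid):
        tried = [u for x, u in trials if x == s]
        if tried:                      # sum_t I(s, s^t) >= 1
            A0[k] = float(np.mean(tried))
    return A0
```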
STEWA: Fit
• Likelihoods are ‘zero’ for all λ
– Guess: lots of near misses in predictions
• Alternative measure: quadratic scoring rule

QSR = Σ_k ( P_i^t(s_i^k) - I(s_i^k, s_i^t) )²

– Best fit: λ* = 0.04 (previous studies: λ > 4)
– Suggests attractions are very concentrated
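The per-period quadratic score can be computed directly from a model's predicted distribution (a minimal sketch; lower is better, and a perfect point prediction scores 0):

```python
import numpy as np

def quadratic_score(probs, chosen):
    """sum_k (P(s_k) - I(s_k, s^t))^2 for one period."""
    I = np.zeros_like(probs)
    I[chosen] = 1.0
    return float(np.sum((probs - I) ** 2))
```

A uniform prediction over n grid points scores (n-1)/n; with n = 21 grid points that is 0.952, which is why a score near 0.95 amounts to random prediction.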
[Figure: STEWA predicted probabilities, λ = 4: Session 3, Player 2, periods 11-30; axes: strategy (-10 to 10), period, EWA probability]
[Figure: STEWA predicted probabilities, λ = 0.04: Session 3, Player 2, periods 11-30; axes: strategy (-10 to 10), period, EWA probability]
STEWA: Adjustment Attempts
• The problem: near misses in strategy space, not in time
• Suggests: alter δ (weight on hypotheticals)
– original specification : QSR* = 1.193 @ λ*=0.04– δ = 0.7 (p-beauty est.): QSR* = 1.056 @ λ*=0.03– δ = 1 (belief model): QSR* = 1.082 @ λ*=0.175– δ(k,t) = % of B.R. payoff: QSR* = 1.077 @ λ*=0.06
• Altering φ:– 1/8 weight on surprises: QSR* = 1.228 @ λ*=0.04
STEWA: Other Modifications
• Equal initial attractions: worse
• Smoothing:
– Takes advantage of strategy space structure
– λ spreads probability across strategies evenly; smoothing spreads probability to nearby strategies
– Smoothed attractions
– Smoothed probabilities
– But… no improvement in QSR* or λ*!
• Tentative conclusion:
– STEWA is either not broken, or it can’t be fixed…
Other Standard Models
• Nash Equilibrium
• Uniform Mixed Strategy (‘Random’)
• Logistic Cournot BR
• Deterministic Cournot BR
• Logistic Fictitious Play
• Deterministic Fictitious Play
• k-Period BR:

s^t = BR( (1/k)·Σ_{τ=t-k}^{t-1} s^τ )
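A sketch of the k-period BR rule under a reconstructed best-response function (Walker tax without the γ punishment; with the parameter table's a_i and b_i the fixed point is the stated equilibrium s* = 2.5):

```python
import numpy as np

A = [0.1, 0.2, 0.3]   # a_i from the parameter table
B = [1.5, 3.0, 4.5]   # b_i from the parameter table

def best_response(i, s):
    """s_i^BR = b_i/(2 a_i) - (1 + 1/(2 a_i)) s_{i+1} - (1 - 1/(2 a_i)) s_{i-1},
    clipped to the strategy space [-10, 10]. Punishment term gamma omitted."""
    c = 1.0 / (2.0 * A[i])
    br = B[i] * c - (1.0 + c) * s[(i + 1) % 3] - (1.0 - c) * s[(i - 1) % 3]
    return float(np.clip(br, -10.0, 10.0))

def k_period_br(history, k, i):
    """Best respond to the average of the last k observed profiles."""
    avg = np.mean(np.asarray(history)[-k:], axis=0)
    return best_response(i, avg)
```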
“New” Models
• Best respond to stated beliefs (S1-S6 only)
• Best respond to calculator entries
– Issue: how to aggregate calculator usage?
– Decaying average of inputs
• Reinforcement based on calculator payoffs
– Decaying average of payoffs
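One natural aggregator for calculator entries is a decaying average (a sketch; δ = 1/2 mirrors the value fixed in the comparison tables, but this particular aggregation rule is my assumption):

```python
def decaying_average(entries, delta=0.5):
    """Weight entry t by delta^(T - t): recent calculator inputs count more.
    entries: chronological list of calculator inputs (or payoffs)."""
    num, den, w = 0.0, 0.0, 1.0
    for x in reversed(entries):   # most recent gets weight 1, then delta, delta^2, ...
        num += w * x
        den += w
        w *= delta
    return num / den
```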
Model Comparisons

• Random Choice* (no parameters):
– BIC: In: Infinite
– 2-QSR: In: 0.952, Out: 0.878
– MAD: In: 7.439, Out: 7.816
– MSD: In: 82.866, Out: 85.558
• Logistic STEWA* (λ):
– BIC: In: Infinite
– 2-QSR: In: 0.807, Out: 0.665 (λ* = 0.04)
– MAD: In: 3.818, Out: 3.180 (λ* = 0.41)
– MSD: In: 34.172, Out: 22.853 (λ* = 0.35)
• Logistic Cournot* (λ):
– BIC: In: Infinite
– 2-QSR: In: 0.952, Out: 0.878 (λ* = 0.00 (!))
– MAD: In: 4.222, Out: 3.557 (λ* = 4.30)
– MSD: In: 38.186, Out: 25.478 (λ* = 4.30)
• Logistic F.P.* (λ):
– BIC: In: Infinite
– 2-QSR: In: 0.955, Out: 0.878 (λ* = 14.98)
– MAD: In: 4.265, Out: 3.891 (λ* = 4.47)
– MSD: In: 31.062, Out: 22.133 (λ* = 4.47)

* Estimates on the grid of integers {-10, -9, …, 9, 10}
In = periods 1-35; Out = periods 36-end
Model Comparisons 2

• BR(Guesses) (6 sessions only; no parameters):
– MAD: In: 5.5924, Out: 3.3693
– MSD: In: 57.874, Out: 19.902
• BR(Calculator Input) (δ = 1/2):
– MAD: In: 6.394, Out: 8.263
– MSD: In: 79.29, Out: 116.7
• Calculator Reinforcement* (δ = 1/2):
– MAD: In: 7.389, Out: 7.815
– MSD: In: 82.407, Out: 85.495
• k-Period BR (k):
– MAD: In: 4.2126, Out: 3.582 (k* = 4)
– MSD: In: 35.185, Out: 23.455 (k* = 4)
• Cournot (no parameters):
– MAD: In: 4.7974, Out: 3.857
– MSD: In: 45.283, Out: 29.058
• Weighted F.P. (δ):
– MAD: In: 4.500, Out: 3.518 (δ* = 0.56)
– MSD: In: 38.290, Out: 22.426 (δ* = 0.65)
The “Take-Homes”
• Methodological issues– Infinite strategy space– Convergence vs. Instability– Right notion of error
• Self-Tuning EWA fits best.
• Guesses & calculator input don’t seem to offer any more predictive power… ?!?!