ACG12 Performance and Prediction, 2009-05-11
Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess
Guy Haworth
Topics …
Motivation
Reference Fallible Players E(c)
In the Endgame Table Zone (ETZ)
Prior to the Endgame Table Zone
A set of hypotheses {Hk} about engines {E(ck)}
Bayesian Inference, given a choice of hypotheses, and evidence
Prior belief, posterior belief, Prob[Hk]
Translating the 'Reference Player' idea to the pre-ETZ
Results … differentiation, value of small samples
Motivation
Assess decision makers when they are under pressure
Need a Utopian Decision Maker, a Reference Agent (RA)
Finite set of choices, each with some Utility Value
A 'model world' is used to define the Utility Value
RA always makes the choice with the best Utility Value
RA is then deskilled to make Reference Fallible Agents (RFAs)
An RFA does not always make the best choice
{RFA} is the Space of Reference Fallible Agents (SRFA)
Now we take a human decision maker H …
… and associate them with some profile in SRFA
… by hypothesising that they are one of the RFAs and
… weighing the evidence to decide how likely each RFA actually is
1. Kc2, Kc1 or Ka1?
Mate in d = 23 with 1. Kc2
Mate in d = 24 with 1. Kc1
Mate in d = 29 with 1. Ka1
A Chess Engine E chooses 1. Kc2
A stochastic version of E may not
Let E(c) be a stochastic engine:
Likelihood[E(c) moves to p, depth d] = (1 + d)^{-c} ∝ Prob[E(c), p]
c = 0: all moves equally likely
c = ∞: only best moves played
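The stochastic engine E(c) can be made concrete with a minimal Python sketch. It assumes the likelihood (1 + d)^{-c} as written on the slide, normalised over the three king moves; the function name `move_probabilities` is illustrative and not from the talk. With these depths the probability of Kc2 comes out at 1/3 for c = 0, about 0.47 for c = 5, and close to 0.7 for c = 20.

```python
def move_probabilities(depths, c):
    """Probability of each move under E(c), taking likelihood (1 + d) ** -c."""
    weights = [(1 + d) ** -c for d in depths]
    total = sum(weights)
    return [w / total for w in weights]


# Depths to mate after 1. Kc2, 1. Kc1 and 1. Ka1, as given on the slide.
depths = [23, 24, 29]

for c in (0, 5, 20):
    probs = move_probabilities(depths, c)
    print(f"c = {c:2d}:", [round(p, 2) for p in probs])
```

Larger c concentrates the probability on the move with the shortest depth to mate, which is the sense in which c measures apparent competence.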
[Charts: move-choice probabilities for Kc2 (d = 23), Kc1 (d = 24) and Ka1 (d = 29) under c = 0, c = 5 and c = 20]
Which Engine … is playing the moves?
Suppose you see a sequence of player P's moves in KQKR
You are told that they are being played by some engine E(c)
You are told that it is one of E(0), E(5) or E(20)
Which agent, A, is it: E(0), E(5) or E(20)? What would be fair odds?
If you 'know nothing' (as you do) at the beginning …
… Prob[A = E(i)] = 1/3
Let's suppose you see a sequence of optimal moves
You should start to lean away from E(0) and towards E(20)
But what are the probabilities now? No need to guess …
Bayes' Rule tells you exactly what the new probabilities are
Bayes' Rule
Prob[Hypothesis | Evidence] ∝ Prob[Hypothesis] × Prob[Evidence | Hypothesis]
We have a choice of three hypotheses:
H0 ≡ "E = E(0)", H5 ≡ "E = E(5)", H20 ≡ "E = E(20)"
Prob[Hi] = 1/3 = 0.33 = the prior probability, i.e. before Kc2 is seen
Prob[E(0), Kc2] = 1/3 = 0.33; Prob[E(5), Kc2] = 0.47; Prob[E(20), Kc2] = 0.70
Prob[H0 | Kc2] ∝ 0.33 × 0.33 = 0.11 … etc (0.16, 0.23 … sum 0.50)
Scaling … Prob[H0 | Kc2] = 0.22 = the posterior probability
Prob[H5 | Kc2] = 0.31 and Prob[H20 | Kc2] = 0.47
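The arithmetic of this update can be reproduced in a few lines of Python. This is a sketch using the likelihoods quoted on the slide (0.33, 0.47, 0.70); the function name `posterior` is an illustrative choice.

```python
def posterior(priors, likelihoods):
    """Bayes' Rule: posterior is prior times likelihood, rescaled to sum to 1."""
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


priors = [1 / 3, 1 / 3, 1 / 3]     # H0, H5, H20 before Kc2 is seen
likelihoods = [1 / 3, 0.47, 0.70]  # Prob[E(c), Kc2] as quoted on the slide

print([round(p, 2) for p in posterior(priors, likelihoods)])
# [0.22, 0.31, 0.47]
```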
The effect of Prior Probabilities
In the example above, the posterior probability of H ∝ Prob[Ev | H]
This is because the prior probability of H was 1/3 for all H
So the application of Bayes' Rule has been somewhat obscured
Suppose the priors were H0 0.2, H5 0.3, H20 0.5
Then the posterior probabilities are proportional to:
H0 : 0.2 × 0.33 = 0.066
H5 : 0.3 × 0.47 = 0.141
H20 : 0.5 × 0.70 = 0.350 … totalling 0.557, so we scale up to …
Prob[H0 | Kc2] = 0.066/0.557 = 0.12
Prob[H5 | Kc2] = 0.141/0.557 = 0.25; Prob[H20 | Kc2] = 0.350/0.557 = 0.63
So the posteriors, which were 0.22/0.31/0.47 with equal priors, are now 0.12/0.25/0.63
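Because each posterior serves as the prior for the next observation, the same rule can be applied move by move. The sketch below starts from the priors 0.2/0.3/0.5 and, as a purely hypothetical scenario, assumes the optimal move is seen three times in positions that all have the likelihoods 0.33/0.47/0.70; the name `bayes_update` is illustrative.

```python
def bayes_update(beliefs, likelihoods):
    """One step of Bayes' Rule: multiply by the likelihoods and renormalise."""
    unnormalised = [b * l for b, l in zip(beliefs, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


likelihood_optimal = [0.33, 0.47, 0.70]  # H0, H5, H20, as quoted on the slide
beliefs = [0.2, 0.3, 0.5]                # the non-uniform priors from the slide
history = [beliefs]

# Hypothetical: three optimal moves in a row, each with the same likelihoods.
for _ in range(3):
    beliefs = bayes_update(beliefs, likelihood_optimal)
    history.append(beliefs)

for step, b in enumerate(history):
    print(step, [round(x, 2) for x in b])
```

The first update reproduces 0.12/0.25/0.63, and each further optimal move shifts belief away from E(0) and towards E(20).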
Rev. Bayes, Transform, Aeolian Harp
[Diagram: 'Bayesian Inference' shown on axes c1 and c2, with labels 'PA: P A0', 'PA: P A', 'Refine model parameters' and 'Model Error' ║EPP – EPA║]
"Let the Wind of Evidence blow through the Aeolian Harp of your Hypotheses"
Chess Engines as Benchmarks?
Engines are improving all the time: hardware, algorithms, knowledge
There is actually a danger that they may become too good
They are not infallible: 'best moves' are not necessarily best
q.v. changes of mind from one search depth to the next
However, greater depth of search ⇒ better engine [Beal]
Benchmark fallibility contributes statistical uncertainty to findings
Independence is also required: engine E cannot vote itself 'best'!
Using the idea on pre-EGT Chess
Idea is to use chess-engine evaluations {vi} rather than depths
Announced in 'Chess Endgame News', ICGA J. 28-4, 243 (2005)
However, this brings some complications:
Some evaluations, unlike Depths to Mate, are negative
The evaluations vi are produced by heuristics
Chess-engines' preferences are not infallible
Engines' preferences may vary engine-to-engine, depth-to-depth
Some intuitive observations:
A panel of engines is better than one engine as a benchmark
The better the engine and the greater the depth [Beal], the better
Uncertainty is halved by using four times the data
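The last observation is the usual 1/√n behaviour of a standard error. A minimal sketch, assuming n independent observations with a common spread `sigma` (an assumption made here for illustration; the slide states only the rule of thumb):

```python
import math


def standard_error(sigma, n):
    """Standard error of a mean of n independent observations with spread sigma."""
    return sigma / math.sqrt(n)


base = standard_error(1.0, 100)
quadrupled = standard_error(1.0, 400)
print(quadrupled / base)  # ratio of uncertainties after using four times the data
```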
Performance v Skill Rating
Player  move 1  move 2  …  move n1    outcome
White   e2-e4   …       …  Qf5#       1
Black   e7-e5   …       …  -          0

Player  move 1  move 2  …  move n2    outcome
White   d2-d4   …       …  g2-g3      0
Black   f7-f5   …       …  Bb5#       1

…

Player  move 1  move 2  …  move nk    outcome
White   e2-e4   …       …  c7-c8 Qc8  1
Black   e7-e6   …       …  -          0

performance rating (Elo) vs. skill rating

Player    Elo
Kasparov  2851
Karpov    2795
…         …
Stochastic choice, given position evaluations
At position p, moves mi to positions pi have evaluations vi
Can we say Likelihood[E(c), mi] = (1 + vi)^c ?
No, because some vi may be negative
Need a mapping v → w, s.t. ∀i, wi > 0 and v1 > v2 ⇒ w1 < w2
Some functions w = C(v) are better than others!
The intuitively obvious wi = 1 + |v1| + (v1 - vi) is not ideal
The wi are analogous to the di taken from an Endgame Table
Currently using wi = κ + (v1 - vi) with κ > 0 … in fact κ = 0.1
Model choices, yet to be tested as to effect:
Choice of specific engine and search-depth
Choice of mapping to, e.g., r1.E() + r2.E(c) rather than to one engine E(c)
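The mapping from evaluations to depth-like values can be sketched in Python as follows. The offset 0.1 is the value quoted on the slide. Two things are assumptions made here for illustration: the evaluations in `evals` are invented, and the w values are plugged into the same (1 + w)^{-c} form used for endgame-table depths, which the slide suggests by analogy but does not spell out.

```python
OFFSET = 0.1  # the positive offset quoted on the slide


def to_pseudo_depths(evals, offset=OFFSET):
    """Map evaluations v to w = offset + (v_best - v): positive, smaller is better."""
    best = max(evals)
    return [offset + (best - v) for v in evals]


def move_probabilities(evals, c, offset=OFFSET):
    """Move probabilities under E(c), assuming likelihood (1 + w) ** -c."""
    weights = [(1 + w) ** -c for w in to_pseudo_depths(evals, offset)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical engine evaluations (in pawns) for three candidate moves.
evals = [0.30, 0.10, -0.80]

print([round(w, 2) for w in to_pseudo_depths(evals)])
for c in (0, 1, 5):
    print(c, [round(p, 2) for p in move_probabilities(evals, c)])
```

Negative evaluations cause no difficulty because only the differences from the best evaluation enter the mapping.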
Results … based on TOGA II (depth 10)
Measured: performance against Kaissa rather than opponent
An absolute rather than relative measure, given the benchmark
A measure not affected by the opponent's performance
m data points per game, rather than one (the game result)
Spectroscope!
Virtual players at different Elo can be differentiated
Higher Elo ⇒ higher apparent competence c
Winners visibly relax and play for the result when closing out
Best performance-indicators are drawn games between like players
Performance, pre- and post-Elo, assessed in same terms
Epochs of performance compared, pre- and post- cheating accusations
Games tracked in 2D not 1D (net advantage); games compared
Absolute performance of Elo 2400 players tracked across time
Virtual Elo Players: Data
#  Player    Elo min  Elo max  Period     Games  Pos.    c min  c max  μc      σc      σc × Pos^½
1  Elo_2100  2090     2110     1994-1998  217    12,751  1.04   1.10   1.0660  .00997  1.126
2  Elo_2200  2190     2210     1971-1998  569    29,611  1.11   1.15   1.1285  .00678  1.167
3  Elo_2300  2290     2310     1971-2005  568    30,070  1.14   1.18   1.1605  .00694  1.203
4  Elo_2400  2390     2410     1971-2006  603    31,077  1.21   1.25   1.2277  .00711  1.253
5  Elo_2500  2490     2510     1995-2006  636    30,168  1.25   1.29   1.2722  .00747  1.297
6  Elo_2600  2590     2610     1995-2006  615    30,084  1.27   1.33   1.2971  .00770  1.336
7  Elo_2700  2690     2710     1991-2006  225    13,796  1.29   1.35   1.3233  .01142  1.341
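The final column of the table, σc multiplied by the square root of the number of positions, can be recomputed from the other columns. The sketch below uses only the figures in the table; the variable names are illustrative.

```python
import math

# (player, positions, sigma_c, quoted sigma_c * sqrt(positions)) from the table
rows = [
    ("Elo_2100", 12751, 0.00997, 1.126),
    ("Elo_2200", 29611, 0.00678, 1.167),
    ("Elo_2300", 30070, 0.00694, 1.203),
    ("Elo_2400", 31077, 0.00711, 1.253),
    ("Elo_2500", 30168, 0.00747, 1.297),
    ("Elo_2600", 30084, 0.00770, 1.336),
    ("Elo_2700", 13796, 0.01142, 1.341),
]

recomputed = [sigma_c * math.sqrt(positions) for _, positions, sigma_c, _ in rows]

for (player, _, _, quoted), value in zip(rows, recomputed):
    print(f"{player}: recomputed {value:.3f}, quoted {quoted:.3f}")
```

Scaling σc by the square root of the sample size puts the smaller samples (about 13,000 positions) and the larger ones (about 30,000) on a common footing, so the column can be compared across rows.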
Profile of Virtual Elo Players in c-space
[Chart: Prob[c] for different Elo ranges; x-axis c from 1 to 1.4, y-axis Prob from 0 to 1; curves E2100, E2200, E2300, E2400, E2500, E2600, E2700]
Keres -v- The Rest (1948); D.P.Singh (2006)
WCC (1948): Keres 0 Botvinnik 4
D.P.Singh v Opponents
D.P.Singh – 'before' and 'after'
Two 6-month periods
Not as conclusive as it appears
'c'-tracking across the whole period
Allwermann-Kalinitschew (1998)
Variation in c for both players;
Track locus of game in 2D
Standard 1-dimensional charting of the game
Summary
'Contextual Analysis' (CA) of the individual player's decisions
'Decision Matching' (DM) uses less information and is cruder
'Average Differencing' (AD) uses less information: ditto
CA successfully differentiates players of different Elos
the standard deviation on c gives an idea of differentiator-power
expect CA to be a better differentiator than AD
… and expect AD to be better than DM
CA, using Bayesian Analysis, applied to:
Career and epoch analysis, tournament and game analysis
Future directions:
Evolving the method, including deeper statistical treatment
Applying it to other chess and non-chess scenarios