Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess
Guy Haworth


Page 1:


Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess
Guy Haworth
[email protected]

Page 2:


Topics ….

Motivation

Reference Fallible Players E(c)

In the zone: the Endgame Table Zone (ETZ)

Prior to the Endgame Table Zone

A set of hypotheses {Hk} about engines {E(ck)}

Bayesian Inference, given a choice of hypotheses, and evidence:

Prior belief, posterior belief, Prob [Hk]

Translating the 'Reference Player' idea to the pre-ETZ

Results … differentiation, value of small samples, …

Page 3:

Motivation

Assess decision makers when they are under pressure
Need a Utopian Decision Maker, a Reference Agent (RA)

Finite set of choices, each with some Utility Value
A 'model world' is used to define the Utility Value
The RA always makes the choice with the best Utility Value

The RA is then deskilled to make Reference Fallible Agents (RFAs)
An RFA does not always make the best choice

{RFA}: the Space of Reference Fallible Agents (SRFA)

Now we take a human decision maker H… and associate them with some profile in SRFA

… by hypothesising that they are one of the RFAs and

… weighing the evidence to decide how likely each RFA actually is


Page 4:

1.Kc2, Kc1 or Ka1?


Mate in d = 23 with 1. Kc2
Mate in d = 24 with 1. Kc1
Mate in d = 29 with 1. Ka1

A Chess Engine E chooses 1. Kc2
A stochastic version of E may not

Let E(c) be a stochastic engine:
Likelihood[E(c) moves to p, depth d] = (1 + d)^(-c) ∝ Prob[E(c), p]
c = 0: all moves equally likely; c = ∞: only the best moves are played
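A minimal Python sketch of this stochastic-choice model (the normalisation of the (1 + d)^(-c) weights over the legal moves is assumed; the depths 23, 24 and 29 are those of 1. Kc2, 1. Kc1 and 1. Ka1 above):

```python
# Stochastic engine E(c): weight each move by (1 + d)^(-c), with d the depth to mate,
# then normalise so the weights sum to 1 over the available moves.

def move_probabilities(depths, c):
    """Return Prob[E(c) plays move i] for each depth-to-mate d_i."""
    weights = [(1 + d) ** (-c) for d in depths]
    total = sum(weights)
    return [w / total for w in weights]

depths = [23, 24, 29]  # 1. Kc2, 1. Kc1, 1. Ka1
for c in (0, 5, 20):
    print(c, [round(p, 2) for p in move_probabilities(depths, c)])
# c = 0  -> [0.33, 0.33, 0.33]   all moves equally likely
# c = 5  -> [0.47, 0.38, 0.15]   Kc2 favoured
# c = 20 -> [0.69, 0.3, 0.01]    almost always the best move
```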

Page 5:

(d = #23) Kc2, (#24) Kc1, (#29) Ka1


[Chart: the probabilities of 1. Kc2, 1. Kc1 and 1. Ka1 under E(c) for c = 0, c = 5 and c = 20]

Page 6:

Which Engine … is playing the moves?

Suppose you see a sequence of player P's moves in KQKR
You are told that they are being played by some engine E(c)
You are told that it is one of E(0), E(5) or E(20)
Which agent, A, is it: E(0), E(5) or E(20)? What would be fair odds?
If you 'know nothing' (as you do) at the beginning …

Prob[A = E(i)] = 1/3
Let's suppose you see a sequence of optimal moves
You should start to lean away from E(0) and towards E(20)
But what are the probabilities now? No need to guess …

Bayes' Rule tells you exactly what the new probabilities are


Page 7:

Bayes' Rule

Prob[Hypothesis | Evidence] ∝ Prob[Hypothesis] × Prob[Evidence | Hypothesis]

We have a choice of three hypotheses:

H0 "E = E(0)", H5 "E = E(5)", H20 "E = E(20)"

Prob[Hi] = 1/3 = 0.33 = the prior probability, i.e. before Kc2 is seen

Prob[E(0), Kc2] = 1/3 = 0.33; Prob[E(5), Kc2] = 0.47; Prob[E(20), Kc2] = 0.70

Prob[H0 | Kc2] ∝ 0.33 × 0.33 = 0.11 … etc. (0.16, 0.23 … sum 0.50)

Scaling … Prob[H0 | Kc2] = 0.22 = the posterior probability

Prob[H5 | Kc2] = 0.31 and Prob[H20 | Kc2] = 0.47
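A minimal sketch of this Bayesian update, reusing the (1 + d)^(-c) move model from the sketch above; the hypotheses E(0), E(5), E(20), the uniform priors and the observed move 1. Kc2 are the slide's example:

```python
# Bayes' Rule over hypotheses H_k: "the mover is E(c_k)", after observing one move
# in a position whose depths to mate are known.

def move_probabilities(depths, c):
    """Prob[E(c) plays move i], with weights (1 + d_i)^(-c)."""
    weights = [(1 + d) ** (-c) for d in depths]
    total = sum(weights)
    return [w / total for w in weights]

def posterior(cs, priors, depths, observed):
    """Posterior Prob[H_k | observed move]: prior_k times likelihood_k, rescaled to sum to 1."""
    likelihoods = [move_probabilities(depths, c)[observed] for c in cs]
    unscaled = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unscaled)
    return [u / total for u in unscaled]

cs = [0, 5, 20]
depths = [23, 24, 29]  # 1. Kc2, 1. Kc1, 1. Ka1
print([round(p, 2) for p in posterior(cs, [1/3, 1/3, 1/3], depths, observed=0)])
# -> [0.22, 0.31, 0.46]: the slide's 0.22 / 0.31 / 0.47, up to rounding
```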


Page 8:

The effect of Prior Probabilities

In the example above, the posterior probability of H ∝ Prob[Ev | H]

This is because the prior probability of H was 1/3 for all H
So the application of Bayes' Rule has been somewhat obscured

Suppose the priors were H0: 0.2, H5: 0.3, H20: 0.5

Then the posterior probabilities are proportional to:
H0: 0.2 × 0.33 = 0.066

H5: 0.3 × 0.47 = 0.141

H20: 0.5 × 0.70 = 0.350 … totalling 0.557, so we scale up to …

Prob[H0] = 0.066/0.557 = 0.12

Prob[H5] = 0.141/0.557 = 0.25; Prob[H20] = 0.350/0.557 = 0.63

So, the posteriors that were 0.22/0.31/0.47 with uniform priors are now 0.12/0.25/0.63
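Reusing the posterior sketch above, the same update with these non-uniform priors gives:

```python
# Non-uniform priors 0.2 / 0.3 / 0.5 for E(0) / E(5) / E(20); the observed move is still 1. Kc2.
print([round(p, 2) for p in posterior([0, 5, 20], [0.2, 0.3, 0.5], [23, 24, 29], observed=0)])
# -> [0.12, 0.25, 0.62]: cf. the slide's 0.12 / 0.25 / 0.63
```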


Page 9:

Rev. Bayes, Transform, Aeolian Harp


[Diagram: two (c1, c2) plots linked by Bayesian Inference; the apparent-player mapping PA: P → A0 is refined to PA: P → A; model parameters refined against the 'Model Error' ║EPP – EPA║]

"Let the Wind of Evidence blow through the Aeolian Harp of your Hypotheses"

Page 10:

Chess Engines as Benchmarks?


Engines are improving all the time: hardware, algorithms, knowledge
There is actually a danger that they may become too good

They are not infallible: 'best moves' are not necessarily best, q.v. changes of mind from one search depth to the next
However, greater depth of search ⇒ better engine [Beal]
Benchmark fallibility contributes statistical uncertainty to the findings

Independence is also required: engine E cannot vote itself 'best'!

Page 11:

Using the idea on pre-EGT Chess

The idea is to use chess-engine evaluations {vi} rather than depths
Announced in 'Chess Endgame News', ICGA J. 28-4, 243 (2005)

However, this brings some complications:
Some evaluations, unlike Depths to Mate, are negative
The evaluations vi are computed using heuristics

Chess engines' preferences are not infallible
Engines' preferences may vary engine-to-engine and depth-to-depth

Some intuitive observations:
A panel of engines is better than one engine as a benchmark
The better the engine and the greater the depth [Beal], the better
Uncertainty is halved by using four times the data


Page 12:

Performance v Skill Rating


Game 1:
Player | move 1 | move 2 | … | move n1 | outcome
White | e2-e4 | … | … | Qf5# | 1
Black | e7-e5 | … | … | - | 0

Game 2:
Player | move 1 | move 2 | … | move n2 | outcome
White | d2-d4 | … | … | g2-g3 | 0
Black | f7-f5 | … | … | Bb5# | 1

Game k:
Player | move 1 | move 2 | … | move nk | outcome
White | e2-e4 | … | … | c7-c8=Q | 1
Black | e7-e6 | … | … | - | 0

performance rating (Elo) vs. skill rating

Player | Elo
Kasparov | 2851
Karpov | 2795
… | …

Page 13:

Stochastic choice, given position evaluations

At position p, the moves mi to positions pi have evaluations vi

Can we say Likelihood[E(c), mi] = (1 + vi)^c?

No, because some vi may be negative

Need a mapping v → w, s.t. ∀i, wi > 0, and v1 > v2 ⇒ w1 < w2

Some functions w = C(v) are better than others!
The intuitively obvious wi = 1 + |v1| + (v1 – vi) is not ideal

The wi are analogous to the di taken from an Endgame Table

Currently using wi = κ + (v1 – vi) with κ > 0 … in fact κ = 0.1

Model choices, yet to be tested as to effect:
Choice of specific engine and search-depth
Choice of mapping to, e.g., r1·E(∞) + r2·E(c) rather than to one engine E(c)
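A minimal sketch of the v → w mapping above and the resulting move model, assuming κ = 0.1 and assuming the wi are then used exactly as the depths di were, i.e. with weights (1 + wi)^(-c); the evaluations below are illustrative, not taken from the slides:

```python
# Map evaluations v_i (in pawns, for the side to move) to positive 'virtual depths'
# w_i = kappa + (v_1 - v_i), where v_1 is the best evaluation, then reuse the
# (1 + w)^(-c) stochastic-choice model.  kappa = 0.1 as quoted above.

KAPPA = 0.1

def evals_to_weights(evals, kappa=KAPPA):
    """w_i = kappa + (v_1 - v_i) > 0; the best move gets the smallest w."""
    v1 = max(evals)
    return [kappa + (v1 - v) for v in evals]

def move_probabilities_from_evals(evals, c):
    """Prob[E(c) plays move i], treating each w_i like a depth to mate."""
    ws = evals_to_weights(evals)
    weights = [(1 + w) ** (-c) for w in ws]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative evaluations only: +0.4, +0.1 and -0.3 pawns for three candidate moves.
print([round(p, 2) for p in move_probabilities_from_evals([0.4, 0.1, -0.3], c=10)])
# -> roughly [0.91, 0.08, 0.01]
```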


Page 14:

Results … based on TOGA II (depth 10)

Measured: performance against Kaissa rather than the opponent
An absolute rather than relative measure, given the benchmark
A measure not affected by the opponent's performance
m data points per game, rather than one (the game result)

Spectroscope! Virtual players at different Elo can be differentiated
Higher Elo ⇒ higher apparent competence c
Winners visibly relax and play for the result when closing out
The best performance indicators are drawn games between like players

Performance, pre- and post-Elo, assessed in the same terms
Epochs of performance compared, pre- and post-cheating accusations
Games tracked in 2D not 1D (net advantage); games compared
Absolute performance of Elo 2400 players tracked across time


Page 15:

Virtual ELO Players: Data


# | Player | Elo min | Elo max | Period | Games | Pos. | c min | c max | μc | σc | σc·Pos.^½
1 | Elo_2100 | 2090 | 2110 | 1994-1998 | 217 | 12,751 | 1.04 | 1.10 | 1.0660 | .00997 | 1.126
2 | Elo_2200 | 2190 | 2210 | 1971-1998 | 569 | 29,611 | 1.11 | 1.15 | 1.1285 | .00678 | 1.167
3 | Elo_2300 | 2290 | 2310 | 1971-2005 | 568 | 30,070 | 1.14 | 1.18 | 1.1605 | .00694 | 1.203
4 | Elo_2400 | 2390 | 2410 | 1971-2006 | 603 | 31,077 | 1.21 | 1.25 | 1.2277 | .00711 | 1.253
5 | Elo_2500 | 2490 | 2510 | 1995-2006 | 636 | 30,168 | 1.25 | 1.29 | 1.2722 | .00747 | 1.297
6 | Elo_2600 | 2590 | 2610 | 1995-2006 | 615 | 30,084 | 1.27 | 1.33 | 1.2971 | .00770 | 1.336
7 | Elo_2700 | 2690 | 2710 | 1991-2006 | 225 | 13,796 | 1.29 | 1.35 | 1.3233 | .01142 | 1.341
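The header of the last column was garbled in extraction; its values are consistent with σc × Pos.^½, as this small check of the table's own figures shows:

```python
# Check that the table's last column equals sigma_c * sqrt(Pos.) for each row.
from math import sqrt

rows = [  # (player, positions, sigma_c) taken from the table above
    ("Elo_2100", 12751, 0.00997),
    ("Elo_2200", 29611, 0.00678),
    ("Elo_2300", 30070, 0.00694),
    ("Elo_2400", 31077, 0.00711),
    ("Elo_2500", 30168, 0.00747),
    ("Elo_2600", 30084, 0.00770),
    ("Elo_2700", 13796, 0.01142),
]
for player, positions, sigma_c in rows:
    print(player, round(sigma_c * sqrt(positions), 3))
# -> 1.126, 1.167, 1.203, 1.253, 1.297, 1.336, 1.341: matching the last column
```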

Page 16:

Profile of Virtual ELO Players in c-space


[Chart: Prob[c] for the different Elo ranges E2100–E2700; x-axis: c from 1.0 to 1.4, y-axis: Prob from 0 to 1]

Page 17:

Keres -v- The Rest (1948); D.P.Singh (2006)

WCC (1948): Keres 0 Botvinnik 4


D.P.Singh v Opponents

Page 18:

D.P.Singh – 'before' and 'after'


Two 6-month periods

Not as conclusive as it appears

'c'-tracking across the whole period

Page 19:

Allwermann-Kalinitschew (1998)

Variation in c for both players;

Track locus of game in 2D


Standard 1-dimensional charting of the game

Page 20:

Summary

'Contextual Analysis' (CA) of the individual player's decisions
'Decision Matching' (DM) uses less information and is cruder
'Average Differencing' (AD) uses less information: ditto

CA successfully differentiates players of different Elos
The standard deviation on c gives an idea of differentiator power
Expect CA to be a better differentiator than AD … and AD to be better than DM

CA, using Bayesian Analysis, applied to: Career and epoch analysis, tournament and game analysis

Future directions:
Evolving the method, including deeper statistical treatment
Applying it to other chess and non-chess scenarios


Page 21:

Spare
