
Eur. Phys. J. B 80, 555–563 (2011)
DOI: 10.1140/epjb/e2011-10905-8

Regular Article

THE EUROPEAN PHYSICAL JOURNAL B

The aggregate complexity of decisions in the game of Go

M.S. Harre1,a, T. Bossomaier2, A. Gillett1, and A. Snyder1

1 The Centre for the Mind, The University of Sydney, 2006 Sydney, Australia
2 Centre for Research in Complex Systems, Charles Sturt University, Australia

Received 21 November 2010 / Received in final form 2 March 2011
Published online 24 March 2011 – © EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2011

Abstract. Artificial intelligence (AI) research is fast approaching, or perhaps has already reached, a bottleneck whereby further advancement towards practical human-like reasoning in complex tasks needs further quantified input from large studies of human decision-making. Previous studies in psychology, for example, often rely on relatively small cohorts and very specific tasks. These studies have strongly influenced some of the core notions in AI research, such as reinforcement learning and the exploration-versus-exploitation paradigm. With the goal of contributing to this direction in AI development, we present our findings on the evolution towards world-class decision-making across large cohorts of subjects in the formidable game of Go. Some of these findings directly support previous work on how experts develop their skills, but we also report on several previously unknown aspects of the development of expertise that suggest new avenues for AI research to explore. In particular, at the level of play that has so far eluded current AI systems for Go, we are able to quantify the lack of ‘predictability’ of experts and how this changes with their level of skill.

1 Introduction

This work uses very large databases of professional and amateur players of the game of Go in order to understand the properties of the choices made by populations of players of a known rank. We take a large and tactically well studied area of the Go board and empirically derive a complete game tree of every choice made by players according to their rank. Sorting our results according to this rank, from lowest to highest amateurs and then lowest to highest professionals, provides a very fine grained data-set of changes in behavioural patterns across large populations as they acquire exceptionally high levels of skill in one of the most formidable popular games played today.

The underlying principle of this work is to move the analysis of complex decision tasks away from the detailed local analysis of strongly interacting elements and further towards the domain of weakly interacting contextual elements of a situation. Previous work on Go has successfully shown the utility of seeing the board in terms of the individual pieces (called stones) that have strong local interactions [1]. This technique was used to estimate territory and as a possible foundation on which a decision model could be based. A similar approach views the Go board as a ‘conditional random field’ [1,2], a technique that is able to relax the strong independence assumptions of hidden Markov models and stochastic grammars [3]. Other directions have considered local patterns of stones for decision making [4,5], the representation of the board as a graph [6] and the formal analysis of endgame positions in terms of independent subgames [7]. This work is intended to inform the next generation of AI systems in regard to the complexity of decisions within the context of learning better play through an understanding of the changing contextual dependency of decisions. This perspective implies that whilst we study populations of players and their choices, what we have in mind is a single AI that is able to make choices that are consistent with players of a certain skill.

a e-mail: [email protected]

The paper is laid out in the following manner. We first introduce the game of Go along with its basic principles and how we constructed the game trees of decisions from databases. Then the necessary tools of information theory are introduced and described. The game trees are then analysed in terms of information theory and our principal findings are presented. Finally we discuss the consequences of these findings in the context of other research.

2 The game of Go

The game of Go is more than 2000 years old and still holds a significant cultural place in many Asian countries. Despite a vast array of information and analysis that is available to players on almost every aspect of Go strategy, possibly even surpassing that of Chess, the rules of the game are deceptively simple. There are two players, each of whom plays with either black or white stones on a board


Fig. 1. (Color online) Left: the first 20 moves in a game of Go. Right: stones played in a 7 × 7 region in the lower right corner; the numbers record the order in which they were played (moves 2 to 5 were played elsewhere on the board).

laid out with a 19 × 19 grid. Each player takes it in turn to place one of their stones on one of the vacant intersection points of the grid lines; see Figure 1 for an example game. Once a stone has been placed on the board it cannot be moved, except if it is ‘captured’ by the other player, in which case it is removed from the board. The idea is to surround as large an area as possible using contiguous stones1. A territory is said to belong to a player once the region is encompassed by contiguous stones. There are different sets of scoring rules that result in the same outcome (who won or lost) but might differ in the number of points scored.
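The contiguity rule described here (orthogonal, but not diagonal, adjacency) can be sketched in a few lines. The helper name and (column, row) coordinate convention below are illustrative, not from the paper:

```python
# Minimal sketch of the contiguity rule: two stones are contiguous only if
# they sit on orthogonally adjacent intersections of the 19 x 19 grid.
# The function name and coordinate convention are illustrative assumptions.

def is_contiguous(a, b):
    """True if intersections a and b, given as (col, row), are orthogonal neighbours."""
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by) == 1  # Manhattan distance of exactly 1

# Orthogonal neighbours are contiguous; diagonal neighbours are not.
assert is_contiguous((3, 3), (3, 4))      # north/south neighbour
assert not is_contiguous((3, 3), (4, 4))  # diagonal: not contiguous
```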

The ranks of Go players start with 30 kyu and increase to the first significant rating transition where they go from 1 kyu to 1 Dan. Rank then increases from 1 Dan through to 8 Dan. These ranks, from 30 kyu through to 8 Dan, are all amateur ranks. There are also professional ranks that go from 1 Dan through to 9 Dan. Typically a dedicated player of Go might be able to achieve the rank of 1 Dan amateur after a few years of diligent play. The very top amateur ranks might take decades to achieve and many people never attain them. Professional ranks are conferred on promising young players, typically younger than 18 years old, by a professional Go association. There are very few professionals in the world, of the order of hundreds at any one time, and their dedication to the game is comparable to that of world class athletes.

A considerable body of work on the psychology of games such as chess has been built up over the years; see [8,9] for overviews. These studies have covered skill and expertise in general [10–12], memory [13,14], perception [15], search [16], learning [17] and visual cues [18,19]. In contrast very few studies have been carried out for Go, but recent work has included the comparison of fMRI imaging for Go and chess [20,21]. Another study, by Reitman [22], examined the psychological phenomenon of

1 Stones are contiguous if they are on adjacent vertices, i.e. the vertices immediately to the north, west, south or east of a given stone; stones that are diagonally adjacent are not contiguous.

‘chunking’ [23] in Go, showing that there are very specific but overlapping aggregate spatial structures in Go. Similarly, Zobrist [24], Epstein et al. [25] and Bouzy [26] have all argued that spatial structures in Go are important to the reasoning used by Go players and have implemented perceptual feature modelling in artificial intelligence systems for Go. In terms of developmental psychology, Ping [27] recently demonstrated the positive effects of Go playing on the development of attention in children. In a more extensive study of Go players ranging from 18 through to 78 years of age, Masunaga and Horn [28] have shown that age related cognitive decline can be mitigated through continuing expertise development in Go players.

By far the most extensive body of academic work for the game of Go comes from the AI community; for an overview of earlier work see [29]. Since the computer program “Deep Blue” defeated Kasparov in 1997 [30], much of the interest in chess as a leading edge problem in AI research has dissipated. In its place Go has emerged as a key testbed for research in this area [31]. In particular, considerable advances have been made using novel algorithms on very fast computers, such as the UCT Monte Carlo search implemented by the program MoGo [32,33]. This technique has been extended to include pattern based heuristics to better refine the search space [34,35] and to spatially organise strategies in Go [24,26] and more general game environments [25]. Most of these results are based on the intuition that spatial structure and patterns are relevant to the way in which players search for good moves, a notion supported within the psychological literature (see for example [36]). An aspect of this spatial organisation which is missing in the AI literature is how local organisation is impacted by the global context of the rest of the board. This work contributes in part to understanding this aspect of the development of strong play in Go.

In order to generate sufficient examples of moves, even from large databases of games, the search space necessarily needs to be reduced from the whole board down to some subsection of the board. In this work we focus on the 7 × 7 corner regions of the board where well studied patterns of moves, called Joseki, are played2. Studying the move trees in this area provides an insight into how these well understood sequences of moves change with skill (all of the players in our data-set would be expected to know something of Joseki). As a consequence of the small board region and the use of information theory as our analysis tool, some interesting conclusions can be drawn regarding the influence and information content of the rest of the board (Sect. 6.2 discusses these aspects).

3 Game trees for Go

The purpose of an empirical decision tree is to represent a significant proportion of the moves made by human players during realistic game play. In order to build

2 Extensive literature, references and discussions can be found at http://senseis.xmp.net/


such decision trees we collected game records for approximately 160 000 amateur3 and professional4 players such that the two players in each game had the same rank. Then a 7 × 7 corner section of the board was selected, as there are many well known and well studied moves made in this region. Once symmetries had been accounted for, this region makes up more than 4/9 of the total board area and constitutes a significant amount of the spatial structure of the game, particularly in the beginning and middle of the game.

Within this region all first moves made were recorded; there was an average of 15 different first moves made across all ranks (max. = 25, min. = 8). The frequency of each move was then used to construct a probability distribution over the move choices. For each first move made, all subsequent second moves played within the region that were of the alternative colour were recorded, and their frequency of occurrence was used to construct a probability distribution over all observed second moves. These probabilities were normalised for every first move, so for 20 first moves there are 20 normalised distributions for the subsequent second moves. This process was continued for the first six moves played within the 7 × 7 region across all ranks.

There is a subtlety in that, once multiple stones have been played, the order in which these stones were played is irrelevant; what is important is in which positions they appear on the board. A case where the order is important is searching ahead, where the order of planned moves might be strategically relevant. In this work we are interested only in what moves are made given a certain set of stones on the board, not the sequence of moves by which the pattern was arrived at. There are therefore potentially multiple paths to the same stone configuration, and these different paths were accounted for in our statistics: we specifically aggregated the statistics for sequences of moves that resulted in the same pattern of stones on the board, resulting in ‘path independent’ stone configurations.
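As a sketch of this construction, the following builds normalised next-move distributions keyed on the set of stones already played, so that different move orders leading to the same configuration are pooled. The move names and the toy data-set are invented for illustration; they are not the paper's records:

```python
from collections import Counter, defaultdict

# Toy move sequences within a corner region; each move is (point, colour).
# These sequences are invented placeholders, not data from the study.
games = [
    [("c3", "B"), ("d5", "W"), ("e3", "B")],
    [("d5", "W"), ("c3", "B"), ("e3", "B")],  # same stones as above, different order
    [("c3", "B"), ("d5", "W"), ("f4", "B")],
]

# Count each observed next move, conditioned on the SET of stones already
# played. Keying on a frozenset discards move order, giving the
# 'path independent' configurations described above.
counts = defaultdict(Counter)
for seq in games:
    for i in range(1, len(seq)):
        config = frozenset(seq[:i])   # stones on the board so far
        counts[config][seq[i]] += 1   # frequency of each next move

def next_move_dist(config):
    """Normalise observed next-move frequencies into a probability distribution."""
    c = counts[config]
    total = sum(c.values())
    return {move: n / total for move, n in c.items()}

# Both orderings of c3/d5 are pooled: e3 was seen twice, f4 once.
dist = next_move_dist(frozenset([("c3", "B"), ("d5", "W")]))
```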

4 Information theory and learning

Information theory is used as it provides a unified framework within which learning, complexity, uncertainty and information storage all have well defined and consistent meanings as measured in bits5. First we introduce Shannon’s entropy measure for a given probability distribution. For a random variable x that may result in n possible outcomes x_i ∈ {x_1, . . . , x_n}, a probability distribution over outcomes is p(x = x_i) ≡ p(x_i) and the amount of information associated with the distribution (in bits) is

3 Amateur game records were collected from the Internet Go Server: www.pandanet.co.jp

4 Professional game records are from the GoGoD collection: www.gogod.co.uk

5 We base all of our calculations on log base 2, so measurements are always in bits.

given by the entropy [37]:

H(p) = −∑_i p(x_i) log_2 p(x_i).   (1)

Hereafter we drop the subscript 2 on the logs. The entropy is the expected value of −log p(x_i), the amount of information associated with the single outcome x_i. H(p) can be thought of in two complementary ways: as the amount of uncertainty an observer has in the outcome of a random event before the event occurs, or alternatively as the average amount of information an observer gains by having witnessed the outcome. For a probability distribution p over n (finite) discrete elements the entropy is bounded above and below: 0 ≤ H(p) ≤ H(p(x_u)), where p(x_u) is the uniform distribution. H(p) = 0 tells us that one and only one outcome ever occurs, so there is zero uncertainty in the outcome before it occurs, and as the outcome is a certainty no information is gained by having observed it. The uniform distribution, p(x_u), is the distribution whereby each element is not statistically differentiable from any other element. In this case the uncertainty in what will occur next is a maximum, and therefore the information gained having observed an outcome is a maximum as well.
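These bounds can be checked numerically with a straightforward implementation of equation (1):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution, as in equation (1)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# H(p) = 0 when one outcome is certain; H(p) is maximal, log2(n) bits,
# for the uniform distribution over n outcomes.
assert entropy([1.0]) == 0.0
n = 8
assert abs(entropy([1 / n] * n) - math.log2(n)) < 1e-12  # log2(8) = 3 bits
```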

The entropy is also the minimum amount of information required to losslessly compress a probabilistic set of elements [37,38]. In particular, when storing or transmitting data, all regularities in the data (i.e. all lack of uniformity) can be accounted for without loss by using the number of bits given by the entropy. This led to an alternative interpretation of entropy, by Grünwald, in the context of learning: “[I]f one has learned something of interest, one has implicitly compressed the data.” ([39], p. 595). ‘Interest’ here means ‘the regularities in the data’. That is to say, the lower the entropy, the more regularities have implicitly been extracted from the data.

There is also a strong connection between the entropy and the Kolmogorov complexity of a distribution [38], and entropy can often be taken as a proxy for the Kolmogorov complexity [40]6. Consider two separate and distinct distributions p_1 and p_2 over two different sets of elements of the same size, i.e. the distributions and their supports have nothing in common except the number of elements. If the entropies have the relationship H(p_1) < H(p_2), then distribution p_2 might be considered more ‘complex’, as this distribution has extracted less patterned information from its underlying set of elements than p_1 has from its underlying set of elements. While this provides a certain useful characterisation of complexity, it has been noted that such measures only highlight the difference between deterministic and random distributions [41]. However for any stochastic decision process (as opposed to trying to find an objective measure of a system’s structure, for example), it is readily seen that it is more difficult (and thereby arguably more ‘complex’) to choose from a perfectly uniform distribution of options (as all choices are statistically

6 This is useful as the Kolmogorov complexity is not a computable function.


indistinguishable from each other) than it is for a distribution where one option occurs 100% of the time (where one choice is perfectly distinguishable from all others). For an overview of the use of complexity measures and their relationship to deterministic systems, see [42].

Mutual information is an extension of entropy whereby the goal is to measure the joint information shared by two distributions. In this work we measure how much information is gained about the next choice of move given the current (local) state of the board. The first i moves already played on the board, of which there are k unique variations, are denoted by the set {x_1, . . . , x_i}_j for i ∈ {1, . . . , 5} and the index term j ∈ {1, . . . , k}. The (marginal) probability that the jth unique pattern of i stones on the board occurs is p({x_1, . . . , x_i}_j). The (marginal) probability that the lth unique move x^l_{i+1} ever occurs at move i + 1 is p(x^l_{i+1}), and the (joint) probability that the sequence {x_1, . . . , x_i}_j is followed by x^l_{i+1} is p({x_1, . . . , x_i}_j, x^l_{i+1}). x^l_{i+1} represents all moves observed for move i + 1, irrespective of the stones already played. In this sense H(p(x_i)) is the unconditional entropy of the probability of all variations of the ith move. The mutual information between the stones already on the board and the next stone placed on the board is [43]:

I({x_1, . . . , x_i}; x_{i+1}) = ∑_{j,l} p({x_1, . . . , x_i}_j, x^l_{i+1}) log [ p({x_1, . . . , x_i}_j, x^l_{i+1}) / ( p({x_1, . . . , x_i}_j) p(x^l_{i+1}) ) ].   (2)

For example, in the case of how well move 1 predicts move 2, this equation reduces to the much simpler form:

I(x_1; x_2) = ∑_{k,l} p(x^k_1, x^l_2) log [ p(x^k_1, x^l_2) / ( p(x^k_1) p(x^l_2) ) ].   (3)

This explicitly calculates how predictable move two is based on move one: if I(x_1; x_2) = 0 then move two is independent of move one, and if I(x_1; x_2) = H(p(x_1)) then move two is entirely decided by the choice of move one. Generally the entropy measures a distribution’s uniformity whereas mutual information measures the relative dependency of two distributions. In the case of independent distributions, p(x^k_1, x^l_2) = p(x^k_1) p(x^l_2) and we have I(x_1; x_2) = 0. On the other hand, equation (3) can be rewritten as I(x_1; x_2) = H(p(x_1)) + H(p(x_2)) − H(p(x_1), p(x_2)). The H(p(x_i)) terms are simply entropies as in equation (1). The

H(p(x_1), p(x_2)) = −∑_{k,l} p(x^k_1, x^l_2) log p(x^k_1, x^l_2)

term is the joint entropy. For any random variables x and y, H(p(x), p(y)) ≥ H(p(x)), where equality holds if and only if the outcome y is a deterministic function of x.
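The equivalence between the direct form of equation (3) and this entropy identity can be checked on a toy joint distribution (the move labels and probabilities below are invented for illustration):

```python
import math

def entropy(p):
    """Shannon entropy in bits, as in equation (1)."""
    return -sum(v * math.log2(v) for v in p if v > 0)

# Toy joint distribution p(move1, move2) over two move-one variants ('a', 'b')
# and two move-two variants ('u', 'v'); the numbers are invented placeholders.
joint = {("a", "u"): 0.4, ("a", "v"): 0.1,
         ("b", "u"): 0.1, ("b", "v"): 0.4}

# Marginal distributions of move one and move two.
p1, p2 = {}, {}
for (m1, m2), p in joint.items():
    p1[m1] = p1.get(m1, 0.0) + p
    p2[m2] = p2.get(m2, 0.0) + p

# Direct evaluation of equation (3).
I = sum(p * math.log2(p / (p1[m1] * p2[m2]))
        for (m1, m2), p in joint.items() if p > 0)

# The identity in the text: I(x1; x2) = H(p(x1)) + H(p(x2)) - H(p(x1), p(x2)).
I_alt = entropy(p1.values()) + entropy(p2.values()) - entropy(joint.values())
assert abs(I - I_alt) < 1e-12
```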

5 Results

In this section we present the principal findings of our work. First the entropy and then the mutual information results are outlined. In the last section we discuss these results and place them in the context of previous work.

5.1 Entropy

Figure 2 plots the cumulative average entropies for the first six moves by player rank, with the best fit linear trend added for the amateurs and professionals. The linear trends can be plotted as a function of a ‘continuous’ rank index7 ρ with equations (a for amateur, p for professional): H_a(ρ) = −0.1806ρ + 7.2322 and H_p(ρ) = 0.0070ρ + 5.7222. There is a distinct downward linear trend for the amateur players as their rank increases. This is not the case for the professionals though; the average cumulative move entropy across all player ranks is almost flat. Note that reliable statistics were not achievable for some of the moves in our data, specifically the senior amateurs and junior professionals, due to scarce data. So in the case of the cumulative plots (Figs. 2 and 3) these data points were omitted, as they would not make sense. On the other hand it is possible to include some of the data points for some moves in the other, non-cumulative plots (Figs. 4 and 5 are not cumulative), and doing so enables a better estimation of the turning points.

In order to understand the components of these trends better, Figure 3 plots the individual entropies for each move. Note that the first three moves have distinguishably higher average entropies than the second three for both the amateurs and the professionals. The best linear approximations have been included. For the amateurs, the gradient of the linear trend for the first three moves is −0.0133 (r2 = 0.441) and for the second three moves is −0.0469 (r2 = 0.890). For the professionals, the gradient of the linear trend for the first three moves is −0.0097 (r2 = 0.204) and for the second three is 0.012 (r2 = 0.213). Some ranks were excluded as accurate move entropies could not be calculated due to a paucity of data, and so cumulative entropies would make no sense; see [44] for the source of our error analysis.
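This kind of trend analysis can be sketched with an ordinary least-squares fit and the coefficient of determination r2. The data below are hypothetical placeholders, not the measured entropies:

```python
# Sketch of a least-squares linear fit with r^2, the kind of trend analysis
# reported above. The rank indices and entropy values are hypothetical
# placeholders, not the paper's measurements.

def linear_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # r^2: proportion of variance in y explained by the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

ranks = [1, 2, 3, 4, 5]                 # hypothetical amateur rank indices
entropies = [7.1, 6.9, 6.8, 6.5, 6.4]   # hypothetical cumulative entropies
slope, intercept, r2 = linear_fit(ranks, entropies)
assert slope < 0          # entropy decreasing with amateur rank, as in Fig. 2
assert 0.9 < r2 <= 1.0    # a near-linear downward trend
```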

5.2 Mutual information

Using equation (2), the mutual information between the distribution of each unique sequence of moves and the distribution of all possible next moves was calculated. The results are plotted in Figure 4. Using the variable ρ for rank as used for the entropies, the best fit quadratic interpolations of the mutual information between successive moves are:

I(x_1; x_2, ρ) = −0.0026ρ^2 + 0.0587ρ + 0.7445
I({x_1, x_2}; x_3, ρ) = −0.0029ρ^2 + 0.0657ρ + 1.0317
I({x_1, . . . , x_3}; x_4, ρ) = −0.0037ρ^2 + 0.0765ρ + 2.0085
I({x_1, . . . , x_4}; x_5, ρ) = −0.0039ρ^2 + 0.0792ρ + 2.3886
I({x_1, . . . , x_5}; x_6, ρ) = −0.0037ρ^2 + 0.0794ρ + 2.6319

7 For this purpose we set the following integer values for ρ (rank): am2q = 1, am1q = 2, . . . , pr9d = 19.


Fig. 2. (Color online) A plot of the cumulative entropies of the decision tree. The linear trend for the amateurs is negative (r2 = 0.95911); however, the professional linear trend has a low r2 value (= 0.01588, reflecting the near zero gradient). Error bars are ±2σ and both linear approximations lie within ±2σ of the observed total entropies for all ranks.

and the respective residual errors (r2 terms) for these equations are, from top to bottom: 0.65791, 0.73576, 0.68054, 0.81934, 0.7142. Here we have included all ranks of players. In this case the mutual information curves are not cumulative as the entropies are, but we are also interested in where the turning points lie in the quadratics used to fit the data.

Using these curves as continuous approximations to the discrete data in the variable ρ, we want to find the turning point where the mutual information peaks as ρ increases. To do so we differentiate the five quadratics listed above with respect to ρ and solve for where this derivative is zero for each of the curves. The mean turning point of the mutual information is at ρ = 10.77 (min. = 10.15, max. = 11.34), i.e. slightly more than half-way between 8 Dan amateur (ρ = 10) and 1 Dan professional (ρ = 11). This suggests a fundamental change in the nature of skill development as players turn professional.
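Since each fit is a downward-opening quadratic I(ρ) = aρ^2 + bρ + c, its peak lies at ρ* = −b/(2a). Applying this to the five sets of coefficients listed in Section 5.2 reproduces the reported values:

```python
# Coefficients (a, b) of the five quadratic fits I(rho) = a*rho**2 + b*rho + c
# from Section 5.2; the constant c does not affect the location of the peak.
fits = [(-0.0026, 0.0587), (-0.0029, 0.0657), (-0.0037, 0.0765),
        (-0.0039, 0.0792), (-0.0037, 0.0794)]

# Peak of each curve: solve dI/drho = 2*a*rho + b = 0.
peaks = [-b / (2 * a) for a, b in fits]
mean_peak = sum(peaks) / len(peaks)

# Matches the values in the text: mean 10.77, min 10.15 (the maximum differs
# from the reported 11.34 only through rounding of the published coefficients).
assert abs(mean_peak - 10.77) < 0.01
assert abs(min(peaks) - 10.15) < 0.01
assert 11.3 < max(peaks) < 11.35
```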

From Figure 4 it is not possible to see to what extent the plotted mutual information approaches the theoretical maximum possible values. Recall that mutual information is strictly bounded: 0 ≤ I({x_1, . . . , x_i}; x_{i+1}) ≤ H(p({x_1, . . . , x_i})), and it is of interest to what extent Figure 4 reflects this range. To show this, Figure 5 plots I({x_1, . . . , x_i}; x_{i+1}) / H(p({x_1, . . . , x_i})) × 100%, the percentage of the theoretical maximum predictability, for the data of Figure 4.

6 Discussion

Developers of future AI systems, if their systems are to emulate the cognitive abilities of humans in complex tasks, need to be as well informed as possible regarding the nature and evolution of human behaviour as humans acquire the skills AI systems hope to emulate. This study has sought to quantify some of the behavioural attributes of complex decision tasks in terms of information theory, an approach which enables measurements of both human and artificial systems such that direct comparisons might be made. Currently ‘brute force’ techniques, such as Monte Carlo algorithms and their recent variations [34,35,45], have been able to play Go to a strong amateur level. It is a little below this strong amateur level that this study begins, and it includes the very best players in the world, highlighting some of the subtleties with which AI algorithms need to contend in order to perform beyond their current levels. This section discusses the important conclusions that may be drawn from this work and places them within the context of previous results.


Fig. 3. The component parts of the entropies of Figure 2 and the linear trends for the first three moves and last three moves. These curves, like Figure 2, are averages across all branches at a given move number. For example there is only one branch for move one, but there is an average of 15 branches for each rank at move two that have been averaged in order to get the plotted values.

Fig. 4. Mutual information between successive moves.


Fig. 5. The ‘predictability’ of each move as a percentage of the theoretical maximum of mutual information, e.g. pred(3|2, 1) is the average predictability of move 3 across all possible variations of moves 1 and 2. From bottom to top, the order of the curves is: pred(3|2, 1), pred(4|3, 2, 1), pred(5|4, 3, 2, 1), pred(6|5, 4, 3, 2, 1), pred(2|1).

6.1 Learning in a complex task environment

A prominent theoretical framework used for learning the best choice in a given task is reinforcement learning [46]. In this approach a record of past rewards for given actions is kept and used to inform future decisions. This approach has not only been successful within the AI community, but has recently shown considerable explanatory power in neuroscience [47,48]. Within this paradigm the notion of exploration versus exploitation is used extensively: exploration is achieved by choosing more randomly between the available options and exploitation is achieved by choosing based on historically more rewarding options.

A modification of this approach uses a meta-parameter in order to control the degree to which each of these two strategies is favoured: exploring the alternatives or exploiting the historically better options [49,50]. In such models, where the meta-parameter is a controllable variable, typically once an exploration phase has informed a reinforcement learning algorithm of the better options, an exploitation phase is adopted in which the knowledge gained in the exploration of the space is used to make better decisions, consequently improving performance. Alternatively it has been suggested that learning is the process by which regularities are extracted from data, and consequently learning is data compression [39]. From this point of view entropy is a measure of how much has been learned by the players.

In these terms, we see in Figure 2 a decreasing entropy for amateur players as rank increases. This strongly suggests that the uncertainty in move selection is decreasing because players are taking advantage of the better choices of moves that they have learned through past experiences of the game. Equivalently, the players are able to ‘compress’ the data by extracting more information from past games, and have thereby learned more from the data. This tendency is more prominent in the last three moves than the first three (Fig. 3), possibly because the first three moves are learned more quickly than the next three moves. This effect has mostly vanished in the cumulative entropies by the time the players become professionals, although curiously this seems to have been caused by a balance between a slight increase in entropy for the first three moves and a slight decrease in entropy for the next three moves.
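The entropy-as-learning measure used here can be sketched as the Shannon entropy of an empirical move distribution. The move labels and samples below are invented for illustration (they are not drawn from our databases); the point is only that a cohort concentrating on fewer moves yields a lower entropy:

```python
from collections import Counter
from math import log2

def empirical_entropy(moves):
    """Shannon entropy in bits of the empirical distribution of moves."""
    counts = Counter(moves)
    n = len(moves)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical first-move samples: the weaker cohort spreads its
# choices over more points than the stronger cohort does.
novice_moves = ["3-3", "3-4", "4-4", "5-3", "5-4", "3-3", "4-4", "5-4"]
expert_moves = ["3-4", "3-4", "4-4", "3-4", "4-4", "3-4", "4-4", "3-4"]

h_novice = empirical_entropy(novice_moves)  # higher: more uncertainty
h_expert = empirical_entropy(expert_moves)  # lower: choices compressed
```

In the paper's terms, the decrease from h_novice to h_expert is the signature of players having extracted regularities from past games.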

It might be thought that the decrease in entropy between successive moves, particularly moves 3 and 4, is due to the decrease in options available to the players as the local region of the board fills up with stones. This effect would only be slight when considering 6 stones out of a possible 49 positions, and the very sharp decrease between moves 3 and 4 cannot be explained this way, as we should expect entropy to decrease smoothly as the local positions fill with stones. Also note that while the first 3 moves have different entropies from the last 3, there is no consistent order within each group of 3 moves, as would be expected if the effect were due to decreasing move options.


6.2 Context dependency of move choices

In order to quantify the relationship between the prior moves made and the next move, we measured the mutual information between the probability of a certain sequence of stones being played and the next choice of move (Fig. 4). This measures how predictable the next move is based on the probability of a prior pattern of stones having been placed on the board.

We observed a considerable difference in the degree of predictability from one move to the next. For example the first move typically shares about 0.9 to 1.1 bits of information with the second move, equating to between 60% and 80% of the theoretical maximum possible (Fig. 5). This tells us that very little other information, such as the rest of the board, is being used to decide what the next move will be: almost all of the uncertainty in the second move is explained by the uncertainty in the first move. This is not consistent across successive moves though. The choice of move three is considerably less predictable based on the uncertainty in the first two moves. The total predictability increases because two stones are more informative than one⁸. However, the increase in shared information provided by this second stone is typically only about 0.1 to 0.3 bits (the difference between I(x1; x2) and I({x1, x2}; x3)). This is shown in Figure 5 as ranging between approximately 35% and 55% of the theoretical maximum (the bottom-most curve in this plot), a significant drop compared to how informative the first move is of the second move.
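The quantities used in this comparison can be sketched from empirical move pairs via the identity I(X; Y) = H(X) + H(Y) − H(X, Y), normalising by min(H(X), H(Y)) as one natural choice for the theoretical maximum. The (move 1, move 2) pairs below are hypothetical illustrations, not data from our study:

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits from a Counter of symbol frequencies."""
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(pairs):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) from empirical (x, y) pairs."""
    hx = entropy(Counter(x for x, _ in pairs))
    hy = entropy(Counter(y for _, y in pairs))
    hxy = entropy(Counter(pairs))
    return hx + hy - hxy

def predictability(pairs):
    """Shared information as a fraction of its maximum, min(H(X), H(Y))."""
    hx = entropy(Counter(x for x, _ in pairs))
    hy = entropy(Counter(y for _, y in pairs))
    return mutual_information(pairs) / min(hx, hy)

# Hypothetical (move 1, move 2) samples: move 2 partly, but not
# fully, determined by move 1.
pairs = [("3-4", "5-3"), ("3-4", "5-3"), ("3-4", "5-4"),
         ("4-4", "5-3"), ("4-4", "6-3"), ("4-4", "6-3")]
```

With these samples the moves share roughly 0.54 bits, i.e. about 54% of the maximum possible, which is how the percentages plotted in Figure 5 should be read.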

We suggest that these significant and consistent differences between the predictability of one move given the uncertainty in the previous moves are due to the different focus the players have on the local versus the global context of board strategy. In choosing the second move, most of the information used by the players is based on the stone already placed. In choosing the third stone to play, most of the information used by the players is based on the rest of the board, i.e. the non-local aspect of the decision. In this sense the rest of the board acts to induce a degree of uncertainty in which move to make next. This changing uncertainty, and hence changing emphasis on local versus global context, is expressed in how dependent the next move is on the stones that are played locally.

6.3 Expertise acquisition and the ‘Inverted-U’ effect

We make a final comment regarding the striking and consistent nature of the “inverted-U” observed as a function of player rank for the mutual information measures (Fig. 4). The inverted-U property of novice-skilled-expert comparisons has previously been reported in the literature [51–54]. Here we have been able to show that, across a significant portion of the range of player skills,

⁸ This is not always the case; for example, in Figure 4 the first data point of the move 4 curve sits higher than the second data point. This is due to there being multiple paths to the same local pattern of stones.

the inverted-U property holds for the predictability of the next move in Go. In order to explain this effect we break it into two components: the increasing branch (prior to players turning professional) and the decreasing branch (after players turn professional).

Prior to turning professional, Figure 2 shows a decreasing entropy and Figure 4 shows an increase in predictability. This is consistent with the notion of reinforcement learning with meta-parameters discussed earlier: the entropy decreases because players choose strategies that have proven to have good outcomes in the past, and which choice will be made is informed by the local pattern of stones, thereby increasing the mutual information.

After turning professional this no longer holds; the players have likely minimised the entropy for these moves as far as they possibly can within the local region. However, the predictability of the next move now starts to decrease for a fixed level of entropy. This suggests that professional players are learning to obfuscate their moves by playing less predictably within the local region and using more information from the rest of the board, i.e. they are increasing their global strategic awareness in order to play more subtle local variations. Importantly, this is achieved without significant variation in the marginal entropies, so there is no change in the exploration-exploitation balance, or equivalently no variation in the amount of information the players have extracted from the data. The change in behaviour is of a qualitatively different nature to that which has been observed previously for expertise in complex task environments.

6.4 Conclusion

This work has aimed at studying the complexity of the decisions made by many thousands of Go players as their skill progresses towards that of the best players in the world. Using large databases of game records we have been able to show that there are significant behavioural patterns, some of which support previous work in both psychological and artificial intelligence research and others which are entirely new to both fields. Most significantly, we have been able to show that the strategic development of players' behaviour at the very top level of play, the level that has not yet been conquered by brute-force search techniques, is not as simple as a trade-off between exploration and exploitation or the compressibility of relevant information in the underlying decision task.

This work was supported by ARC grant number DP0881829 and USAF grant number 094094.

References

1. D. Stern, T. Graepel, D. MacKay, Advances in Neural Information Processing Systems 16, 33 (2004)

2. S. Sanner, T. Graepel, R. Herbrich, T. Minka, Learning CRFs with Hierarchical Features: An Application to Go, in International Conference on Machine Learning (ICML) Workshop (2007)

3. J. Lafferty, A. McCallum, F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, in Machine Learning – International Workshop then Conference (2001), pp. 282–289

4. T. Cazenave, Advances in Computer Games 9, 275 (2001)

5. D. Stern, R. Herbrich, T. Graepel, Bayesian pattern ranking for move prediction in the game of Go, in Proc. 23rd Int. Conf. on Machine Learning (2006), Vol. 148, pp. 873–880

6. T. Graepel, M. Goutrie, M. Kruger, R. Herbrich, Learning on graphs in the game of Go, in Artificial Neural Networks – ICANN (2001), pp. 347–352

7. E. Berlekamp, D. Wolfe, Mathematical Go: Chilling Gets the Last Point (A.K. Peters, 1997)

8. N. Charness, Psychol. Res. 54, 4 (1992)

9. F. Gobet, A. de Voogt, J. Retschitzki, Moves in Mind: The Psychology of Board Games (Psychology Press, 2004)

10. H.A. Simon, W.G. Chase, Am. Sci. 61, 393 (1973)

11. D. Holding, The Psychology of Chess Skill (L. Erlbaum Assoc., Hillsdale, NJ, 1985)

12. D. Holding, Psychol. Res. 54, 10 (1992)

13. H. Simon, K. Gilmartin, Cogn. Psychol. 5, 29 (1973)

14. F. Gobet, H. Simon, Recall of rapidly presented random chess positions is a function of skill (1996)

15. W.G. Chase, H.A. Simon, Cogn. Psychol. 4, 55 (1973)

16. N. Charness, Journal of Experimental Psychology: Human Perception and Performance 7, 467 (1981)

17. A. Cleveland, The American Journal of Psychology 18, 269 (1907)

18. E. Reingold, N. Charness, M. Pomplun, D. Stampe, Psychol. Sci. 12, 48 (2001)

19. A. Waters, F. Gobet, G. Leyden, British Journal of Psychology 93, 557 (2002)

20. M. Atherton, J. Zhuang, W. Bart, X. Hu, S. He, Cognitive Brain Research 16, 26 (2003)

21. X. Chen, D. Zhang, X. Zhang, Z. Li, X. Meng, S. He, X. Hu, Cognitive Brain Research 16, 32 (2003)

22. J. Reitman, Cogn. Psychol. 8, 336 (1976)

23. F. Gobet, P. Lane, S. Croker, P. Cheng, G. Jones, I. Oliver, J. Pine, Trends in Cognitive Sciences 5, 236 (2001)

24. A. Zobrist, A model of visual organization for the game of Go, in Proceedings of the May 14–16, 1969, Spring Joint Computer Conference (ACM, 1969), pp. 103–112

25. S. Epstein, J. Gelfand, E. Lock, Constraints 3, 239 (1998)

26. B. Bouzy, Spatial reasoning in the game of Go, in Workshop on Representations and Processes in Vision and Natural Language, ECAI (1996), pp. 78–80

27. X. Ping, K. Keqing, Psychological Research 03 (2009)

28. H. Masunaga, J. Horn, Learning and Individual Differences 12, 5 (2000)

29. X. Cai, D. Wunsch, Computer Go: A grand challenge to AI, in Challenges for Computational Intelligence (Springer, Berlin, 2007), pp. 443–465

30. M.S. Campbell, A.J. Hoane, F. Hsu, Search control methods in Deep Blue, in AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information (1999), pp. 19–23

31. J. Burmeister, J. Wiles, The challenge of Go as a domain for AI research: A comparison between Go and chess, in Proceedings of the Third Australian and New Zealand Conference on Intelligent Information Systems (1995)

32. S. Gelly, D. Silver, Achieving master level play in 9 × 9 computer Go, in Proceedings of AAAI (2008), pp. 1537–1540

33. C. Lee, M. Wang, G. Chaslot, J. Hoock, A. Rimmel, O. Teytaud, S. Tsai, S. Hsu, T. Hong, IEEE Transactions on Computational Intelligence and AI in Games 1, 73 (2009)

34. T. Cazenave, B. Helmstetter, Combining tactical search and Monte-Carlo in the game of Go, in Proceedings of the IEEE Symposium on Computational Intelligence in Games (2005), pp. 171–175

35. S. Gelly, Y. Wang, R. Munos, O. Teytaud, Modification of UCT with Patterns in Monte-Carlo Go (2006)

36. F. Gobet, H. Simon, Psychol. Res. 61, 204 (1998)

37. C.E. Shannon, Bell Syst. Tech. J. 27, 379 (1948)

38. T. Cover, J. Thomas, Elements of Information Theory, 2nd edn. (Wiley, 2006)

39. P. Grunwald, The Minimum Description Length Principle (The MIT Press, 2007)

40. S. Leung-Yan-Cheong, T. Cover, IEEE Trans. Inf. Theory 24, 331 (2002)

41. J. Crutchfield, K. Young, Phys. Rev. Lett. 63, 105 (1989)

42. G. Boffetta, M. Cencini, M. Falcioni, A. Vulpiani, Phys. Rep. 356, 367 (2002)

43. D. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, New York, 2003)

44. M. Roulston, Physica D: Nonlinear Phenomena 125, 285 (1999)

45. Y. Wang, S. Gelly, Modifications of UCT and sequence-like simulations for Monte-Carlo Go, in Proceedings of the IEEE Symposium on Computational Intelligence and Games (2007), pp. 171–182

46. R. Sutton, A. Barto, Reinforcement Learning: An Introduction (The MIT Press, 1998)

47. M. Frank, A. Moustafa, H. Haughey, T. Curran, K. Hutchison, Proceedings of the National Academy of Sciences 104, 16311 (2007)

48. M. Cohen, C. Ranganath, J. Neurosci. 27, 371 (2007)

49. S. Ishii, W. Yoshida, J. Yoshimoto, Neural Networks 15, 665 (2002)

50. N. Schweighofer, K. Doya, Neural Networks 16, 5 (2003)

51. F. Gobet, H. Simon, Mem. Cogn. 24, 493 (1996)

52. M. Van De Wiel, H. Boshuizen, H. Schmidt, Eur. J. Cogn. Psychol. 12, 323 (2000)

53. H. Boshuizen, R. Bromme, H. Gruber, Professional Learning: Gaps and Transitions on the Way from Novice to Expert (Kluwer Academic Publishers, 2004)

54. I. Davies, P. Green, M. Rosemann, M. Indulska, S. Gallo, Data and Knowledge Engineering 58, 358 (2006)