
From: AAAI Technical Report FS-93-02. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

A Strategic Metagame Player for General Chess-Like Games

Barney Pell*

Computer Laboratory

University of Cambridge, Cambridge CB2 3QG, UK

bdp@cl.cam.ac.uk

* Parts of this work have been supported by RIACS, NASA Ames Research Center [FIA], the American Friends of Cambridge University, Trinity College (Cambridge), and the Cambridge University Computer Laboratory.

Abstract

This paper reviews the concept of Metagame and discusses the implementation of METAGAMER, a program which plays Metagame in the class of symmetric chess-like games, which includes chess, Chinese chess, checkers, draughts, and Shogi. The program takes as input the rules of a specific game and analyses those rules to construct for that game an efficient representation and an evaluation function, both for use with a generic search engine. The strategic analysis performed by the program relates a set of general knowledge sources to the details of the particular game. Among other properties, this analysis determines the relative value of the different pieces in a given game. Although METAGAMER does not learn from experience, the values resulting from its analysis are qualitatively similar to values used by experts on known games, and are sufficient to produce competitive performance the first time the program actually plays each game it is given. This appears to be the first program to have derived useful piece values directly from analysis of the rules of different games.

1 The Problem

Virtually all past research in computer game-playing has attempted to develop computer programs which could play existing games at a reasonable standard. While some researchers consider the development of a game-specific expert for some game to be a sufficient end in itself, many scientists in AI are motivated by a desire for generality. Their emphasis is not on achieving strong performance on a particular game, but rather on understanding the general ability to produce such strength on a wider variety of games (or problems in general). Hence additional evaluation criteria are typically placed on the playing programs beyond mere performance in competition: criteria intended to ensure that methods used to achieve strength on a specific game will transfer also to new games. Such criteria include the use of learning and planning, and the ability to play more than one game.

1.1 Bias when using Existing Games

However, even this generality-oriented research is subject to a potential methodological bias. As the human researchers know at the time of program development which specific game or games the program will be tested on, it is possible that they import the results of their own understanding of the game directly into their program. In this case, it becomes difficult to determine whether the subsequent performance of the program is due to the general theory it implements, or merely to the insightful observations of its developer about the characteristics necessary for strong performance on this particular game. An instance of this problem is the fixed representation trick [Flann and Dietterich, 1989], in which many developers of learning systems spend much of their time finding a representation of the game which will allow their systems to learn how to play it well.

This problem is seen more easily when computer game-playing with known games is viewed schematically, as in Figure 1. Here, the human researcher or programmer is aware of the rules and specific knowledge for the game to be programmed, as well as the resource bounds within which the program must play. Given this information, the human then constructs a playing program to play that game (or at least an acceptable encoding of the rules if the program is already fairly general, like HOYLE [Epstein, 1989b]). The program then plays in competition, and is modified based on the outcome of this experience, either by the human, or perhaps by itself in the case of experience-based learning programs. In all cases, what is significant about this picture is that the human stands in the centre, and mediates the relation between the program and the game it plays.

Figure 1: Computer game-playing with existing games: the human programmer mediates the relation between the program and the game it plays.

1.2 Metagame

The preceding discussion only serves to emphasize a point commonly acknowledged in AI: whenever we design a program to solve a set of problems, we build in our own bias about how the program should go about solving them. However, while we cannot eliminate bias totally from AI research (or any other for that matter), we can to some degree control for human bias, and thus eliminate a degree of it.

This motivates the idea of Meta-Game Playing [Pell, 1992a], shown schematically in Figure 2. Rather than designing programs to play an existing game known in advance, we design programs to play (or themselves produce other programs to play) new games, from a well-defined class, taking as input only the rules of the games as output by an automatic game generator. As only the class of games is known in advance, a degree of human bias is eliminated, and playing programs are required to perform any game-specific optimisations without human assistance. In contrast with the discussion on existing games above, the human no longer mediates the relation between the program and the game it plays; instead she mediates the relation between the program and the class of games it plays.[1]

[1] Instead of playing a game, the program now plays a game about game-playing, hence the name meta-game playing.

Figure 2: Metagame-playing with new games.

1.3 SCL-Metagame

SCL-Metagame [Pell, 1992b] is a concrete Metagame research problem based around the class of symmetric chess-like games. The class includes the games of chess, checkers, noughts and crosses, Chinese chess, and Shogi. An implemented game generator produces new games in this class, some of which are objects of interest in their own right.

Symmetric Chess-Like Games  Informally, a symmetric chess-like game is a two-player game of perfect information, in which the two players move pieces along specified directions, across rectangular boards. Different pieces have different powers of movement, capture, and promotion, and interact with other pieces based on ownership and piece type. Goals involve eliminating certain types of pieces (eradicate goals), driving a player out of moves (stalemate goals), or getting certain pieces to occupy specific squares (arrival goals). Most importantly, the games are symmetric between the two players, in that all the rules can be presented from the perspective of one player only, and the differences in goals and movements are solely determined by the direction from which the different players view the board.

As an illustration of how chess-like games are defined in this class, Figure 3 presents a grammatical representation of the complete rules for American Checkers as a symmetric chess-like game.

Game Generation  The goal of game generation, as shown in Figure 4, is to produce a wide variety of games, all of which fall in the same class of games as described by a grammar. We also want to be able to modify the distribution of games generated by changing parameters, often in the form of probabilities. Finally, the combination of grammar and probabilities may not be enough to constrain the class according to our needs, in which case the generator should be able to handle a set of constraints.

The game generator for this class of games is illustrated schematically in Figure 4. The generator begins by generating a board and a set of piece names, and then generates goals and definitions for the pieces subject to the grammar and constraints, making choices using probabilities (parameters) attached to each choice point in the grammar. Details of the generator, its parameters, and example generated games are provided in [Pell, 1992b].
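As a rough illustration of this choice-point mechanism only (the actual generator, its grammar and its constraint handling are described in [Pell, 1992b]), the following Python sketch expands grammar symbols by making probability-weighted choices and rejecting expansions that fail a constraint test. The toy rules and probabilities shown are invented placeholders, not part of the real grammar.

    import random

    # Hypothetical toy fragment of a game grammar: each non-terminal maps to
    # weighted alternative expansions; anything not in the table is a terminal.
    GRAMMAR = {
        "movement":  [(0.5, ["LEAP", "direction"]),
                      (0.3, ["RIDE", "direction"]),
                      (0.2, ["HOP", "direction"])],
        "direction": [(0.6, ["(1,1)"]), (0.4, ["(1,0)"])],
    }

    def generate(symbol, constraint=lambda tokens: True):
        """Expand a symbol, choosing among alternatives with the probabilities
        attached to each choice point, and regenerating if the result fails
        the supplied constraint test."""
        if symbol not in GRAMMAR:                 # terminal symbol
            return [symbol]
        while True:
            probs, expansions = zip(*GRAMMAR[symbol])
            expansion = random.choices(expansions, weights=probs, k=1)[0]
            tokens = [tok for part in expansion
                      for tok in generate(part, constraint)]
            if constraint(tokens):
                return tokens

    print(" ".join(generate("movement")))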



GAME american_checkers
  GOALS stalemate opponent
  BOARD_SIZE 8 BY 8
  BOARD_TYPE planar
  PROMOTE_RANK 8
  SETUP man AT {(1,1) (3,1) (5,1) (7,1) (2,2) (4,2)
                (6,2) (8,2) (1,3) (3,3) (5,3) (7,3)}
  CONSTRAINTS must_capture

  DEFINE man
    MOVING
      MOVEMENT
        LEAP
          (1,1) SYMMETRY {side}
      END MOVEMENT
    END MOVING
    CAPTURING
      CAPTURE
        BY {hop}
        TYPE [{opponent} any_piece]
        EFFECT remove
        MOVEMENT
          HOP BEFORE [X = 0]
              OVER   [X = 1]
              AFTER  [X = 0]
              HOP_OVER [{opponent} any_piece]
            (1,1) SYMMETRY {side}
        END MOVEMENT
      END CAPTURE
    END CAPTURING
    PROMOTING
      PROMOTE_TO king
    END PROMOTING
    CONSTRAINTS continue_captures
  END DEFINE

  DEFINE king
    MOVING
      MOVEMENT
        LEAP
          (1,1) SYMMETRY {forward side}
      END MOVEMENT
    END MOVING
    CAPTURING
      CAPTURE
        BY {hop}
        TYPE [{opponent} any_piece]
        EFFECT remove
        MOVEMENT
          HOP BEFORE [X = 0]
              OVER   [X = 1]
              AFTER  [X = 0]
              HOP_OVER [{opponent} any_piece]
            (1,1) SYMMETRY {forward side}
        END MOVEMENT
      END CAPTURE
    END CAPTURING
    CONSTRAINTS continue_captures
  END DEFINE
END GAME.

Figure 3: Definition of American Checkers as a symmetric chess-like game.


Figure 4: Components of a Game Generator.

With the provision of the class definition and generator (as well as various communications protocols to enable communication between programs, also discussed in [Pell, 1992b]), the problem of Metagame in symmetric chess-like games is thus fully defined. The rest of this paper is organised as follows. Section 2 discusses the challenge of representing knowledge for unknown games and the construction of METAGAMER. Section 3 discusses the analysis performed by METAGAMER to determine piece values for the games of chess and checkers, when presented with only the rules of those games, and compares this to previous work on automatic methods for determining feature values in games. Finally, Section 4 presents the results of preliminary evaluations of METAGAMER on the games of chess and checkers.

2 Constructing a Metagame-player

While a main intention of Metagame is to serve as a test-bed for learning and planning systems, a logical first approach to constructing a Metagame-player is to push the limits of what can be done with standard game-tree search and knowledge engineering. At the very least such an engineered player could serve as a baseline against which to measure the performance of more general systems.

2.1 Search Engine

To this end, the search engine used incorporates many standard search techniques from the game-playing literature (see [Levy and Newborn, 1991]). It is based on the minimax algorithm with alpha-beta pruning, iterative deepening, and the principal continuation heuristic. More details of the Metagame search engine are given in [Pell, 1993b].
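For readers unfamiliar with these techniques, the Python sketch below shows the core combination: depth-limited minimax in negamax form with alpha-beta pruning, wrapped in an iterative-deepening loop. It is a generic illustration, not the Metagame engine itself (which is described in [Pell, 1993b]); the moves(), result() and evaluate() functions stand for the game-specific operations the program derives from the rules, and move ordering by the principal continuation is omitted.

    import math

    def alphabeta(state, depth, alpha, beta, moves, result, evaluate):
        """Negamax search with alpha-beta pruning: returns (value, best_move)
        from the point of view of the player to move in `state`."""
        legal = moves(state)
        if depth == 0 or not legal:
            return evaluate(state), None
        best_value, best_move = -math.inf, None
        for move in legal:
            value, _ = alphabeta(result(state, move), depth - 1,
                                 -beta, -alpha, moves, result, evaluate)
            value = -value                    # opponent's best line is our worst
            if value > best_value:
                best_value, best_move = value, move
            alpha = max(alpha, value)
            if alpha >= beta:                 # cut-off: this line is already refuted
                break
        return best_value, best_move

    def iterative_deepening(root, max_depth, moves, result, evaluate):
        """Search to depth 1, 2, ..., keeping the move from the deepest completed
        iteration, so an answer is always available when time runs out."""
        best = None
        for depth in range(1, max_depth + 1):
            _, best = alphabeta(root, depth, -math.inf, math.inf,
                                moves, result, evaluate)
        return best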


2.2 Automated Efficiency Optimisation

This powerful search engine should allow a playing program to search deeply. However, search speed is linked to the speed with which the primitive search operations of move generation and goal detection can be performed. For a game-specific program, these issues can easily be hand-engineered, but for a program which is to play an entire class of games, the excess overhead of such generality initially caused the search to be unbearably slow. This problem was overcome by using a knowledge-compilation approach. We represent the semantics of the class of games in an extremely general but inefficient manner, and after receiving the rules of a given game the program automatically partially evaluates the general theory with respect to it, thus compiling away both unnecessary conditionality and the overhead of interpretation. This is discussed in detail in [Pell, 1993a].
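The real compilation is a partial evaluation of a logic-programming theory [Pell, 1993a]; purely to convey the flavour of the idea, the sketch below pre-computes, once per game, a per-square destination table for a simple leaping power, so that move generation at search time becomes a table lookup rather than an interpretation of the rules. The board bounds, the symmetry handling and the function names are simplifications invented for this sketch.

    from typing import Dict, List, Set, Tuple

    Square = Tuple[int, int]

    def expand_symmetries(delta: Square, symmetries: Set[str]) -> List[Square]:
        """Apply 'side' (mirror left-right) and 'forward' (mirror towards the
        mover) symmetries to a movement vector, as in the rule language."""
        deltas = {delta}
        if "side" in symmetries:
            deltas |= {(-dx, dy) for dx, dy in deltas}
        if "forward" in symmetries:
            deltas |= {(dx, -dy) for dx, dy in deltas}
        return sorted(deltas)

    def compile_leap_table(delta: Square, symmetries: Set[str],
                           cols: int = 8, rows: int = 8) -> Dict[Square, List[Square]]:
        """Specialise a leaping movement power into a table mapping each square
        to its destinations on an otherwise empty board. Built once after the
        rules arrive, then simply looked up during search."""
        deltas = expand_symmetries(delta, symmetries)
        return {(x, y): [(x + dx, y + dy) for dx, dy in deltas
                         if 1 <= x + dx <= cols and 1 <= y + dy <= rows]
                for x in range(1, cols + 1) for y in range(1, rows + 1)}

    # For the checkers man of Figure 3: LEAP (1,1) with SYMMETRY {side}.
    man_moves = compile_leap_table((1, 1), {"side"})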

2.3 Meta-Level Evaluation Function

With the search engine in place, using the optimised primitive operations, we have a program which can search as deeply as resources permit, in any position in any game in this class. The remaining task is to develop an evaluation function which will be useful across many known and unknown games.

Following the approach used in HOYLE [Epstein, 1989b], we view each feature as an advisor, which encapsulates a piece of advice about why some aspect of a position may be favourable or unfavourable to one of the players. But as the class of games to be played is different from that played by Epstein's HOYLE, we had to construct our advisors mostly from scratch.

In terms of the representation of the advisors, we follow an approach similar to that used in Zenith [Fawcett and Utgoff, 1992], in which each advisor is defined by a non-deterministic rule for assigning additional value to a position. The total contribution (value) of the advisor is the sum of the values for each solution of the rule. This method of representation is extremely general and flexible, and facilitates the entry and modification of knowledge sources. We have also tried when possible to derive advisors manually following the transformations used by Zenith [Fawcett and Utgoff, 1992] and CINDI [Callan and Utgoff, 1991], two systems designed for automatic feature generation.
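As a sketch of this representation (the real advisors are rules in a logic-programming framework [Pell, 1993b]), an advisor can be modelled as a generator of (solution, value) pairs whose total contribution is the sum of the values, with the overall evaluation a weighted sum over advisors. The position interface used here (pieces(), moves_from()) is an assumption made for illustration.

    def dynamic_mobility(position, player):
        """One solution per (piece, destination) pair: each square a piece of
        `player` can move to directly is worth one point."""
        for piece in position.pieces(player):
            for square in position.moves_from(piece):
                yield (piece, square), 1.0

    def advisor_value(advisor, position, player):
        # Total contribution of an advisor = sum of the values of all solutions.
        return sum(value for _solution, value in advisor(position, player))

    def evaluate(position, player, weighted_advisors):
        """weighted_advisors: iterable of (advisor, weight) pairs; the weights
        are all 1 in the experiments reported below."""
        return sum(weight * advisor_value(advisor, position, player)
                   for advisor, weight in weighted_advisors)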

2.4 Advisors

This section briefly explains the advisors currently implemented for METAGAMER. It should be noted that this set is not final, and there are several important general heuristics which are not yet incorporated (such as distance and control [Snyder, 1993]).

The advisors can be categorised into four groups, based on the general concept from which they derive.

Mobility Advisors  The first group is concerned with different indicators of mobility. These advisors were inspired in part by [Church and Church, 1979] and [Botvinnik, 1970], and by generalising features used for game-specific programs [Pell, 1993b]. A sketch of how these quantities might be computed appears after the list.

• dynamic-mobility: counts the number of squares to which a piece can move directly from its current square on the current board, using a moving power.[2]

• static-mobility: a static version of immediate-mobility, this counts the number of squares to which a piece could move directly from its current square on an otherwise empty board, using a moving power.

• capturing-mobility: counts the number of captures each piece could make in the current position, regardless of the usefulness of the capture. It does not even distinguish whether the victim is a friendly or enemy piece. There is no static version of this advisor. For future work, one way to approximate the potential capturing ability of a piece might be to play random games and count the pieces attacked by each piece in each position.

• eventual-mobility: measures the total value of all squares to which a piece could move eventually from its current square on an otherwise empty board, using a moving power. The value of each square decreases (by a parameter-controlled function) with the number of moves required for the piece to get there. Thus while a bishop has 32 eventual moves and a knight has 64 from any square, the bishop can reach most of its squares more quickly, a fact captured by this advisor.

[2] Recall from Section 1.3 that pieces in this class may move and capture in different ways.
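A rough sketch of the static and eventual mobility computations, reusing the compiled move table from the Section 2.2 sketch: static mobility counts the direct destinations, while eventual mobility does a breadth-first search over the empty board and discounts each reachable square by the number of moves needed to get there. The geometric 0.5-per-move discount is an invented stand-in for the parameter-controlled decay function mentioned above.

    from collections import deque

    def static_mobility(move_table, square):
        # Number of squares reachable in one move on an otherwise empty board.
        return len(move_table[square])

    def eventual_mobility(move_table, square, decay=0.5):
        """Sum, over every square eventually reachable on an empty board, of
        decay ** (number of moves needed), so nearer squares count for more."""
        distance = {square: 0}
        frontier = deque([square])
        while frontier:
            here = frontier.popleft()
            for there in move_table[here]:
                if there not in distance:
                    distance[there] = distance[here] + 1
                    frontier.append(there)
        return sum(decay ** d for sq, d in distance.items() if sq != square)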

Threats and Capturing  The second group of advisors deals with capturing interactions (in general, threats and conditions enabling threats). A sketch of the threat computations follows this group.

• local-threat: for each target piece which a piece could capture in the current position, this measures the value of executing this threat (based on the other advisors), but reduces this value based on which player is to move. Thus a threat in a position where a player is to move is almost as valuable for that player as the value derived from executing it (i.e. he can capture if he likes), but if it is the opponent's move the threat is less valuable, as it is less likely to happen. Thus attacking an enemy piece while leaving one's own piece attacked (if the pieces are of equal value) is a losing proposition, but if the threatened enemy piece is worth much more this may be a good idea. The value of these threats is also augmented based on the effect of the capture.

• potent-threat: this extracts from the local-threat analysis just those threats which are obviously potent. A threat is potent for the player on move if the target is either undefended or more valuable (based on the other advisors) than the threatening piece. A threat is potent for the non-moving player only if the attacker is less valuable than the target and the moving player does not already have a potent threat against the attacker.

• global-threat: the two threat advisors above exist in both local and global versions. The local version credits a player for each threat she has in a position, while the global version credits the player with only the maximum of those local threat values.



• possess: in a position where a player has a piece in hand, the player is awarded the dynamic value (using the local advisors) that the piece would receive, averaged over all empty board squares. Note that if the maximum value were used instead of the average, a program searching one ply would never choose to place a piece on the board once possessed.[3]

[3] This analysis would have to be improved substantially to capture the complexities of possessed pieces in games like Shogi.
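The sketch below shows the shape of the local-threat and potent-threat computations described above. The helpers captures(), piece_value() and defended(), and the 0.9/0.3 discounts for the side to move, are invented placeholders standing in for information the real program derives from its rules analysis and its other advisors; the potent test is also simplified relative to the rule stated above.

    def local_threats(position, player, piece_value,
                      on_move_discount=0.9, off_move_discount=0.3):
        """One (attacker, target, value) entry per capture the player could make,
        worth a fraction of the target's value depending on who is to move."""
        discount = on_move_discount if position.to_move == player else off_move_discount
        return [(attacker, target, discount * piece_value(target))
                for attacker, target in position.captures(player)]

    def potent_threats(position, player, piece_value, defended):
        """Keep only the obviously good threats: the target is undefended, or
        worth more than the attacking piece."""
        return [(attacker, target, value)
                for attacker, target, value in local_threats(position, player, piece_value)
                if not defended(position, target)
                or piece_value(target) > piece_value(attacker)]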

Goals and Step Functions  The third group of advisors is concerned with goals and regressed goals for this class of games. A sketch of the distance discounting they share appears after the list.

• vital: this measures dynamic progress by both players on their goals to eradicate some set of piece types. As a given goal becomes closer to achievement, exponentially more points are awarded to that player. In addition, if the number of such remaining pieces is below some threshold, the remaining pieces are considered vital, in which case any potential threats against them automatically become potent.

• arrival-distance: this is a decreasing function of the abstract number of moves it would take a piece to move (i.e. without capturing) from its current square to a goal destination on an otherwise empty board, where this abstract number is based on the minimum distance to the destination plus the cost/difficulty of clearing the path. This applies only to destinations for which the game has a defined arrival goal for a piece type consistent with the piece, and succeeds separately for each way in which this is true.

• promote-distance: for each target piece that a piece could promote into (by the player's choice), this measures the value of achieving this promotion (using the other advisors), but reduces this value based on the difficulty of achieving this promotion, as discussed for arrival-distance above. The result is the maximum net value (to a player) of the discounted promotion options, or the minimum value if the opponent gets to promote the piece instead.[4]

[4] If we take the sum instead of the maximum here, a piece with many promotion options could be more valuable than its combined options, and thus it would never be desirable to promote it. For example, a pawn on the seventh rank would sit there forever enjoying all its options, but never cashing them in.

• init-promote: this applies in a position where a player is about to promote the opponent's piece (on some square) before making a normal move. For each promoting option defined for that piece, this calculates the value that option would have on the given square in the current position. The value returned is the maximum of the values of the options, from the perspective of the promoting player (and thus the minimum from the perspective of the player who owned the piece originally).
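A minimal sketch of the distance-discounting idea shared by arrival-distance and promote-distance: the value of the anticipated goal is scaled down by a decreasing function of the abstract distance to it. The particular 1/(1 + d) discount and the argument names are illustrative assumptions, not the program's actual parameters.

    def discounted_goal_value(goal_value, distance_in_moves, clearing_cost=0.0):
        """Value of an anticipated goal, reduced as it becomes harder to reach.
        Kept strictly below goal_value so that actually achieving the goal is
        always preferred to merely approaching it (cf. Section 2.6)."""
        abstract_distance = distance_in_moves + clearing_cost
        return goal_value / (1.0 + abstract_distance)

    def promote_distance_value(promotion_option_values, distance_in_moves,
                               clearing_cost=0.0):
        """The advisor credits the best discounted option among the promotions
        the owning player could choose."""
        return max(discounted_goal_value(v, distance_in_moves, clearing_cost)
                   for v in promotion_option_values)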

Material Value  The final group of advisors is used for assigning a fixed material value to each type of piece, which is later awarded to a player for each piece of that type he owns in a given position. This value is a weighted sum of the values returned by the advisors listed in this section, and does not depend on the position of the piece or of the other pieces on the board. Note that white king and black king are viewed as different types of pieces during the computations below.[5] A sketch of this computation follows the list.

[5] Thus if a piece gets one point for each piece it can possibly capture, and there are 5 distinct piece names, it is possible to score 10 points if the piece can capture all pieces.

• max-static-mob: the maximum static-mobility for this piece over all board squares.

• avg-static-mob: the average static-mobility for this piece over all board squares.

• max-eventual-mob: the maximum eventual-mobility for this piece over all board squares.

• avg-eventual-mob: the average eventual-mobility for this piece over all board squares.

• eradicate: awards 1 point for each opponent goal to eradicate pieces which match this type, and minus one point for each player goal to eradicate our own piece matching this type.

• victims: awards 1 point for each type of piece this piece has a power to capture (i.e. the number of pieces matching one of its capture types). A bonus is provided for the effects of each capture, as discussed for the local-threat advisor above. It would be interesting to have a dynamic version of this which gave preference to pieces which could capture other pieces actually present in a given position.[6]

[6] A more sophisticated version of this feature, not fully implemented yet, takes into account the value of each victim, as determined by other static advisors.

• immunity: awards 1 point for each type of enemy piece that cannot capture this piece.

• giveaway: awards 1 point for each type of friendly piece that can capture this piece.

• stalemate: this views the goal to stalemate a player as if it were a goal to eradicate all of the player's pieces, and performs the same computation as eradicate above.

• arrive: awards a piece 1 point if the player has an arrival goal predicated on that type of piece. It awards 1/n points if the piece can promote to an arrival-goal piece in n promotions. Values are negated for opponent arrival goals.[7]

[7] The net effect of this is that pieces which help a player to achieve arrival goals receive positive points, and those which help the opponent receive negative points.

• promote: this is computed in a separate pass after all the other material values. It awards a piece a fraction of the material value (computed so far) of each piece it can promote into. This advisor is not fully implemented yet, and was not used in the work discussed in this paper.
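To make the bookkeeping concrete, here is a sketch of how a piece type's material value could be assembled as a weighted sum of a few of these static advisors, reusing the mobility functions sketched earlier. The advisor set is abbreviated, and the game interface (eradicate_score(), capture_types()) is an assumption invented for illustration.

    def material_value(piece_type, squares, move_tables, game, weights):
        """Fixed value of a piece type: a weighted sum of static advisors,
        independent of where the piece (or any other piece) stands."""
        table = move_tables[piece_type]
        static_mobs = [static_mobility(table, sq) for sq in squares]
        eventual_mobs = [eventual_mobility(table, sq) for sq in squares]
        features = {
            "max-static-mob":   max(static_mobs),
            "avg-static-mob":   sum(static_mobs) / len(squares),
            "max-eventual-mob": max(eventual_mobs),
            "avg-eventual-mob": sum(eventual_mobs) / len(squares),
            # +1 per opponent goal to eradicate this type, -1 per own such goal.
            "eradicate":        game.eradicate_score(piece_type),
            # 1 point per piece type this piece has a power to capture.
            "victims":          len(game.capture_types(piece_type)),
        }
        return sum(weights.get(name, 1.0) * value for name, value in features.items())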

Section 3 provides concrete examples of the application of these advisors to the rules of the different games discussed in this paper.

2.5 Static Analysis Tables

Most of these advisors draw on information which would be extremely slow to compute dynamically at each position to be evaluated. We overcome this problem in a similar manner as with the automated efficiency optimisation for the primitive search components, by having the program compile a set of tables after it receives the rules of each game, and then use those tables thereafter. Another advantage of this game-specialisation beyond efficiency is that after receiving the game rules, the program can take advantage of the rules to fold some of the goal tests directly into the tables.

As an example, one of the most frequently used tables is the Constrained-Transition Table. This table (used by all the static mobility advisors) records, for each piece and square, the other squares the piece could move to directly on an otherwise empty board. However, it excludes all transitions from this set which can easily be shown either to be impossible or immediately losing for the player moving that piece. A transition is impossible if the starting square is one in which an arrival goal would already have been achieved when the piece first arrived at the square. A transition is immediately losing if the player would necessarily lose the game whenever his piece arrives on the destination square. While these considerations do not obviously apply to any known games, they prove highly relevant in generated games. Further details on static analysis tables are provided in [Pell, 1993b].
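A sketch of this table-building step, under the same simplified interfaces as the earlier sketches: the per-square destinations from a compiled move table are filtered by two rule-derived tests. The two predicate arguments are assumptions standing in for goal tests the program folds in from the game definition.

    def constrained_transition_table(move_table, already_won_on, losing_arrival_on):
        """Per-square destinations on an empty board, minus transitions that are
        impossible (the piece could never legally stand on the source square,
        because an arrival goal would already have ended the game there) or
        immediately losing (arriving on the destination loses for the mover)."""
        table = {}
        for square, destinations in move_table.items():
            if already_won_on(square):       # impossible: game over before moving
                table[square] = []
            else:
                table[square] = [d for d in destinations if not losing_arrival_on(d)]
        return table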

2.6 Weights for Advisors

The last major issue concerning the construction of the strategic evaluation function involves assigning weights to each advisor, or more generally, developing a function for mediation among advisors [Epstein, 1989a]. While this issue is already difficult in the case of existing games, it is correspondingly more difficult when we move to unknown games, where we are not even assured of the presence of a strong opponent to learn from. However, by the construction of some of the advisors, we do have one significant constraint on their possible values. For advisors which anticipate goal achievement (such as promote-distance and the threat advisors), it would seem that their weight should always be at most 1. The reason is that the value they return is some fraction of the value which would be derived if the goal they anticipate were to be achieved. If such an advisor were weighted double, for example, the value of the threat would be greater than the anticipated value of its execution, and the program would not in general choose to execute its threats.

Beyond the constraint on such advisors, this issue of weight assignment for Metagame is an open problem. One idea for future research would be to apply temporal-difference learning and self-play [Tesauro, 1993] to this problem. It would be interesting to investigate whether the "knowledge-free" approach which was so successful in learning backgammon also transfers to these different games, or whether it depends for its success on properties specific to backgammon. In the meantime, we have been using METAGAMER with all weights set to 1.[8]

[8] It should be noted that this is not the same as setting all piece values for a given game to equal value. The general knowledge still imposes constraints on the relative values of individual pieces in a game. For example, even a random setting of weights will cause METAGAMER to value queens above rooks [Pell, 1993b].

3 Examples of Material Analysis

One important aspect of METAGAMER's game analysis, which was discussed in Section 2.4, is concerned with determining relative values for each type of piece in a given game. This type of analysis is called material analysis, and the resulting values are called material values or static piece values. This section illustrates the results of METAGAMER's material analysis when applied to chess and checkers. In both cases, METAGAMER took as input only the rules of the games.

In conducting this material analysis, METAGAMER used the material advisors shown in Table 1, all with equal weight of one point each.

Advisor              Weight
dynamic-mobility       1
capture-mobility       1
global-threat          1
eventual-mobility      1
promote-distance       1
eradicate              1
vital                  1
material               1
victims                1
max-static-mob         1
max-eventual-mob       1
stalemate              1
arrive                 1
giveaway               1
immunity               1

Table 1: Advisor weights for material analysis examples.

3.1 Checkers

Table 2 lists material values determined by METAGAMER for the game of checkers, given only the encoding of the rules as presented in Figure 3. In the table, K stands for king, and M stands for man.

METAGAMER concludes that a king is worth almost two men. According to expert knowledge,[9] this is a gross underestimate of the value of a man. The reason that men are undervalued here is that METAGAMER does not yet consider the static value of a piece based on its possibility to promote into other pieces (see Section 2.4). When actually playing a game, METAGAMER does take this into consideration, based on the dynamic promote-distance advisor.


[9] I am thankful to Nick Flann for serving as a checkers expert.


Material Analysis: checkers

Advisor             K       M
max-static-mob      4       2
max-eventual-mob    6.94    3.72
avg-static-mob      3.06    1.53
avg-eventual-mob    5.19    2.64
eradicate           1       1
victims             2       2
immunity            0       0
giveaway            0       0
stalemate           1       1
arrive              0       0
Total              23.2    13.9

Table 2: Material value analysis for checkers.

3.2 Chess

Table 3 lists material values determined by METAGAMER for the game of chess, given only an encoding of the rules similar to that for checkers [Pell, 1993b]. In the table, the names of the pieces are just the first letters of the standard piece names, except that N refers to a knight.

Material Analysis: chess

Advisor          B       K       N       P       Q       R
max-static      13       8       8       1      27      14
max-eventual    12      12.9    14.8     1.99   23.5    20.2
avg-static       8.75    6.56    5.25    0.875  22.8    14
avg-eventual    10.9     9.65   11.8     1.75   22.4    20.2
eradicate        0       1       0       0       0       0
victims          6       6       6       6       6       6
immunity         0       0       0       0       0       0
giveaway         0       0       0       0       0       0
stalemate        1       1       1       1       1       1
arrive           0       0       0       0       0       0
Total           51.7    45.1    46.9    12.6   103      75.5

Table 3: Material value analysis for chess.

As discussed for checkers above, pawns are here undervalued because METAGAMER does not consider their potential to promote into queens, rooks, bishops, or knights. According to its present analysis, a pawn has increasingly less eventual-mobility as it gets closer to the promotion rank. Beyond this, the relative value of the pieces is surprisingly close to the values used in conventional chess programs, given that the analysis was so simplistic.
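For a rough comparison, normalising the totals in Table 3 by the pawn's total gives approximately Q = 8.2, R = 6.0, B = 4.1, and N = 3.7 pawns, against the conventional 9, 5, 3.25, and 3 used in the hand-encoded material function of Section 4.2.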

3.3 Discussion

Despite its simplicity, the analysis produced useful piece values for a wide variety of games, which agree qualitatively with the assessment of experts on some of these games. This illustrates that the general knowledge encoded in METAGAMER's advisors and analysis methods is an appropriate generalisation of game-specific knowledge. This appears to be the first instance of a game-playing program automatically deriving material values based on active analysis when given only the rules of different games. It also appears to be the first instance of a program capable of deriving useful piece values for games unknown to the developer of the program [Pell, 1993b]. The following sections compare METAGAMER to previous work with respect to determination of feature values.

Expected Outcome  Abramson [Abramson, 1990] developed a technique for determining feature values based on predicting the expected-outcome of a position in which particular features (not only piece values) were present. The expected-outcome of a position is the fraction of games a player expects to win from a position if the rest of the game after that position were played randomly. He suggested that this method was an indirect means of measuring the mobility afforded by certain pieces. The method is statistical, computationally intensive, and requires playing out many thousands of games. On the other hand, the analysis performed by METAGAMER is a direct means of determining piece values, which follows from the application of general principles to the rules of a game. It took METAGAMER under one minute to derive piece values for each of the games discussed in this section, and it conducted the analysis without playing out even a single contest.

Automatic Feature Generation  There has recently been much progress in developing programs which generate features automatically from the rules of games [de Grey, 1985; Callan and Utgoff, 1991; Fawcett and Utgoff, 1992]. When applied to chess such programs produce features which count the number of chess pieces of each type, and when applied to Othello they produce features which measure different aspects of positions which are correlated with mobility. The methods operate on any problems encoded in an extended logical representation, and are more general than the methods currently used by METAGAMER. However, these methods do not generate the values of these features, and instead serve as input to systems which may learn their weights from experience or through observation of expert problem-solving. While METAGAMER's analysis is specialised to the class of symmetric chess-like games, and thus less general than these other methods, it produces piece values which are immediately useful, even for a program which does not perform any learning.

Evaluation Function Learning  There has been much work on learning feature values by experience or observation (e.g. [Samuels, 1967; Lee and Mahajan, 1988; Levinson and Snyder, 1991; Callan et al., 1991; Tunstall-Pedoe, 1991; van Tiggelen, 1991]). These are all examples of passive analysis [Pell, 1993b], and are not of use to a program until it has had significant experience with strong players.


Hoyle  HOYLE [Epstein, 1989b] is a program, similar in spirit to METAGAMER, in which general knowledge is encapsulated using the metaphor of advisors. HOYLE has an advisor responsible for guiding the program into positions in which it has high mobility. However, HOYLE does not analyse the rules of the games it plays, and instead uses the naive notion of immediate-mobility as the number of moves available to a player in a particular position. The power of material values is that they abstract away from the mobility a piece has in a particular position, and characterise the potential options and goals which are furthered by the existence of each type of piece, whether or not these options are realised in a particular position. As HOYLE does not perform any analysis of the rules or construct analysis tables as does METAGAMER, it is unable to benefit from this important source of power.

4 Preliminary Results

The experimental evaluation of METAGAMER has only recently begun, but the preliminary results are exciting. To assess the strength of the program on known games, we are comparing it against specialised game-playing programs in the games of chess and checkers.

4.1 Checkers

We are comparing the performance of METAGAMER in checkers by playing it against Chinook [Schaeffer et al., 1991]. Chinook is the world's strongest computer checkers player, and the second strongest checkers player in general. As it is a highly optimised and specialised program, it is not surprising that METAGAMER always loses to it (on checkers, of course!). However, to get a baseline for METAGAMER's performance relative to other possible programs when playing against Chinook,[10] we have evaluated the programs when given various handicaps (number of men taken from Chinook at the start of the game).

[10] In our experiments, Chinook played on its easiest level. It also played without access to its opening book or endgame database, although it is unlikely that the experimental results would have been much altered had it been using them.

The preliminary results fi’om tile experiments arethat METAGAMER is around even to Chinook, whengiven a handicap of one man. This is compared to adeep-searching greedy material l)rogrmn which requiresa handicap of 4 men, and a random player, which re-quires a handicap of 8. In fact, in the 1-man handi-cap l)ositions, METAGAMER generally achieves what istechnically a winning position, but it is unable to winagainst Chinook’s defensive strategy of hiding in thedouble-, corner.

On observation of METACAMER’S play of checkers, itwas interesting to see that METAGAMER re-&scovered"the checkers strategy of not moving its back men un-til late in the game. It turned out that this strategyemerged fi’om the promote-distance advisor, operat-ing defencively instead of in its "intended" offensivefunction. Ill effect, METAGAMER realized fi’om moregeneral principles that by moving its back men, it madethe promotion square more accessible to the opponent,

1°In our experiments, Chinook played on its easiest level.It also played without access to its opening book or endgamedatabase, although it is unlikely that the experimental re-sults would have been much altered had it been using them.

thus increasing the opponent’s value, and decreasingits own. Of course, as discussed in Section 1.1, we can-not make an unbiased claim about originality of theprogram, as this game was known to METAGAMER’s de-siguer beforehand, but this development can still betaken as an indication of the generality of the advisorsacross a variety of games.

4.2 Chess

In chess, we are playing METAGAMER against GnuChess, a very strong publicly available chess program, winner of the C Language division of the 1992 Uniform Platform Chess Competition. GnuChess is vastly superior to METAGAMER (at chess, of course!), unless it is handicapped severely in time and moderately in material.

The preliminary results from the experiments are that METAGAMER is around even to GnuChess on its easiest level,[11] when given a handicap of one knight. For purposes of comparison, a version of METAGAMER with only a standard hand-encoded material evaluation function (queen=9, rook=5, bishop=3.25, knight=3, and pawn=1) [Botvinnik, 1970; Abramson, 1990] played against METAGAMER with all its advisors and against the version of GnuChess used above. The result was that the material program lost every game at knight's handicap against GnuChess, and lost every game at even material against METAGAMER with all its advisors. This showed that METAGAMER's performance was not due to its search abilities, but rather to the knowledge in its evaluation function.

[11] In the experiments, GnuChess played on level 1 with depth 1. This means it searches 1-ply in general but can still search deeply in quiescence search. METAGAMER played with one minute per move, and occasionally searched into the second ply.

On observation of MIgTAGAMIgR’S play of chess, wehave seen the program develop its pieces quickly, placethem on active central squares, put pressure on enemypieces, make favourable exchanges while avoiding badones, and restrict the freedom of its opponent. In all, itis clear that METAGAMER’s knowledge gives it a reason-able positional sense and enables it to achieve longer-term strategic goals while searching only one or two-plydeep. This is actually quite impressive, given that noneof the knowledge encoded in METACAMER’S advisors orstatic analyser makes reference to any prope~Mes spe-cific to the game of chess--METAGAMER worked out itsown set of material values for each of the pieces (seeSection 3), and its own concept of the value of eachpiece on each square. On the other hand, the most ob-vious immediate limitation of METAGAMER revealed inthese games is a weakness in tactics caused in part byan inability to search more deeply within the time con-straints, in part by a lack of quiescence search, and alsoby the reliance on full-width tree-search. These are allimportant areas for future work.

4.3 New Games

The experiments above allow us to assess the application of the same general encoded knowledge to two different games. However, the test of METAGAMER's ability on the task for which it was designed (i.e. SCL-Metagame) comes when we compare different programs in a tournament of newly-generated games. The results of such experiments are encouraging [Pell, 1993b], and will be reported in future work.

5 Acknowledgements

Thanks to Jonathan Schaeffer for permitting me to use Chinook and to Stuart Cracraft for contributing GnuChess to the research community. Thanks also to Siani Baker, Mark Humphrys, Rozella Oliver, Manny Rayner, and Eike Ritter for careful reading of drafts of this work.

References

[Abramson, 1990] Bruce Abramson. Expected-outcome: A general model of static evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(2), February 1990.

[Botvinnik, 1970] M. M. Botvinnik. Computers, chess and long-range planning. Springer-Verlag New York, Inc., 1970.

[Callan and Utgoff, 1991] J. P. Callan and P. E. Utgoff. Constructive induction on domain knowledge. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 614-619, 1991.

[Callan et al., 1991] James P. Callan, Tom E. Fawcett, and Edwina L. Rissland. Adaptive case-based reasoning. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991.

[Church and Church, 1979] Russell M. Church and Kenneth W. Church. Plans, goals, and search strategies for the selection of a move in chess. In Peter W. Frey, editor, Chess Skill in Man and Machine. Springer-Verlag, 1979.

[de Grey, 1985] Aubrey de Grey. Towards a versatile self-learning board game program. Final Project, Tripos in Computer Science, University of Cambridge, 1985.

[Epstein, 1989a] Susan Epstein. Mediation among Advisors. In Proceedings on AI and Limited Rationality, pages 35-39. AAAI Spring Symposium Series, 1989.

[Epstein, 1989b] Susan Epstein. The Intelligent Novice - Learning to Play Better. In D.N.L. Levy and D.F. Beal, editors, Heuristic Programming in Artificial Intelligence - The First Computer Olympiad. Ellis Horwood, 1989.

[Fawcett and Utgoff, 1992] Tom E. Fawcett and Paul E. Utgoff. Automatic feature generation for problem solving systems. In Derek Sleeman and Peter Edwards, editors, Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen, Scotland, 1992.

[Flann and Dietterich, 1989] Nicholas S. Flann and Thomas G. Dietterich. A study of explanation-based methods for inductive learning. Machine Learning, 4:187-226, 1989.

[Lee and Mahajan, 1988] Kai-Fu Lee and Sanjoy Mahajan. A pattern classification approach to evaluation function learning. Artificial Intelligence, 36:1-25, 1988.

[Levinson and Snyder, 1991] Robert A. Levinson and R. Snyder. Adaptive, pattern-oriented chess. In Proceedings of the Ninth National Conference on Artificial Intelligence, 1991.

[Levy and Newborn, 1991] David Levy and Monty Newborn. How Computers Play Chess. W. H. Freeman and Company, 1991.

[Pell, 1992a] Barney Pell. Metagame: A New Challenge for Games and Learning. In H.J. van den Herik and L.V. Allis, editors, Heuristic Programming in Artificial Intelligence 3 - The Third Computer Olympiad. Ellis Horwood, 1992. Also appears as University of Cambridge Computer Laboratory Technical Report No. 276.

[Pell, 1992b] Barney Pell. Metagame in Symmetric, Chess-Like Games. In H.J. van den Herik and L.V. Allis, editors, Heuristic Programming in Artificial Intelligence 3 - The Third Computer Olympiad. Ellis Horwood, 1992. Also appears as University of Cambridge Computer Laboratory Technical Report No. 277.

[Pell, 1993a] Barney Pell. Logic Programming for General Game Playing. In Proceedings of the ML93 Workshop on Knowledge Compilation and Speedup Learning, Amherst, Mass., June 1993. Machine Learning Conference. Also appears as University of Cambridge Computer Laboratory Technical Report No. 302.

[Pell, 1993b] Barney Pell. Strategy Generation and Evaluation for Meta Game-Playing. PhD thesis, Computer Laboratory, University of Cambridge, 1993. Forthcoming.

[Samuels, 1967] A. L. Samuels. Some studies in machine learning using the game of Checkers. II. IBM Journal, 11:601-617, 1967.

[Schaeffer et al., 1991] J. Schaeffer, J. Culberson, N. Treloar, B. Knight, P. Lu, and D. Szafron. Reviving the game of checkers. In D.N.L. Levy and D.F. Beal, editors, Heuristic Programming in Artificial Intelligence 2 - The Second Computer Olympiad. Ellis Horwood, 1991.

[Snyder, 1993] Richard Snyder. Distance: Toward the unification of chess knowledge. Master's thesis, University of California, Santa Cruz, March 1993.

[Tesauro, 1993] G. Tesauro. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, 1993. To appear.

[Tunstall-Pedoe, 1991] William Tunstall-Pedoe. Genetic Algorithms Optimizing Evaluation Functions. ICCA Journal, 14(3):119-128, September 1991.

[van Tiggelen, 1991] A. van Tiggelen. Neural Networks as a Guide to Optimization. ICCA Journal, 14(3):115-118, September 1991.
