Mapello: Othello on Game Maps
Some initial explorations
Monte Carlo Tree Search: Insights and Applications
BCS Real AI Event
Simon Lucas
Game Intelligence Group
University of Essex
Outline
- General machine intelligence: the ingredients
- Monte Carlo Tree Search: a quick overview and tutorial
- Example application: Mapello (note: Game AI is Real AI!)
- Example test problem: Physical TSP, with results of open competitions
- Challenges and future directions

General Machine Intelligence: the ingredients
- Evolution
- Reinforcement learning
- Function approximation: neural nets, N-tuples, etc.
- Selective search / sample-based planning / Monte Carlo Tree Search
Darwin, Pavlov, and Skinner

Conventional Game Tree Search
- Minimax with alpha-beta pruning, transposition tables
- Works well when:
  - A good heuristic value function is known
  - The branching factor is modest
- E.g. Chess: Deep Blue, Rybka; super-human on a smartphone!
- Tree grows exponentially with search depth
Go
- Much tougher for computers
- High branching factor
- No good heuristic value function
MCTS to the rescue!
"Although progress has been steady, it will take many decades of research and development before world-championship-calibre Go programs exist." (Jonathan Schaeffer, 2001)

Monte Carlo Tree Search (MCTS)
Upper Confidence bounds for Trees (UCT)
Further reading:
Attractive Features
- Anytime
- Scalable
- Tackles complex games and planning problems better than before
- May be logarithmically better with increased CPU
- No need for a heuristic function (though usually better with one)
- Next we'll look at general MCTS, and UCT in particular

MCTS: the main idea
- Tree policy: choose which node to expand (not necessarily a leaf of the tree)
- Default (simulation) policy: random playout until the end of the game
MCTS Algorithm
Decompose into six parts:
- MCTS main algorithm
- Tree policy
- Expand
- Best Child (UCT formula)
- Default policy
- Back-propagate
We'll run through these, then show demos.

MCTS Main Algorithm
- BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
- In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
- E.g. the final selection can be the max-value child or the most frequently visited one
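The two final-selection criteria mentioned above (max mean value vs. most visited) can be sketched as follows; the function names and the child representation are illustrative, not the talk's actual pseudo-code:

```python
# Final move selection at the root: two common BestChild criteria.
# Each child is summarised here by its total reward and visit count.

def best_child_by_mean(children):
    """Pick the child with the highest mean value (the 'max' child)."""
    return max(children, key=lambda c: c["total"] / c["visits"])

def best_child_by_visits(children):
    """Pick the most frequently visited child (the 'robust' child)."""
    return max(children, key=lambda c: c["visits"])

children = [
    {"move": "a", "total": 6.0, "visits": 10},   # mean 0.6
    {"move": "b", "total": 5.0, "visits": 20},   # mean 0.25, but more visits
]
```

Note that the two criteria can disagree, as in the example data above: 'a' has the higher mean, while 'b' is the more visited.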
TreePolicy
- Note that the node selected for expansion does not need to be a leaf of the tree
- But it must have at least one untried action
Expand
Best Child (UCT)
- This is the standard UCT equation, used in the tree:

  UCT = Q(v') / N(v') + c * sqrt(2 ln N(v) / N(v'))

  (Q(v'): total reward of child v'; N(v), N(v'): visit counts of parent and child)
- Higher values of c lead to more exploration
- Other terms can be added, and usually are (more on this later)
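The UCT equation above as a standalone function; the name and the default value of c (1/sqrt(2), the usual choice for rewards in [0, 1]) are our own conventions:

```python
import math

def uct_value(child_total, child_visits, parent_visits, c=1 / math.sqrt(2)):
    """Standard UCT: exploitation mean plus c-weighted exploration term.

    Higher c favours exploration; c = 1/sqrt(2) is a common default
    when rewards lie in [0, 1].
    """
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = child_total / child_visits
    explore = c * math.sqrt(2 * math.log(parent_visits) / child_visits)
    return exploit + explore
```

With c = 0 the formula reduces to the plain mean value, which makes the exploration term easy to check in isolation.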
DefaultPolicy
- Each time a new node is added to the tree, the default policy rolls out randomly from the current state until a terminal state of the game is reached
- The standard is to roll out uniformly at random
- But better performance may be obtained by biasing the rollouts with knowledge
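A uniformly random default policy can be sketched as below. The `TakeAway` game and all method names are our own toy stand-ins for whatever state class a real implementation uses:

```python
import random

def default_policy(state, rng=None):
    """Roll out uniformly at random until a terminal state; return its reward."""
    rng = rng or random.Random()
    while not state.is_terminal():
        move = rng.choice(state.legal_moves())
        state = state.apply(move)
    return state.reward()

class TakeAway:
    """Toy game: players alternately remove 1 or 2 tokens from a pile of n;
    whoever takes the last token wins. Reward is from player 1's view."""
    def __init__(self, n, player=1):
        self.n, self.player = n, player
    def is_terminal(self):
        return self.n == 0
    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.n]
    def apply(self, move):
        return TakeAway(self.n - move, -self.player)
    def reward(self):
        # At a terminal state, the player who just moved (-self.player)
        # took the last token and wins.
        return 1.0 if self.player == -1 else 0.0
```

For example, `default_policy(TakeAway(1))` always returns 1.0, since player 1's only move takes the last token.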
Backup
- Note that v is the new node added to the tree by the tree policy
- Back up the values from the added node, up the tree to the root
MCTS Builds Asymmetric Trees (demo)
All Moves As First (AMAF) and Rapid Action Value Estimation (RAVE)
- Additional term in the UCT equation
- Treat actions/moves the same regardless of where they occur in the move sequence
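One common way to fold the AMAF statistics into node values is to blend the UCT mean with the AMAF mean using a weight beta that decays with visits; the schedule below is the hand-tuned one from Gelly and Silver's RAVE work, and the function name and constant k are our own choices:

```python
import math

def rave_value(q_uct, n_uct, q_amaf, k=1000.0):
    """Blend a node's value with its AMAF estimate:
    (1 - beta) * Q_uct + beta * Q_amaf.

    beta -> 1 when the node has few visits (trust AMAF early),
    beta -> 0 as visits grow (trust the node's own statistics).
    """
    beta = math.sqrt(k / (3 * n_uct + k))
    return (1 - beta) * q_uct + beta * q_amaf
```

With zero visits the blend returns the AMAF estimate outright; after many visits it converges to the node's own mean.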
Using MCTS for a new problem:
- Implement the State interface
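The slide doesn't show the interface itself; a plausible minimal version in Python is sketched below. The method names are our guess at what such a State interface needs, not the talk's actual API:

```python
from abc import ABC, abstractmethod

class State(ABC):
    """Minimal interface MCTS needs from a problem/game state."""

    @abstractmethod
    def legal_moves(self):
        """Return the list of moves available in this state."""

    @abstractmethod
    def apply(self, move):
        """Return the successor state after playing `move`."""

    @abstractmethod
    def is_terminal(self):
        """True if the game/problem has ended."""

    @abstractmethod
    def reward(self):
        """Terminal reward, e.g. from the first player's perspective."""

class CountDown(State):
    """Trivial implementation: count down from n, last mover wins."""
    def __init__(self, n, to_move=0):
        self.n, self.to_move = n, to_move
    def legal_moves(self):
        return [1, 2] if self.n >= 2 else [1]
    def apply(self, move):
        return CountDown(self.n - move, 1 - self.to_move)
    def is_terminal(self):
        return self.n == 0
    def reward(self):
        # The last mover is 1 - to_move; reward is from player 0's view.
        return 1.0 if self.to_move == 1 else 0.0
```

Once a game implements this interface, the generic search code never needs to know anything game-specific.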
Example Application: Mapello

Othello
- Each move you must pincer one or more opponent counters between the counter you place and an existing one of your colour
- Pincered counters are flipped to your own colour
- The winner is the player with the most pieces at the end
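The pincer rule above can be sketched as a flip computation. This is a simplified illustration on a plain grid (it ignores board edges beyond missing keys, occupied-square checks, and Mapello's obstacles), with names of our own choosing:

```python
# The 8 neighbouring directions on the grid.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)]

def flips(board, row, col, me):
    """Return the opponent counters flipped by placing `me` at (row, col).

    board: dict mapping (r, c) -> 'B' or 'W'; empty squares are absent.
    A move is legal only if it flips at least one counter.
    """
    opp = "W" if me == "B" else "B"
    flipped = []
    for dr, dc in DIRS:
        line, r, c = [], row + dr, col + dc
        # Walk over a run of opponent counters...
        while board.get((r, c)) == opp:
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...and keep it only if the run ends on one of our own counters.
        if line and board.get((r, c)) == me:
            flipped.extend(line)
    return flipped

board = {(3, 3): "W", (3, 4): "B"}
# Placing B at (3, 2) pincers the W at (3, 3): the line reads B W B.
```

In Mapello, obstacle squares would simply break such a run, so the same walk works with one extra check per step.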
Basics of Good Game Design
- Simple rules
- Balance
- Sense of drama
- Outcome should not be obvious
Othello Example
- White leads: -58
(from http://radagast.se/othello/Help/strategy.html)
Black wins with a score of 16
Mapello
- Take the counter-flipping drama of Othello
- Apply it to novel situations:
  - Obstacles
  - Power-ups (e.g. triple square score)
  - Large maps with power-plays, e.g. line fill
  - Novel games
- Allow users to design maps that they are expert in: the map design is part of the game
- Research bonus: a large set of games to experiment with

Example Initial Maps
Or how about this?
Need Rapidly Smart AI
- Give players a challenging game, even when the game map can be new each time
- Obvious, easy-to-apply approaches:
  - TD Learning
  - Monte Carlo Tree Search (MCTS)
  - Combinations of these, e.g. Silver et al., ICML 2008; Robles et al., CIG 2011
- MCTS (see Browne et al., TCIAIG 2012):
  - Simple algorithm
  - Anytime
  - No need for a heuristic value function
  - Exploration-exploitation balance
  - Works well across a range of problems
Demo
- TDL learns reasonable weights rapidly
- How well will this play at 1 ply versus limited roll-out MCTS?
For Strong Play: Combine MCTS, TDL, and N-Tuples
Where to Play / Buy
- Coming to Android (November 2012)
- Nestorgames (http://www.nestorgames.com)
MCTS in Real-Time Games: PTSP
- Hard to get long-term planning without good heuristics
Optimal TSP order != PTSP order
MCTS: Challenges and Future Directions
- Better handling of problems with continuous action spaces (some work already done on this)
- Better understanding of how to handle real-time problems: use of approximations and macro-actions
- Stochastic and partially observable problems / games of incomplete and imperfect information
- Hybridisation: with evolution, with other tree search algorithms

Conclusions
- MCTS: a major new approach to AI
- Works well across a range of problems
- Good performance even with vanilla UCT
- Best performance requires tuning and heuristics; sometimes the UCT formula is modified or discarded
- Can be used in conjunction with RL (self-tuning) and with evolution (e.g. evolving macro-actions)

Further reading and links
- http://ptsp-game.net/
- http://www.pacman-vs-ghosts.net/