optimizing uct for settlers of catan - felipe · pdf fileoptimizing uct for settlers of catan...

Click here to load reader

Post on 12-Jul-2018

222 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Optimizing UCT for Settlers of CatanGabriel Rubin Bruno Paz Felipe Meneguzzi

    Pontifical Catholic University of Rio Grande do Sul, Computer Science Department, Brazil

    ABSTRACT

    Settlers of Catan is one of the main representatives of modern strate-gic board games and there are few autonomous agents available toplay it due to its challenging features such as stochasticity, imper-fect information, and 4-player structure. In this paper, we extendprevious work on UCT search to develop an automated player forSettlers of Catan. Specifically, we develop a move pruning heuris-tic for this game and introduce the ability to trade with the otherplayers using the UCT algorithm. We empirically compare our newplayer with a baseline agent for Settlers of Catan as well as the stateof the art and show that our algorithm generates superior strategieswhile taking fewer samples of the game.

    Keywords: Artificial Intelligence, Monte Carlo Tree Search, Set-tlers of Catan.

    1 INTRODUCTION

    Board games are of great interest to the Artificial Intelligence com-munity. The study of classical games such as Chess and Checkersmotivated great developments in the area, as many AI techniqueshave been developed to improve the performance of an AI in theseclassic games. While dealing well with traditional games, thesetechniques are often unsatisfactory for modern strategic games,commonly called Eurogames, because of the greater complexityof these games when compared to traditional board games [11].Newly developed techniques [12] have significantly improved theperformance of an AI in the classic Chinese game Go, bringingnew possibilities for the development of competitive agents for Eu-rogames.

    Settlers of Catan [10] is a good representative of the Eurogamearchetype, with gameplay elements that make it challenging for tra-ditional tree search algorithms, such as Minimax: imperfect infor-mation, randomly determined moves, more than 2 players and ne-gotiation between players. Most autonomous agent players avail-able for this game have game-specific heuristics and have a lowwin-rate against human players.

    Previous work showed that Upper Confidence Bounds for Trees(UCT) [9], a variant of Monte Carlo tree search prominently usedin games such as Go [7], yields a high win rate when applied to Set-tlers of Catan with simplified rules against agents from the JSettlersimplementation of the game [13]. JSettlers [14] is an open-sourceJava implementation of Settlers of Catan that includes implementa-tions of AI agents that are frequently used as benchmarks for newgame playing strategies [13, 8]. However, the strategies generatedby this previous UCT implementation do not negotiate with otherplayers [13] and was only tested on Settlers of Catan with simpli-fied rules [13]. Given the importance of trade as a gameplay ele-ment and the challenges of implementing effective players of thegame with unmodified rules, we aim to develop UTC-based strate-gies capable of overcoming these limitations and surpassing exist-ing techniques for playing Settlers of Catan.

    e-mail: gdearrud@gmail.com

    Thus this paper provides three main contributions. First, wemodify the base UCT algorithm to use domain knowledge and opti-mize it to improve its win rate in Settlers of Catan without relaxingthe game rules. Second, we develop a method for trading with otherplayers using UCT, by implementing a trade-optimistic search andcompare it to our solution using the base UCT algorithm with notrading. Finally, we also show how the agent can be improved byusing Ensemble UCT [6], a parallel variation of the base UCT al-gorithm that improves win rates and response time.

    2 BACKGROUNDIn Settlers of Catan, each player controls a group of settlers whointend to colonize an island. The game is a race for points: the firstplayer to obtain 10 victory points wins. To obtain victory points,players must gather resources and build their colonies on the is-land. In the section below, we explain the fundamental rules of thegame in some detail. For a more detailed explanation of rules, weencourage the reader to check the official rules [10].

    2.1 Game RulesThe game board, illustrated in Figure 1, represents the island andits ocean through hexagons. Every hexagon can be either one ofsix different types of terrain, or part of the ocean. Each terraintype produces its own type of resource: fields produce grain, hillsproduce brick, mountains produce ore, forest produces lumber, andpasture produces wool. There is one special terrain that dont pro-duce resources: the desert. Finally, on top of each terrain, there isa token with a number between 2 and 12, representing the possibleoutcomes out of 2 six-sided dice.

    Figure 1: The board of Settlers of Catan. This image include playersettlements, roads, cities and other elements.

    2.1.1 BuildingsThere are 3 types of buildings: settlements, cities and roads. Eachbuilding has a price in resources and can give players victory points:

  • roads cost 1 brick and 1 lumber and give no victory points; Settle-ments cost 1 brick, 1 lumber, 1 wool, and 1 grain, and are worth1 victory point; Cities cost 3 ores and 2 grains, and are worth 2victory points.

    Players can build settlements or cities on the intersection be-tween 3 terrain hexagons in order to obtain the resources producedby them. These resources can then be used to buy more buildings.Players can only place settlements and roads adjacent to anotherone of their roads, and cities can only be placed on top of one oftheir settlements.

    2.1.2 Resource production

    Resource production occurs in the beginning of each players turnby rolling the 2 six-sided dice. Resources are then produced basedon the outcome of the roll and the value depicted on top of theterrains on the board: any player with a settlement or city adjacentto a terrain with the same number as the dice roll, produces thatterrains resources, adding them to their hand. Settlements produce1 resource per dice roll and cities produce 2 resources.

    When a dice roll results in total of 7, all players that have morethan 7 resources in their hand must discard half of them and movethe robber. The robber is a special piece that blocks a terrain fromproducing during a dice roll. The robber starts the game at thedesert terrain. Once the player rolls 7 and moves the robber to aterrain, that player can steal a random resource from other playerwhose settlement or city is adjacent to the robbers terrain.

    2.1.3 Development cards and extra points

    Players can also buy a card from the deck of development cardswith resources. Each card costs 1 ore, 1 wool, and 1 grain. Thisdeck have 5 types of cards in it, each one of these has differenteffects on the game: Knight cards can be used to move the robber;Road Building cards can be used to place 2 roads on the board;Monopoly cards can steal all resources from a specific type fromall other players; Year of Plenty cards obtain any 2 resources; andVictory Point cards are worth 1 victory point at the end of the game.

    There are 2 achievements that give victory points to the playersduring the game: The player with the longest continuous road gets 2victory points, and the player with the largest army (largest numberof knights cards used) also gets 2 victory points. These achieve-ments are disputed during the match and cannot be shared betweentwo players.

    2.1.4 Trading

    Players can trade resources with the games bank or with other play-ers. Trade rates are 4 to 1 with the bank and negotiable with otherplayers. Players can only make trade offers during their turn. If aplayer decides to make no trade offer during its turn, then no otherplayer can trade. Players can react to a trade offer by accepting it,declining it, or making a counter-offer.

    There are ports in the game board that give players access tobetter trade rates with the bank. Players must place settlementsor cities adjacent to these ports access points in order to use theirtrade rates. Each port have 2 access points. Ports are divided in2 categories: generic ports have 3 to 1 rate for any resource, andspecial ports have a rate of 2 specific resources to 1. In the gameboard, there is a special port for each resource type and 4 genericports, totalizing 9 ports.

    2.2 Monte Carlo Tree Search

    Monte Carlo Tree Search (MCTS) [5] is a modern algorithm thatestimates the best move by iteratively constructing a search treewhose nodes represent the game states and edges represent the ac-tions that lead from one state to another. Each node in the tree holdsan estimated reward value Q(v) and visit count N(v).

    At each iteration, the algorithm executes 4 steps, represented inFigure 2 [3]: Selection, Expansion, Simulation, and Backpropaga-tion. The algorithm returns the estimated best move when a com-putational budget (i.e. time, iteration count or memory constraint)is reached.

    Figure 2: The 4 steps of the MCTS algorithm [3].

    The algorithm starts with the selection step, initially travers-ing the constructed search tree, using a tree policy T , thispolicy is dynamic and adapts as the tree grows. The algorithmtraverses the tree from the root node using the T policy toselect child nodes until it reaches a node with untried actions.

    The expansion step selects the node reached in the last stepand choose an untried action at random to create a new childnode.

    The simulation step uses a rollout policy R from the nodecreated in the last step to select actions until it reaches a ter-minal state.

    The backpropagation step propagates the rewards obtained inthe last step from the node created from the expansion step upto the root node of the search tree, updating the visited nodesQ(v) and N(v) values.

    2.3 Upper Confidence Bounds for TreesUCT is a variation of MCTS that uses UCB1 [1], a method thatsolves the multi-arme

View more