
SARTRE: System Overview

A Case-Based Agent for Two-Player Texas Hold'em

Jonathan Rubin & Ian Watson

University of Auckland Game AI Group

http://www.cs.auckland.ac.nz/research/gameai/

Overview

• Introduction

• Texas Hold'em

• Approaches to Computer Poker

• Sartre: System Overview

• Results

• Future Work

Texas Hold'em

• Two-player Limit Hold'em

– Very different from the full-table game

• Chance events

• Hidden Information

Approaches to Computer Poker

• Near-Equilibrium Strategy

• Exploitative Strategy

Near-Equilibrium Strategy

• Nash Equilibrium

– Assumes the opponent makes no mistakes

– Attempts to minimise its losses against this perfect opponent

• Near-Equilibrium

– Approximates the equilibrium, as the full game tree is too large to solve exactly

– Plays not to lose

Exploitative Strategy

• Exploitative Strategy

– Opponent Modelling

– Attempts to punish weaknesses in the opponent's strategy

– Plays off the equilibrium

– Plays to win

Sartre: System Overview

• Similarity Assessment Reasoning for Texas hold'em via Recall of Experience

• Our entry for the 2009 Computer Poker Competition

• Case-base was constructed from past CPC games

Sartre: System Overview

• Case Features (hand-picked by the authors)

– Previous betting for the hand

– Hand Category

– Board Category

1. Previous betting for the hand

• Currently represented as a string

– f = fold

– c = check/call

– r = bet/raise

• Examples

– r

– rrc-r

– rc-crrc-rc-cr
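The string encoding above can be unpacked with a few lines of Python. This is a minimal sketch, not Sartre's own code: the function name `parse_betting` is hypothetical, and it assumes that each hyphen marks the boundary between betting rounds (preflop, flop, turn, river), which the examples suggest but the slides do not state.

```python
# Only the f/c/r alphabet comes from the slides; everything else is illustrative.
ACTIONS = {"f": "fold", "c": "check/call", "r": "bet/raise"}


def parse_betting(betting: str) -> list[list[str]]:
    """Split a betting string into rounds and expand each action letter.

    Assumption: '-' separates consecutive betting rounds.
    Raises KeyError on any character other than f, c, r or '-'.
    """
    if not betting:
        return []
    return [[ACTIONS[ch] for ch in round_str] for round_str in betting.split("-")]


if __name__ == "__main__":
    for example in ["r", "rrc-r", "rc-crrc-rc-cr"]:
        print(example, "->", parse_betting(example))
```

Under that assumption, the third example spans four rounds, so it describes a hand that has reached the river.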


2. Hand Category

• Rule-based System

2. Hand Category

• Two components

– Hand Category

– Hand Potential

• Examples

– Missed

– One-Pair, Two-Pair, Three-of-a-kind

– Flush-draw, Straight-draw

3. Board Category

• Captures information about potential

– Flush draws, or

– Straight draws

• Information that is likely to be noticed by a good player

3. Board Category

• Flush Highly Possible

3. Board Category

• Straight Possible

Similarity

• Currently either all or nothing

– If two collections of cards map to the same category, they are assigned a similarity of 1.0; otherwise 0.0.
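The all-or-nothing rule is simple enough to write down directly. A minimal sketch, assuming categories are compared as plain strings; the function name `category_similarity` is hypothetical.

```python
def category_similarity(category_a: str, category_b: str) -> float:
    """All-or-nothing similarity: 1.0 for identical categories, else 0.0."""
    return 1.0 if category_a == category_b else 0.0


if __name__ == "__main__":
    print(category_similarity("Flush-draw", "Flush-draw"))
    print(category_similarity("Flush-draw", "One-Pair"))
```

A graded measure would replace the equality test with a lookup that returns values between 0.0 and 1.0 for related categories.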

Case Overview

• Case Features

– 1. Previous betting for the hand

– 2. Hand Category

– 3. Board Category

• Solution

– f, c, r

• Outcome

– +/- value (+ profit, - loss)

Case Overview

• Solution + Outcome

– Recorded from equilibrium-approaching bots in previous AAAI Computer Poker Competition matches

• Separate case-bases for preflop, flop, turn & river

• Approx. 250,000 cases in each case-base.
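The case layout described above might look like the following in Python. This is a sketch under stated assumptions: the `Case` class, its field names, and `retrieve` are hypothetical, and the betting string is assumed to require an exact match, which the slides do not say explicitly (they only describe all-or-nothing matching for card categories).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    betting: str         # previous betting for the hand, e.g. "rc-cr"
    hand_category: str   # e.g. "One-Pair"
    board_category: str  # e.g. "Flush Highly Possible"
    solution: str        # "f", "c" or "r"
    outcome: float       # + profit, - loss


def retrieve(case_base: list[Case], betting: str,
             hand_category: str, board_category: str) -> list[Case]:
    """Return every stored case whose three features all match exactly."""
    query = (betting, hand_category, board_category)
    return [c for c in case_base
            if (c.betting, c.hand_category, c.board_category) == query]


if __name__ == "__main__":
    # One case-base per betting round; the contents here are made-up toy data.
    case_bases = {
        "preflop": [],
        "flop": [
            Case("rc-c", "One-Pair", "Straight Possible", "r", 2.0),
            Case("rc-c", "One-Pair", "Straight Possible", "c", -1.0),
            Case("rc-c", "Missed", "Straight Possible", "f", 0.0),
        ],
        "turn": [],
        "river": [],
    }
    hits = retrieve(case_bases["flop"], "rc-c", "One-Pair", "Straight Possible")
    print([(c.solution, c.outcome) for c in hits])
```

With around 250,000 cases per case-base, a linear scan like this would likely be replaced by a dictionary keyed on the feature triple.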

Decision Making

• Retrieved cases can have different decisions

• Three different versions

– 1. Probability Triple

– 2. Majority rules

– 3. Outcome-based

Decision Making

• Probability Triple

– Proportion of times that the solution indicated to fold, call or raise

– (f, c, r)

• Majority Rules

– Decision made the most is reused

• Outcome-Based

– Dependent on adjusted average outcome values for each decision

– If a call or raise decision was never made, its outcome is unknown and is given a value of +infinity
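The three policies can be sketched as follows. Several details are assumptions rather than Sartre's actual method: the probability-triple policy is assumed to sample an action from the triple, the "adjusted" average is replaced by a plain mean because the slides do not define the adjustment, a never-observed fold is given -infinity, and ties are broken in f, c, r order. All function names are hypothetical.

```python
import math
import random
from collections import Counter

ACTIONS = ("f", "c", "r")


def probability_triple(solutions):
    """Return (f, c, r): the proportion of retrieved cases choosing each action."""
    if not solutions:
        raise ValueError("need at least one retrieved case")
    counts = Counter(solutions)
    return tuple(counts[a] / len(solutions) for a in ACTIONS)


def decide_probability(solutions, rng=random):
    """Assumption: sample one action according to the probability triple."""
    return rng.choices(ACTIONS, weights=probability_triple(solutions))[0]


def decide_majority(solutions):
    """Reuse the decision made most often (ties broken in f, c, r order)."""
    counts = Counter(solutions)
    return max(ACTIONS, key=lambda a: counts[a])


def decide_outcome(cases):
    """Pick the action with the best average outcome.

    `cases` is a list of (solution, outcome) pairs. A call or raise that was
    never observed is valued at +infinity, as on the slide. Valuing an
    unobserved fold at -infinity is an assumption of this sketch.
    """
    outcomes = {a: [] for a in ACTIONS}
    for solution, outcome in cases:
        outcomes[solution].append(outcome)
    values = {}
    for action in ACTIONS:
        if outcomes[action]:
            values[action] = sum(outcomes[action]) / len(outcomes[action])
        elif action in ("c", "r"):
            values[action] = math.inf
        else:
            values[action] = -math.inf
    return max(ACTIONS, key=lambda a: values[a])


if __name__ == "__main__":
    retrieved = [("r", 2.0), ("r", -1.0), ("c", 0.5), ("f", 0.0)]
    solutions = [s for s, _ in retrieved]
    print(probability_triple(solutions))
    print(decide_majority(solutions))
    print(decide_outcome(retrieved))
```

The +infinity rule makes the outcome-based policy try any call or raise it has no data for, so it favours untested actions over ones with known outcomes.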

Duplicate Matches

• Experimental results derived using duplicate matches

– Play N poker hands

– Reset each player's memory

– Reverse the position of each player and deal the same N hands

• Forward + Reverse Directions

• Reduces variance
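The procedure can be sketched with a toy model in which the cards alone decide part of each result. Everything here is hypothetical: `play_hand` is assumed to return the first-seat player's profit in small bets, agents are represented by a single "skill" number, and the factories stand in for resetting each player's memory.

```python
import random


def duplicate_match(play_hand, new_agent_a, new_agent_b, deals):
    """Return agent A's average profit per hand over both directions.

    play_hand(seat1, seat2, deal) must return seat1's profit in small bets;
    heads-up poker is zero-sum, so seat2's profit is the negative of that.
    """
    # Forward direction: A sits in seat 1.
    a, b = new_agent_a(), new_agent_b()
    forward = sum(play_hand(a, b, deal) for deal in deals)

    # Reset memory (fresh agents), swap seats, and replay the same deals.
    a, b = new_agent_a(), new_agent_b()
    reverse = sum(-play_hand(b, a, deal) for deal in deals)

    return (forward + reverse) / (2 * len(deals))


def toy_play_hand(seat1_skill, seat2_skill, deal):
    """Toy model: card luck (the deal) plus the skill gap between the seats."""
    return deal + seat1_skill - seat2_skill


if __name__ == "__main__":
    rng = random.Random(42)
    deals = [rng.uniform(-5.0, 5.0) for _ in range(3000)]  # luck of the cards
    edge = duplicate_match(toy_play_hand, lambda: 0.1, lambda: 0.0, deals)
    print(f"A's measured edge: {edge:.3f} sb/h")
```

In this toy model the luck term cancels exactly between the two directions, leaving only the skill gap. Real matches cancel only part of the luck, which is why the slides say the structure reduces variance rather than removes it.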

Self-Play Experiments

• Small bets per hand (sb/h)

– Assuming a $10/$20 game

• Sartre-Probability Vs. Sartre-Outcome

– Sartre-Probability wins 0.168 sb/h

– On average $1.68 profit per hand

• Sartre-Probability Vs. Sartre-Majority

– Sartre-Majority wins 0.039 sb/h

– On average $0.39 per hand
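The dollar figures follow from multiplying the win rate by the small bet, which is $10 in a $10/$20 game. A minimal sketch; the function name is hypothetical.

```python
def sb_per_hand_to_dollars(sb_per_hand: float, small_bet: float = 10.0) -> float:
    """Convert a win rate in small bets per hand into dollars per hand."""
    return sb_per_hand * small_bet


if __name__ == "__main__":
    for label, rate in [("Probability vs. Outcome", 0.168),
                        ("Majority vs. Probability", 0.039)]:
        print(f"{label}: ${sb_per_hand_to_dollars(rate):.2f} per hand")
```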

Self-Play Experiments

• Chose Sartre-Majority (majority rules)

• Results not transitive

• Majority rules makes Sartre more predictable and hence more exploitable by strong opposition

2009 Computer Poker Competition Results

• Duplicate match structure

– 3000 hands in forward & reverse direction

• Multiple matches against each opponent until statistical significance obtained

• Sartre placed 7th out of 13 entrants in limit competition

2009 Computer Poker Competition Results

Rank  Opponent              Sartre's result (sb/h)
1     MANZANA               -0.038
2     GGValuta              -0.043
3     HyperboreanLimit-Eqm  -0.051
4     HyperboreanLimit-BR   -0.023
5     Rockhopper            -0.033
6     Slumbot               -0.012
7     Sartre
8     GS5                   -0.007
9     AoBot                  0.131
10    LIDIA                  0.145
11    dcurbhu                0.217
12    GS5Dynamic             0.119
13    tommybot               0.765
      Total                  0.097
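The "Total" figure appears to be the mean of the twelve per-opponent results rather than their sum. The check below uses the rounded values from the table, so it can only be expected to agree with the reported 0.097 to about three decimal places. The variable names are illustrative.

```python
# Sartre's result against each opponent in sb/h, copied from the table.
results = {
    "MANZANA": -0.038,
    "GGValuta": -0.043,
    "HyperboreanLimit-Eqm": -0.051,
    "HyperboreanLimit-BR": -0.023,
    "Rockhopper": -0.033,
    "Slumbot": -0.012,
    "GS5": -0.007,
    "AoBot": 0.131,
    "LIDIA": 0.145,
    "dcurbhu": 0.217,
    "GS5Dynamic": 0.119,
    "tommybot": 0.765,
}

mean_sb_per_hand = sum(results.values()) / len(results)

if __name__ == "__main__":
    print(f"mean over {len(results)} opponents: {mean_sb_per_hand:.4f} sb/h")
```

The per-opponent values sum to 1.170 sb/h, and dividing by 12 gives about 0.0975 sb/h.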

2009 Computer Poker Competition Results

• Overall profit of +0.097 sb/h

• Assuming a $10/$20 game

– $0.97 per hand profit

Future Work

• Investigate loosening of all-or-nothing similarity

• CBR and adaptive poker agents

– Opponent modelling

– Learning

• Better solution adaptation

– Combination of decision + outcome

The End!