
Poker and AI

How the most “stable” creature on earth got used to that good old game from the west!

A game of (p)luck!

• Cards:
– 2 hole cards per player (play opens with the 2 blinds)
– Flop: 3 community cards
– Turn: 1 more community card
– River: 1 last community card

• Betting rounds after every card deal/flip
• Fold OR Call (Check) OR Raise (Bet)
• Showdown, if you get there

Poker as a non-trivial act of intelligence

Phil Hellmuth

Phil used my knowledge of Phil against me

Mike Matusow

Ain’t this an AI seminar?

• Games have always held an allure for AI theoreticians.

• A game of imperfect information
• Several successful implementations:

BluffBot (Teppo Salonen), Polaris (University of Alberta), Poki, Casper… We will see some of them.

• AAAI Annual Poker Competition : http://www.cs.ualberta.ca/~pokert/

The essence of Poker

• Hand Strength & Hand Potential: assess the strength of the current hand, based on:
– Cards in game
– Number of players in the game
– Position of the player
– History
– Draws
– Risks


• Pot Odds
– Pot odds are the ratio of the bet to the total pot. A call pays off when the odds of winning are better than the pot odds.

– Example: if the cards in hand are A(H)-A(D) and the cards on the board are A(C)-2-3-7-?, the odds of making a very strong hand on the river are 5:13.

– The pot odds for a $10 bet into a $40 pot are 1:4, while into a $10 pot they are 1:1.

– With 5:13 odds of winning, the first call is favorable and the second is not.
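The comparison in the example can be written out as a small check. The sketch below uses a hypothetical function name, `call_is_favorable`. It treats the odds of winning as a wins:losses ratio and applies the standard break-even argument: a call has positive expected value when those odds beat the bet:pot ratio.

```python
from fractions import Fraction


def call_is_favorable(win_odds: Fraction, bet: int, pot: int) -> bool:
    """Return True when calling `bet` to win `pot` has positive expected value.

    `win_odds` is the ratio wins:losses, e.g. odds of 5:13 -> Fraction(5, 13).
    With p = P(win), EV = p * pot - (1 - p) * bet, which is positive exactly
    when p / (1 - p) (the win odds) exceeds bet / pot (the pot odds).
    """
    pot_odds = Fraction(bet, pot)
    return win_odds > pot_odds


odds = Fraction(5, 13)                  # odds of making a very strong hand
print(call_is_favorable(odds, 10, 40))  # pot odds 1:4 -> True
print(call_is_favorable(odds, 10, 10))  # pot odds 1:1 -> False
```

Exact fractions are used so that the break-even comparison is not blurred by floating-point rounding.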


• Bluffing & Unpredictability
– Different strategies in similar situations
– An element of non-determinism

• Opponent Modeling
– Used to guess the opponents’ cards based on their history

LOKI & POKI

A look at how the experts do it!

Encoding the Problem

• Probability triples – simplicity itself

Pr := ( f , c , r )

“Marvin thinks for an eon and comes up with the three magic numbers to make tea!”

At any point in the game, the output of all the analysis is the probability with which Poki folds, calls or raises. The final decision is drawn at random according to these probabilities. This makes it non-deterministic and adds natural noise.
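Here is a minimal sketch of that last step, assuming the triple has already been computed. It draws one uniform random number and maps it onto the fold/call/raise intervals. `choose_action` is a hypothetical name, not Poki's code.

```python
import random


def choose_action(triple, rng=random):
    """Pick 'fold', 'call' or 'raise' with probabilities (f, c, r)."""
    f, c, r = triple
    if min(triple) < 0 or abs(f + c + r - 1.0) > 1e-9:
        raise ValueError("probability triple must be non-negative and sum to 1")
    x = rng.random()          # uniform in [0, 1)
    if x < f:
        return "fold"
    if x < f + c:
        return "call"
    return "raise"


rng = random.Random(42)       # seeded only to make the demo repeatable
actions = [choose_action((0.0, 0.7, 0.3), rng) for _ in range(1000)]
print(actions.count("call"), actions.count("raise"))
```

The same triple can produce different actions at different times, which is exactly the unpredictability the strategy relies on.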

Building the system

• Pre-flop strategies: almost zero information, so guess!
• How do humans start? Sklansky’s rankings

– Starting hands are collected into groups of similar cards (as far as poker is concerned), 8 groups of decreasing strength

– Tuned for 10-player games, not considering opponent characteristics

• A rule-based system built on this information
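The following toy rule table, keyed on the Sklansky group, illustrates what such a rule-based system could look like. The group-to-triple numbers are invented for illustration and are not Poki's actual rules. A real system would also take position and the betting so far into account.

```python
def preflop_triple(group: int) -> tuple:
    """Toy pre-flop rule: Sklansky group (1 = strongest, 8 = weakest)
    -> (fold, call, raise) probabilities. Numbers are illustrative only."""
    if not 1 <= group <= 8:
        raise ValueError("Sklansky groups run from 1 to 8")
    if group <= 2:
        return (0.0, 0.2, 0.8)   # premium hands: mostly raise
    if group <= 4:
        return (0.0, 0.7, 0.3)   # strong hands: mostly call
    if group <= 6:
        return (0.5, 0.5, 0.0)   # marginal hands: call or fold
    return (1.0, 0.0, 0.0)       # weakest groups: fold


for g in range(1, 9):
    print(g, preflop_triple(g))
```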

Man as a hand-wavy standard

• Moving away from external information:
– Eliminate the use of human knowledge whenever possible
– Calculated information can be quantitative rather than qualitative
– The algorithmic approach can be applied to many different specific situations (such as having exactly six players in the game)

Rebuilding the system

• Roll-out simulations
– All players call the pre-flop blinds and then check until the showdown. How much each pair of hole cards wins on average gives its income rate.
– Coarse

• Iterated roll-out simulations
– The income rates from the previous simulation decide whether a player calls or folds pre-flop.
– Over repeated iterations, these values stabilize.

[Figure: Poki’s decision process, with boxes for Hand Strength, Hand Potential, Effective Hand Strength, “Think!”, Probability Triple and Random Number Generator]

Hand Strength

• Hand Strength is the probability that a given hand is better than that of an active opponent.
– How? Enumerate all the hands an opponent could hold given the cards we can see, and count those that are better than, equal to and worse than ours.

• Extrapolate to n opponents by raising this probability to the power n (which treats the opponents’ hands as independent):

$$\mathit{HS}_n = (\mathit{HS}_1)^n$$
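Here is a sketch of the enumeration. It assumes some hand evaluator has already scored our hand and every possible opponent holding against the current board, with a higher score meaning a better hand; the scores below are made up. Ties count as half a win, which is the usual convention. The n-opponent extrapolation assumes independent opponents.

```python
def hand_strength(our_rank, opponent_ranks):
    """Fraction of possible opponent holdings that our hand beats,
    counting ties as half a win."""
    ahead = sum(1 for r in opponent_ranks if our_rank > r)
    tied = sum(1 for r in opponent_ranks if our_rank == r)
    behind = len(opponent_ranks) - ahead - tied
    return (ahead + tied / 2) / (ahead + tied + behind)


def hand_strength_n(hs1, n):
    """HS_n = (HS_1)^n: we must beat each of n independent opponents."""
    return hs1 ** n


# Hypothetical evaluator scores for eight possible opponent holdings
hs1 = hand_strength(50, [10, 20, 50, 60, 70, 30, 40, 80])
print(hs1)                      # 0.5625
print(hand_strength_n(hs1, 3))
```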

Hand Potential

• Positive Potential (PPot): of all the scenarios in which Poki is currently behind, the fraction in which it ends up winning once the remaining cards are dealt.

• Negative Potential (NPot): of all the scenarios in which Poki is currently ahead, the fraction in which it ends up losing once the remaining cards are dealt.
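Suppose we have counts of how each (opponent holding, future board) combination moves our status from "now" to "after all cards are dealt". PPot and NPot can then be computed as below. This is a sketch: the 3×3 table layout and the counts are hypothetical, and ties are weighted as half.

```python
AHEAD, TIED, BEHIND = 0, 1, 2


def hand_potential(hp):
    """hp[now][final] counts combinations where we are `now` on the current
    board and `final` at the showdown. Returns (PPot, NPot); ties count half."""
    total = [sum(row) for row in hp]
    ppot_num = hp[BEHIND][AHEAD] + hp[BEHIND][TIED] / 2 + hp[TIED][AHEAD] / 2
    ppot_den = total[BEHIND] + total[TIED] / 2
    npot_num = hp[AHEAD][BEHIND] + hp[TIED][BEHIND] / 2 + hp[AHEAD][TIED] / 2
    npot_den = total[AHEAD] + total[TIED] / 2
    ppot = ppot_num / ppot_den if ppot_den else 0.0
    npot = npot_num / npot_den if npot_den else 0.0
    return ppot, npot


# Hypothetical counts: rows = status now, columns = final status
counts = [
    [80, 2, 18],   # ahead now  -> ahead, tied, behind at the end
    [0, 10, 0],    # tied now
    [15, 1, 74],   # behind now
]
ppot, npot = hand_potential(counts)
print(ppot, npot)
```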




Effective Hand Strength

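$$\Pr(\text{win}) = \Pr(\text{ahead}) \times \Pr(\text{opponent does not improve}) + \Pr(\text{behind}) \times \Pr(\text{we improve})$$

$$= \mathit{HS} \times (1 - \mathit{NPot}) + (1 - \mathit{HS}) \times \mathit{PPot}$$

Dropping the NPot term (an optimistic estimate) gives the Effective Hand Strength:

$$\mathit{EHS} = \mathit{HS} + (1 - \mathit{HS}) \times \mathit{PPot}$$

$$\mathit{EHS} = \mathit{HS}_n + (1 - \mathit{HS}_n) \times \mathit{PPot} \quad \text{(multiple opponents)}$$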
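The two estimates can be compared directly. In this sketch (hypothetical function names), EHS overestimates the fuller win estimate by exactly HS × NPot, because it drops the NPot term.

```python
def effective_hand_strength(hs1, ppot, n_opponents=1):
    """EHS = HS_n + (1 - HS_n) * PPot, with HS_n = HS_1 ** n (NPot ignored)."""
    hs_n = hs1 ** n_opponents
    return hs_n + (1 - hs_n) * ppot


def win_probability(hs, ppot, npot):
    """Fuller estimate: HS * (1 - NPot) + (1 - HS) * PPot."""
    return hs * (1 - npot) + (1 - hs) * ppot


hs, ppot, npot = 0.6, 0.2, 0.1
ehs = effective_hand_strength(hs, ppot)
print(ehs, win_probability(hs, ppot, npot))
# The gap equals HS * NPot
print(ehs - win_probability(hs, ppot, npot), hs * npot)
```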




Adding Sophistication

• Not all opponent card pairs are equally likely at a given point in the game.

• Maintain a weighting table that stores, for each card pair an opponent may be holding, the probability of that pair at the given point in the game, based on the history.

• Re-weighting: update this table after every action.

$$\mathit{EHS}_i = \mathit{HS}_i + (1 - \mathit{HS}_i) \times \mathit{PPot}_i$$
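Below is a sketch of a weight table, its re-weighting, and the resulting weighted hand strength. The update rule is an assumption about how re-weighting can be done: multiply each holding's weight by the probability that the opponent would have made the observed action with it, then renormalise. The holdings, ranks and action probabilities are made up.

```python
def reweight(weights, action_prob):
    """Assumed update: scale each holding's weight by P(observed action | holding),
    then renormalise so the weights sum to 1."""
    new = {h: w * action_prob(h) for h, w in weights.items()}
    total = sum(new.values())
    if total == 0:
        return dict(weights)   # action impossible under the model: keep old table
    return {h: w / total for h, w in new.items()}


def weighted_hand_strength(our_rank, weights, rank_of):
    """HS where each opponent holding counts with its weight instead of 1."""
    ahead = sum(w for h, w in weights.items() if our_rank > rank_of[h])
    tied = sum(w for h, w in weights.items() if our_rank == rank_of[h])
    behind = sum(w for h, w in weights.items() if our_rank < rank_of[h])
    return (ahead + tied / 2) / (ahead + tied + behind)


# Hypothetical holdings with made-up ranks against the current board
rank_of = {"AA": 90, "KQ": 40, "72": 10}
weights = {h: 1 / 3 for h in rank_of}            # start uniform
# The opponent raised; assume strong hands raise more often
raise_prob = {"AA": 0.9, "KQ": 0.3, "72": 0.05}
weights = reweight(weights, raise_prob.get)
print(weights)
print(weighted_hand_strength(50, weights, rank_of))
```

Unweighted, our hand beats two of the three holdings. After the raise, most of the weight moves onto AA and the weighted hand strength drops.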

“No poker strategy is complete without a good opponent modeling system”

A neural net, trained for each opponent, is fed 19 game characteristics and outputs a probability triple for the opponent’s next action.

[Figure: the neural net, with its inputs on one side and Fold, Call and Bet outputs on the other]
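The shape of such a network can be sketched as a plain forward pass from 19 features to a softmax over (fold, call, bet). Several details are assumptions for illustration: the hidden-layer size, the tanh activation, the lack of bias terms, and the random (untrained) weights. Training on the opponent's observed actions is omitted.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 19, 4, 3   # hidden size is an assumption


def init_net(seed=0):
    """Random, untrained weights; no bias terms in this sketch."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
    return w1, w2


def predict(net, features):
    """Forward pass: 19 game features -> (P(fold), P(call), P(bet))."""
    w1, w2 = net
    if len(features) != N_INPUTS:
        raise ValueError(f"expected {N_INPUTS} features")
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # softmax, shifted for stability
    total = sum(exps)
    return tuple(e / total for e in exps)


net = init_net()
out = predict(net, [0.5] * N_INPUTS)   # untrained, so the triple is arbitrary
print(out)
```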

There are other ways to make money

CASe based Poker playER

• Stores a large case base obtained through the simulation of other bots (Loki/Poki)

• For the current situation, computes a similarity value for each case and sorts the cases (quicksort)

• Takes the cases above a similarity threshold of 97%, or the top 20 (whichever applies)

• Finds the probability triple (f, c, r), i.e., the frequency of each decision taken in these cases.
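Here is a sketch of the retrieval step. The similarity measure is hypothetical: 1 minus the mean absolute difference of features pre-scaled to [0, 1]. Python's built-in sort stands in for quicksort. The threshold-or-top-20 rule is implemented under one possible reading: keep cases at or above 97% similarity, at most 20 of them, and fall back to the 20 best if none qualify.

```python
from collections import Counter


def similarity(a, b):
    """Hypothetical similarity in [0, 1] for features scaled to [0, 1]."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def case_based_triple(situation, case_base, threshold=0.97, top_k=20):
    """(f, c, r) as the frequency of each action among the retrieved cases."""
    scored = sorted(((similarity(situation, feats), action)
                     for feats, action in case_base), reverse=True)
    retrieved = [c for c in scored if c[0] >= threshold][:top_k] or scored[:top_k]
    counts = Counter(action for _, action in retrieved)
    n = len(retrieved)
    return tuple(counts[a] / n for a in ("fold", "call", "raise"))


# Hypothetical case base: (features, action taken)
case_base = [
    ([0.20, 0.50, 1.00], "call"),
    ([0.21, 0.50, 1.00], "raise"),
    ([0.90, 0.10, 0.00], "fold"),
]
triple = case_based_triple([0.20, 0.50, 1.00], case_base)
print(triple)   # (0.0, 0.5, 0.5)
```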


• Performs well against other bots and against real opponents in play-money games

• Testing in real-money games was expensive!! Reasons given for this:
– Insufficient real-money cases
– People adopt different strategies in real-money games

Evolving Adaptive Play

[Figure: strategy space spanned by Loose/Tight and Passive/Aggressive, marking where evolution starts]

A particular human trait is represented by a matrix that stores information such as the probability triple in various game situations.

Evolution

• The matrices of each new generation are formed by randomizing or swapping some values in existing matrices.

• The most promising matrices are selected through multiple game plays.

• The final set of matrices corresponds to the best solution in the current playing environment.

• The program can adapt to changes in the strategies of the other players.
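Here is a toy version of the loop. Rows of a matrix are probability triples for different game situations. Children are produced by re-randomising or swapping entries, and the best matrices survive. The fitness function (closeness to a made-up target matrix) is only a stand-in for the results of many game plays. All names and parameters are hypothetical.

```python
import random


def random_triple(rng):
    """A random (fold, call, raise) probability triple."""
    w = [rng.random() for _ in range(3)]
    s = sum(w)
    return [x / s for x in w]


def mutate(matrix, rng, rate=0.3):
    """Copy the matrix, re-randomising or swapping entries in some rows."""
    child = []
    for row in matrix:
        row = list(row)
        if rng.random() < rate:
            if rng.random() < 0.5:
                row = random_triple(rng)                 # randomize
            else:
                i, j = rng.sample(range(len(row)), 2)    # swap two entries
                row[i], row[j] = row[j], row[i]
        child.append(row)
    return child


def evolve(population, fitness, generations, rng, survivors=4):
    """Keep the fittest matrices each generation and refill with mutants."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:survivors]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return max(population, key=fitness)


# Hypothetical target standing in for "what wins in this environment"
TARGET = [[0.1, 0.6, 0.3], [0.7, 0.3, 0.0]]


def toy_fitness(matrix):
    return -sum((a - b) ** 2
                for row, trow in zip(matrix, TARGET)
                for a, b in zip(row, trow))


rng = random.Random(1)
population = [[random_triple(rng) for _ in TARGET] for _ in range(12)]
best = evolve(population, toy_fitness, generations=50, rng=rng)
print(toy_fitness(best))
```

Because the best parents are always carried over, fitness never decreases from one generation to the next. Changing the target, as a change in the other players would change the environment, makes the population drift toward the new optimum.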

Evolution: Martians can’t exist on Earth

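$$W_{\text{tight}}(A_{\text{tight}}) > W_{\text{tight}}(A) \qquad W_{\text{loose}}(A_{\text{loose}}) > W_{\text{loose}}(A)$$

$$W_{\text{tight}}(A_{\text{tight}}) > W_{\text{tight}}(A_{\text{loose}}) \qquad W_{\text{loose}}(A_{\text{loose}}) > W_{\text{loose}}(A_{\text{tight}})$$

$W_x$: performance in environment ‘x’; $A_y$: program developed in environment ‘y’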

Human traits, however, are generally not fixed, and their range is not so small.

Stereotypes

• People play with certain “prejudiced” strategies. Extensive statistics are collected to record the possible stereotypes.

• Early in a match, lack of data hampers effective opponent modeling: use stereotypes.

• Extend the idea to the whole game.

Stereotypes are the game-play styles adopted by different people, recorded by watching a large number of games.

A façade is used to match the decisions taken by the player at each betting round. The stereotype with the least mean-square deviation is chosen as the match.

The matched stereotype is then used to predict the player’s future actions.
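The matching step can be sketched as a nearest-neighbour search under mean-square deviation. The stereotype names and frequencies are invented for illustration. The real stereotypes come from statistics over many recorded games.

```python
def mean_square_deviation(a, b):
    """Mean of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def match_stereotype(observed, stereotypes):
    """Name of the stereotype closest (least MSD) to the observed frequencies."""
    return min(stereotypes,
               key=lambda name: mean_square_deviation(observed, stereotypes[name]))


# Hypothetical stereotypes: (fold, call, raise) frequencies for one betting
# round; per-round vectors could be concatenated to cover a whole hand.
stereotypes = {
    "tight-passive": (0.60, 0.35, 0.05),
    "loose-aggressive": (0.20, 0.30, 0.50),
    "loose-passive": (0.25, 0.65, 0.10),
}
observed = (0.22, 0.60, 0.18)
print(match_stereotype(observed, stereotypes))  # loose-passive
```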

Poker and Game Theory

How to find the “optimal” strategy in the game of imperfect information – poker?

Applications of Game Theory

• To mathematically capture behavior in strategic situations, in which an individual's success in making choices depends on the choices of others

• In an equilibrium, each player has adopted a strategy that they have no incentive to change on their own, e.g. the Nash equilibrium, which has been applied to climate change models

A One Card Poker

OPENER vs. DEALER

Cards: ACE, DEUCE, TREY (in this game the ace is the lowest card and the trey the highest)

How is the game played?



1. Dealer deals
2. Each player puts in $100 (the ante)
3. Check or bet, depending on how the other player plays!!

One card poker – decision tree

The tree goes to a maximum depth of 3

A One Card Poker – typical situation


DEUCE

I Bet!!

What to do??? Is he bluffing?

Assumption: Obvious Plays and Stupid Mistakes

Stupid mistakes that neither player makes:
1. Folding the trey (3)
2. Calling with the ace
3. Checking with the trey “in position”
4. Betting with the deuce

Strategic Plays and Expected Value

Consider the following variables:

p1 = probability the opener bluffs with the ace,

p2 = probability the opener calls with the deuce,

p3 = probability the opener bets with the trey,

q1 = probability the dealer bluffs with the ace,

q2 = probability the dealer calls with the deuce.

Opener’s post-ante expected value

• There are three possible non-zero post-ante results for the opener. Either he loses $100, wins $200, or wins $300. We will begin by computing the probabilities of each of these outcomes.

Case 1: The opener has the ace, the dealer has the deuce (2)
$P(-\$100) = p_1 q_2$, $P(\$200) = p_1(1 - q_2)$, $P(\$300) = 0$

Case 2: The opener has the ace, the dealer has the trey (3)
$P(-\$100) = p_1$, $P(\$200) = P(\$300) = 0$


Case 3: The opener has the deuce (2), the dealer has the ace
$P(-\$100) = 0$, $P(\$200) = 1 - q_1$, $P(\$300) = q_1 p_2$

Case 4: The opener has the deuce (2), the dealer has the trey
$P(-\$100) = p_2$, $P(\$200) = P(\$300) = 0$

Case 5: The opener has the trey (3), the dealer has the ace
$P(-\$100) = 0$, $P(\$200) = 1 - (1 - p_3) q_1$, $P(\$300) = (1 - p_3) q_1$

Case 6: The opener has the trey (3), the dealer has the deuce
$P(-\$100) = 0$, $P(\$200) = 1 - p_3 q_2$, $P(\$300) = p_3 q_2$

Game Theoretic Analysis

The opener’s total Expected Value for the entire hand is:

$$EV = \frac{q_1(3p_2 - p_3 - 1) + q_2(p_3 - 3p_1) + (p_1 - p_2)}{6}$$

(in units of the 100-dollar ante, net of the ante)

If $q_1 = q_2 = 1/3$, then $EV = -1/18$ (the opener loses about 5.56 dollars per hand), and this does not depend on the opener’s choices of $p_1$, $p_2$ and $p_3$.
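Both claims can be checked with exact arithmetic. The sketch below (hypothetical function names) rebuilds the EV from the six equally likely deals in Cases 1 to 6, using results of -1, +2 and +3 in units of $100 and subtracting the ante. It confirms that this matches the closed-form formula. It also confirms that, against $q_1 = q_2 = 1/3$, every opener strategy on a grid of values gets exactly $-1/18$.

```python
from fractions import Fraction as F
from itertools import product


def ev_formula(p1, p2, p3, q1, q2):
    """Opener's expected value per hand, in units of the $100 ante."""
    return (q1 * (3 * p2 - p3 - 1) + q2 * (p3 - 3 * p1) + (p1 - p2)) / 6


def ev_from_cases(p1, p2, p3, q1, q2):
    """Average the six deals (each with probability 1/6) using the post-ante
    results -1, +2, +3 (in $100), then subtract the 1-unit ante."""
    cases = [  # (P(-$100), P($200), P($300))
        (p1 * q2, p1 * (1 - q2), 0),             # ace vs deuce
        (p1, 0, 0),                              # ace vs trey
        (0, 1 - q1, q1 * p2),                    # deuce vs ace
        (p2, 0, 0),                              # deuce vs trey
        (0, 1 - (1 - p3) * q1, (1 - p3) * q1),   # trey vs ace
        (0, 1 - p3 * q2, p3 * q2),               # trey vs deuce
    ]
    gross = sum(-a + 2 * b + 3 * c for a, b, c in cases) / 6
    return gross - 1


grid = [F(0), F(1, 3), F(1, 2), F(1)]
for p1, p2, p3 in product(grid, repeat=3):
    # Against the indifferent dealer every opener strategy gets -1/18
    assert ev_formula(p1, p2, p3, F(1, 3), F(1, 3)) == F(-1, 18)
    for q1, q2 in product(grid, repeat=2):
        assert ev_formula(p1, p2, p3, q1, q2) == ev_from_cases(p1, p2, p3, q1, q2)
print("formula matches the six cases; the indifferent dealer holds the opener to -1/18")
```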

Optimal strategy: Game Theoretic Analysis

• The dealer has an advantage in the game. The only way for the dealer to prevent the opener from seizing back some of this advantage is to play the indifferent strategy, $q_1 = q_2 = 1/3$.

• It is for this reason that the indifferent strategy is more commonly referred to as the “optimal” strategy.

Game Theory – How to win?

You cannot win with the optimal strategy, but you can make sure you don’t lose.


• So the object of the game is not to play optimally. It is to spot the times when your opponent is not playing optimally, or even to induce him not to play optimally, to recognize the way in which he is deviating from optimality, and then to choose a non-optimal strategy for yourself which capitalizes on his mistakes. You must play non-optimally in order to win. To capitalize on your opponent’s mistakes, you must play in a way that leaves you vulnerable.
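This can be made concrete for the one-card game. The opener's EV is linear in p1, p2 and p3. Rearranging the EV formula gives 6·EV = p1(1 − 3q2) + p2(3q1 − 1) + p3(q2 − q1) − q1. Against a known, non-optimal dealer, the best response therefore simply sets each p to 0 or 1. The sketch below (hypothetical names) exploits a dealer who bluffs with the ace half the time.

```python
from fractions import Fraction as F


def opener_ev(p1, p2, p3, q1, q2):
    """Opener's EV per hand (units of the $100 ante)."""
    return (q1 * (3 * p2 - p3 - 1) + q2 * (p3 - 3 * p1) + (p1 - p2)) / 6


def best_response(q1, q2):
    """6 * EV = p1(1 - 3 q2) + p2(3 q1 - 1) + p3(q2 - q1) - q1, so set each
    p_i to 1 when its coefficient is positive and to 0 otherwise."""
    coefficients = (1 - 3 * q2, 3 * q1 - 1, q2 - q1)
    return tuple(F(1) if c > 0 else F(0) for c in coefficients)


q1, q2 = F(1, 2), F(1, 3)        # dealer over-bluffs with the ace
p = best_response(q1, q2)
print(p, opener_ev(*p, q1, q2))  # always calling with the deuce lifts EV to 0
```

The exploiting strategy (always call with the deuce, never bet the trey) is itself far from optimal. A dealer who notices can punish it, which is the vulnerability the paragraph describes.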

Game Theory – applied to other games

                  Perfect Information      Imperfect Information
No chance         Chess, Go                Inspection Game, Battleships
Chance            Backgammon, Monopoly     Poker

Interesting finds in Game Theoretical Poker Research:

• Gautam Rao, a poker expert, said about PsOpti: “You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”

• In poker, knowing the basic approach of the opponent is essential, since it will dictate how to properly handle many situations that arise. Some players wrongly attributed intelligence where none was present.

References
• Billings, Davidson, Schaeffer, Szafron; The challenge of poker, 2002
• Billings, Davidson, Schaeffer, Szafron; Opponent modeling in poker, 1998
• Luigi Barone, Lyndon While; Evolving adaptive play for simplified poker, 1998
• Watson and Rubin; Case based poker bot, 2008
• Layton, Vamplew, Turville; Using stereotypes to improve early match poker play, 2008
• Jason Swanson; Game theory and poker, 2005
• D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron; Approximating game-theoretic optimal strategies for full-scale poker
