Poker and AI: How the most “stable” creature on earth got used to that good old game from the west!
A game of (p)luck!
• Cards:
– 2 Blinds
– Flop: 3 community cards
– Turn: 1 more community card
– River: 1 last community card
• Betting rounds after every card deal/flip
• Fold OR Call (Check) OR Raise (Bet)
• Showdown, if you get there
Poker as a non trivial act of intelligence
Phil Hellmuth
Phil used my knowledge of Phil against me
Mike Matusow
Ain’t this an AI seminar?
• Games have always been an allure to AI theoreticians.
• Game of incomplete information
• Several successful implementations: BluffBot (Teppo Salonen), Polaris (Univ. of Alberta), Poki, Casper… we will see some.
• AAAI Annual Poker Competition : http://www.cs.ualberta.ca/~pokert/
The essence of Poker
• Hand Strength & Hand Potential: Assess the strength of the current hand.
– Cards in game
– Number of players in the game
– Position of the player
– History
– Draws
– Risks
The essence of Poker
• Pot Odds
– Pot odds are the ratio of the bet to the total pot, which is then compared with the odds of winning.
– Example: If the cards in hand are A(H)-A(D) and the cards on board are A(C)-2-3-7-?, then the odds of getting a very strong hand after the river are 5:13.
– The pot odds for a $10 bet on a $40 pot are 1:4, while on a $10 pot they are 1:1.
– The first is favorable, the second is not.
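The comparison above can be sketched in a few lines. This is only an illustration: it assumes that "5:13" means 5 chances for against 13 against, and that a call is favorable when the win probability exceeds the break-even share bet / (pot + bet). The function names are made up for this sketch.

```python
from fractions import Fraction


def win_probability_from_odds(chances_for, chances_against):
    """Convert odds written as for:against into a probability (assumed reading)."""
    return Fraction(chances_for, chances_for + chances_against)


def break_even_probability(bet, pot):
    """Win probability at which calling `bet` into `pot` neither gains nor loses."""
    return Fraction(bet, pot + bet)


def call_is_favorable(bet, pot, p_win):
    """True when the chance of winning beats the price the pot is offering."""
    return p_win > break_even_probability(bet, pot)


p_win = win_probability_from_odds(5, 13)   # 5/18
print(call_is_favorable(10, 40, p_win))    # $10 into a $40 pot
print(call_is_favorable(10, 10, p_win))    # $10 into a $10 pot
```

Under these assumptions the $40 pot needs only a 1-in-5 chance of winning, while the $10 pot needs 1-in-2, which is why only the first call is worth making.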
The essence of Poker
• Bluffing & Unpredictability
– Different strategies in similar situations
– Element of non-determinacy
• Opponent Modeling
– Used to guess the opponents’ cards based on history
LOKI & POKI: A look at how the experts do it!
Encoding the Problem
• Probability triples – simplicity itself
Pr := ( f , c , r )
“Marvin thinks for an eon and comes up with the three magic numbers to make tea!”
The output of all analysis at any point in the game is the probability with which Poki folds, calls, or raises. The final decision is non-deterministic, which adds natural noise.
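A minimal sketch of how a probability triple could be turned into an action with a random number generator. The function name and the validation are assumptions for illustration, not Poki's actual code.

```python
import random

ACTIONS = ("fold", "call", "raise")


def choose_action(triple, rng=random):
    """Sample one action from a probability triple (f, c, r)."""
    f, c, r = triple
    if min(triple) < 0 or abs(f + c + r - 1.0) > 1e-9:
        raise ValueError("triple must be non-negative and sum to 1")
    u = rng.random()          # uniform in [0, 1)
    if u < f:
        return "fold"
    if u < f + c:
        return "call"
    return "raise"


rng = random.Random(42)       # seeded only to make the demo repeatable
print([choose_action((0.1, 0.6, 0.3), rng) for _ in range(5)])
```

Because the action is sampled rather than taken as the largest entry, the same situation can produce different plays, which is the source of the unpredictability the slide mentions.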
Building the system
• Pre-flop strategies: an almost zero-information guess!
• How do humans start: Sklansky’s rankings
– Collected into groups of similar cards (as far as poker is concerned) and categorized into 8 groups of decreasing strength
– Tuned for 10 player games, not considering opponent characteristics
• A rule-based system built on this information
Man as a hand-wavy standard
• Moving away from external information:
– Eliminate the use of human knowledge whenever possible
– Calculated information may be quantitative rather than qualitative
– The algorithmic approach can be applied to many different specific situations (such as having exactly six players in the game)
Rebuilding the system
• Roll Out simulations
– Pre-flop blinds are called by all players, followed by checks until the showdown. The probability of winning with a given pair of cards then gives the income rate.
– Coarse
• Iterated Roll Out simulations
– Income rates from the first simulation decide whether a player calls or folds pre-flop.
– This value stabilizes
[Figure: Poki's decision pipeline, with blocks Hand Strength, Hand Potential, Effective Hand Strength, “Think!”, Probability Triple, Random Number Generator]
Hand Strength
• Hand Strength is the probability that a given hand is better than that of an active opponent
– How? Enumerate all possible opponent hands given the cards we can see, and count those that are better than / equal to / worse than ours
• Extrapolate to n opponents by raising the single-opponent probability to the power n
HS_n = (HS_1)^n
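A small sketch of the counting step, assuming the enumeration has already produced how many opponent hands we are ahead of, tied with, and behind. Counting a tie as half a win is an assumption of this sketch, and both function names are illustrative.

```python
def hand_strength(ahead, tied, behind):
    """Single-opponent hand strength from enumeration counts.

    Ties are counted as half a win (an assumption of this sketch).
    """
    total = ahead + tied + behind
    if total == 0:
        raise ValueError("no opponent hands were enumerated")
    return (ahead + tied / 2) / total


def hand_strength_n(hs1, n_opponents):
    """Extrapolate to n opponents: HS_n = (HS_1)^n."""
    return hs1 ** n_opponents


hs1 = hand_strength(ahead=600, tied=20, behind=380)
print(hs1, hand_strength_n(hs1, 3))
```

The power rule treats the opponents' hands as independent, so a hand that is a clear favorite against one opponent can become an underdog against several.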
Hand Potential
• Positive Potential: Of all possible games with the current hand, the fraction of scenarios in which Poki is behind now but ends up winning.
• Negative Potential: Of all possible games with the current hand, the fraction of scenarios in which Poki is ahead now but ends up losing.
Effective Hand Strength
Pr(win)
= Pr(ahead) × Pr(opponent does not improve) + Pr(behind) × Pr(we improve)
= HS × (1 − NPot) + (1 − HS) × PPot
= HS + (1 − HS) × PPot (taking NPot = 0, i.e. ignoring negative potential)
= HS_n + (1 − HS_n) × PPot (multiple opponents)
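The formulas above translate directly into code. The sketch below keeps NPot as an optional argument that defaults to 0, matching the simplified form; the function name and argument names are chosen here for illustration.

```python
def effective_hand_strength(hs, ppot, npot=0.0, n_opponents=1):
    """EHS = HS_n * (1 - NPot) + (1 - HS_n) * PPot, with HS_n = HS ** n."""
    hs_n = hs ** n_opponents
    return hs_n * (1 - npot) + (1 - hs_n) * ppot


# One opponent, negative potential ignored: 0.6 + 0.4 * 0.2
print(effective_hand_strength(0.6, 0.2))
# Two opponents: HS_2 = 0.25, so 0.25 + 0.75 * 0.2
print(effective_hand_strength(0.5, 0.2, n_opponents=2))
```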
Adding Sophistication
• Not all card pairs are equally likely at a given point in time
• Maintain a weighting table for each opponent that stores, for every card pair the opponent may be holding, its probability at the given point in the game, based on the history.
• Re-weighting: update this table on every move.
EHS_i = HS_i + (1 − HS_i) × PPot_i
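The slides do not spell out the update rule, so the following is only one plausible sketch of re-weighting: each hand's weight is multiplied by how likely the opponent would be to take the observed action with that hand, and the table is then renormalized. The `toy_action_prob` table and the two-hand weight table are hypothetical stand-ins.

```python
def reweight(weights, action, action_prob):
    """Scale each hand's weight by P(action | hand), then renormalize.

    `action_prob(hand, action)` is a hypothetical opponent model.
    """
    updated = {hand: w * action_prob(hand, action) for hand, w in weights.items()}
    total = sum(updated.values())
    if total == 0:                      # action impossible under the model: keep table
        return dict(weights)
    return {hand: w / total for hand, w in updated.items()}


def toy_action_prob(hand, action):
    """Made-up behavior: strong hands raise often, weak hands mostly fold."""
    table = {
        "AA": {"fold": 0.0, "call": 0.2, "raise": 0.8},
        "72o": {"fold": 0.7, "call": 0.2, "raise": 0.1},
    }
    return table[hand][action]


weights = {"AA": 0.5, "72o": 0.5}
print(reweight(weights, "raise", toy_action_prob))
```

After an observed raise, most of the weight shifts to the strong hand, which is the effect the weighting table is meant to capture.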
“No poker strategy is complete without a good opponent modeling system”
A neural net trained for an opponent is fed 19 game characteristics and outputs a probability triple for the opponent's next action.
[Figure: neural net mapping Inputs to the outputs Fold, Call, Bet]
There are other ways to make money
CASe based Poker playER
• Stores a large case base obtained through the simulation of other bots (Loki/Poki)
• For a particular situation, calculates a similarity value for each case and sorts the cases (quick sort)
• Takes the cases above a similarity threshold of 97%, or the top 20 (whichever is applicable)
• Finds the probability triple (f, c, r), i.e., the frequency of the various decisions taken in these cases.
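The retrieval steps above can be sketched as follows. The case format, the similarity function, and the reading of "97% or top 20" as "use every case at or above 97% similarity, otherwise fall back to the 20 most similar" are all assumptions of this sketch, not CASPER's actual implementation.

```python
from collections import Counter


def casper_triple(cases, similarity, query, threshold=0.97, top_k=20):
    """Return (f, c, r) as decision frequencies among the most similar cases."""
    if not cases:
        raise ValueError("case base is empty")
    scored = sorted(
        ((similarity(query, case["situation"]), case["action"]) for case in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )
    selected = [a for s, a in scored if s >= threshold]
    if not selected:                       # nothing similar enough: take the top k
        selected = [a for _, a in scored[:top_k]]
    counts = Counter(selected)
    return tuple(counts[a] / len(selected) for a in ("fold", "call", "raise"))


def toy_similarity(a, b):
    """Hypothetical similarity on a single numeric feature in [0, 1]."""
    return 1.0 - abs(a - b)


case_base = [
    {"situation": 0.80, "action": "raise"},
    {"situation": 0.81, "action": "raise"},
    {"situation": 0.79, "action": "call"},
    {"situation": 0.20, "action": "fold"},
]
print(casper_triple(case_base, toy_similarity, query=0.80))
```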
CASe based Poker playER
• Performs well against other bots and against real opponents in play money games
• Testing in real money games was expensive!! Reasons given for this:
– Insufficient real money cases
– Different strategies adopted by people
Evolving Adaptive Play
[Figure: playing styles on two axes, Loose/Tight and Passive/Aggressive, marking where evolution starts]
A particular human trait is represented by a matrix which stores information such as the probability tuple for various game situations
Evolution
• Matrices corresponding to the new generation are formed by randomizing/swapping some values in the matrix.
• The most promising matrices are selected through multiple game plays.
• The final set of matrices correspond to the best solution in the current playing environment.
• Can adapt to any change in the strategy of other players
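A toy sketch of the evolutionary loop described above. Each matrix row is taken to be a probability triple for one game situation, mutation swaps or re-randomizes values within a row, and selection keeps the best matrices. In the slides fitness comes from repeated game play; the `fitness` function here is a hypothetical stand-in that scores closeness to a fixed target style.

```python
import random


def mutate(matrix, rng, rate=0.3):
    """Copy a matrix, randomly swapping or re-randomizing rows."""
    child = [row[:] for row in matrix]
    for row in child:
        if rng.random() < rate:                       # swap two entries
            i, j = rng.sample(range(len(row)), 2)
            row[i], row[j] = row[j], row[i]
        if rng.random() < rate:                       # re-randomize the row
            raw = [rng.random() for _ in row]
            total = sum(raw)
            row[:] = [x / total for x in raw]
    return child


def evolve(population, fitness, generations, rng):
    """Keep the fittest matrices among parents and mutated children."""
    for _ in range(generations):
        children = [mutate(m, rng) for m in population]
        ranked = sorted(population + children, key=fitness, reverse=True)
        population = ranked[: len(population)]
    return population


TARGET = [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]           # made-up "winning" style


def fitness(matrix):
    """Stand-in for game results: negative squared distance to TARGET."""
    return -sum((a - b) ** 2 for r, t in zip(matrix, TARGET) for a, b in zip(r, t))


rng = random.Random(0)
start = [[[1 / 3] * 3 for _ in range(2)] for _ in range(6)]
best = evolve(start, fitness, generations=30, rng=rng)[0]
print(best, fitness(best))
```

Because parents compete with their children, the best score never gets worse from one generation to the next; changing the fitness function (the playing environment) changes which matrices survive.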
Evolution: Martians can’t exist on Earth
W_tight(A_tight) > W_tight(A)
W_loose(A_loose) > W_loose(A)
W_tight(A_tight) > W_tight(A_loose)
W_loose(A_loose) > W_loose(A_tight)
W_x: performance in environment ‘x’
A_y: program developed in environment ‘y’
Human traits are generally not fixed and their domain is not so small
Stereotypes
• People play with certain “prejudiced” strategies. Extensive statistics collected to jot down possible stereotypes
• In an early game, lack of data hampers effective opponent modeling : use stereotypes
• Extend the idea to the whole game.
Stereotypes are game-play styles adopted by different players, recorded by watching a large number of games.
A façade is used to match the decisions taken by the player at each betting round; the stereotype with the least mean square deviation is chosen as the match.
The actual stereotype is then used to guess the player's future actions.
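The matching step can be sketched as a nearest-profile search. Here a façade is assumed to be one (fold, call, raise) frequency triple per betting round; the two stereotypes and all their numbers are invented for illustration.

```python
def mean_square_deviation(observed, facade):
    """Average squared difference between two per-round frequency tables."""
    diffs = [
        (o - s) ** 2
        for o_row, s_row in zip(observed, facade)
        for o, s in zip(o_row, s_row)
    ]
    return sum(diffs) / len(diffs)


def match_stereotype(observed, stereotypes):
    """Name of the stereotype whose façade deviates least from the observations."""
    return min(
        stereotypes,
        key=lambda name: mean_square_deviation(observed, stereotypes[name]),
    )


# Hypothetical façades: rows are betting rounds, columns are (fold, call, raise).
STEREOTYPES = {
    "tight-passive": [[0.7, 0.25, 0.05], [0.6, 0.35, 0.05]],
    "loose-aggressive": [[0.2, 0.3, 0.5], [0.15, 0.25, 0.6]],
}

observed = [[0.25, 0.25, 0.5], [0.1, 0.3, 0.6]]
print(match_stereotype(observed, STEREOTYPES))
```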
Poker and Game Theory
How to find the “optimal” strategy in the game of imperfect information – poker?
Applications of Game Theory
• To mathematically capture behavior in strategic situations, in which an individual's success in making choices depends on the choices of others
• In an equilibrium, each player of the game has adopted a strategy that they are unlikely to change, e.g. Nash Equilibrium applied to Climate Change Models
A One Card Poker
OPENER DEALER
ACE DEUCE TREY
How is the game played?
A One Card Poker
OPENER DEALER
1. Dealer deals
2. Each player puts in $100
3. Check or bet depending on how the other player plays!!
One card poker – decision tree
The tree goes to a maximum depth of 3
A One Card Poker – typical situation
OPENER DEALER
DEUCE
I Bet!!
What to do??? Is he bluffing?
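Assumption: Obvious Plays and Stupid Mistakes
Players are assumed never to make these stupid mistakes (the trey is the best card, the ace the worst):
1. Folding the trey (3)
2. Calling with the ace
3. Checking with the trey “in position”
4. Betting with the deuce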
Strategic Plays and Expected Value
Consider the following variables:
p1 = probability the opener bluffs with the ace,
p2 = probability the opener calls with the deuce,
p3 = probability the opener bets with the trey,
q1 = probability the dealer bluffs with the ace,
q2 = probability the dealer calls with the deuce.
Opener’s post-ante expected value
• There are three possible non-zero post-ante results for the opener. Either he loses $100, wins $200, or wins $300. We will begin by computing the probabilities of each of these outcomes.
Case 1: The opener has the ace, the dealer has the deuce
P(-100 $) = p1q2, P(200 $) = p1(1 - q2), P(300 $) = 0
Case 2: The opener has the ace, the dealer has the trey (3)
P(-100 $) = p1, P(200 $) = P(300 $) = 0
Opener’s post-ante expected value
Case 3: The opener has the deuce (2), the dealer has the ace
P(-100 $) = 0, P(200 $) = 1 - q1, P(300 $) = q1p2
Case 4: The opener has the deuce (2), the dealer has the trey
P(-100 $) = p2, P(200 $) = P(300 $) = 0
Case 5: The opener has the trey (3), the dealer has the ace
P(-100 $) = 0, P(200 $) = 1 - (1 - p3)q1, P(300 $) = (1 - p3)q1
Case 6: The opener has the trey (3), the dealer has the deuce
P(-100 $) = 0, P(200 $) = 1 - p3q2, P(300 $) = p3q2
Game Theoretic Analysis
The opener’s total Expected Value for the entire hand is:
[q1(3p2 − p3 − 1) + q2(p3 − 3p1) + (p1 − p2)] / 6
If q1 = q2 = 1/3, then EV = −1/18 (in units of the $100 ante), and this does not depend on the opener’s choices of p1, p2, and p3
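The expected value can be checked numerically from the six cases. The sketch below works in units of $100, averages the post-ante results over the six equally likely deals, subtracts the ante, and compares the result with the closed-form expression. Function names are chosen here for illustration.

```python
from fractions import Fraction as F


def opener_ev(p1, p2, p3, q1, q2):
    """Opener's EV for the whole hand, in units of the $100 ante."""
    # Each tuple is (P(lose 1), P(win 2), P(win 3)) post-ante for one deal.
    cases = [
        (p1 * q2, p1 * (1 - q2), 0),            # opener ace,   dealer deuce
        (p1, 0, 0),                             # opener ace,   dealer trey
        (0, 1 - q1, q1 * p2),                   # opener deuce, dealer ace
        (p2, 0, 0),                             # opener deuce, dealer trey
        (0, 1 - (1 - p3) * q1, (1 - p3) * q1),  # opener trey,  dealer ace
        (0, 1 - p3 * q2, p3 * q2),              # opener trey,  dealer deuce
    ]
    post_ante = sum(-lose + 2 * w2 + 3 * w3 for lose, w2, w3 in cases) / 6
    return post_ante - 1                        # subtract the opener's own ante


def closed_form(p1, p2, p3, q1, q2):
    """The expression from the slide."""
    return (q1 * (3 * p2 - p3 - 1) + q2 * (p3 - 3 * p1) + (p1 - p2)) / 6


third = F(1, 3)
for p in [(F(0), F(0), F(0)), (F(1), F(1), F(1)), (F(1, 3), F(2, 3), F(1))]:
    print(p, opener_ev(*p, third, third))
```

Whatever the opener chooses, the printed value stays the same when the dealer plays q1 = q2 = 1/3, which is what "indifferent" means here.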
Optimal strategy: Game Theoretic Analysis
• The dealer has an advantage in the game (the opener's EV is negative). The only way for the dealer to prevent the opener from seizing back some of this advantage is to play the indifferent strategy, q1 = q2 = 1/3
• It is for this reason that the indifferent strategy is more commonly referred to as the “optimal” strategy.
Game Theory – How to win?
You cannot win with the optimal strategy, but you can make sure you don’t lose.
Game Theory – How to win?
• So the object of the game is not to play optimally. It is to spot the times when your opponent is not playing optimally, or even to induce him not to play optimally, to recognize the way in which he is deviating from optimality, and then to choose a non-optimal strategy for yourself which capitalizes on his mistakes. You must play non-optimally in order to win. To capitalize on your opponent’s mistakes, you must play in a way that leaves you vulnerable.
Game Theory – to the other games
             Perfect Information     Imperfect Information
No chance    Chess, Go               Inspection Game, Battleships
Chance       Backgammon, Monopoly    Poker
Interesting finds in Game Theoretical Poker Research:
• Gautam Rao, a poker expert, said about PsOpti: “You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”
• In poker, knowing the basic approach of the opponent is essential, since it will dictate how to properly handle many situations that arise. Some players wrongly attributed intelligence where none was present.
References
• Billings, Davidson, Schaeffer, Szafron; The challenge of poker, 2002
• Billings, Davidson, Schaeffer, Szafron; Opponent modeling in poker, 1998
• Luigi Barone, Lyndon While; Evolving Adaptive play for simplified poker, 1998
• Watson and Rubin; Case Based Poker Bot, 2008
• Layton, Vamplew, Turville; Using stereotypes to improve early match poker play, 2008
• Jason Swanson; Game Theory and Poker, 2005
• D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron; Approximating Game-Theoretic Optimal Strategies for Full-scale Poker