Adversarial Search Alpha Beta Pruning Heuristic
CS540 Introduction to Artificial IntelligenceLecture 19
Young WuBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles
Dyer
July 23, 2020
Adversarial Search Alpha Beta Pruning Heuristic
Lion Game ExampleMotivation
Adversarial Search Alpha Beta Pruning Heuristic
Nim Game ExampleMotivation
Adversarial Search Alpha Beta Pruning Heuristic
Game TreeMotivation
The initial state is the beginning of the game.
There are no goal states, but there are multiple terminalstates in which the game ends.
Each successor of a state represents a feasible action (or amove) in the game.
The search problem is to find the terminal state with thelowest cost (or usually the highest reward).
Adversarial Search Alpha Beta Pruning Heuristic
Adversarial SearchMotivation
The main difference between finding solutions of games andstandard search problems or local search problems is that partof the search is performed by an opponent adversarially.
Usually, the opponent wants to maximize the cost or minimizethe reward from the search. This type of search problems iscalled adversarial search.
In game theory, the solution of a game is called anequilibrium. It is a path in which both players do not want tochange actions.
Adversarial Search Alpha Beta Pruning Heuristic
Backward InductionMotivation
Games are usually solved backward starting from the terminalstates.
Each player chooses the best action (successor) given the(already solved) optimal actions of all players in the subtrees(called subgames).
Adversarial Search Alpha Beta Pruning Heuristic
Zero-Sum GamesMotivation
If the sum of the reward or cost over all players at eachterminal state is 0, the game is called a zero-sum game.
Usually, for games with one winner: the reward for winningand the cost of losing are both 1. If the game ends with a tie,both players get 0.
Adversarial Search Alpha Beta Pruning Heuristic
Tic Tac Toe ExampleMotivation
Adversarial Search Alpha Beta Pruning Heuristic
Nim Game ExampleMotivation
Ten objects. Pick 1 or 2 each time. Pick the last one to win.
A: Pick 1.
B: Pick 2.
C, D, E: Don’t choose.
Adversarial Search Alpha Beta Pruning Heuristic
Minimax AlgorithmDescription
Use DFS on the game tree.
Adversarial Search Alpha Beta Pruning Heuristic
Minimax AlgorithmAlgorithm
Input: a game tree pV ,E , cq, and the current state s.
Output: the value of the game at s.
If s is a terminal state, return c psq.
If the player is MAX, return the maximum value over allsuccessors.
α psq � maxs 1Ps 1psq
β�s 1�
If the player is MIN, return the minimum value over allsuccessors.
β psq � mins 1Ps 1psq
α�s 1�
Adversarial Search Alpha Beta Pruning Heuristic
BacktrackingDiscussion
The optimal actions (solution paths) can be found bybacktracking from all terminal states as in DFS.
s� psq � arg maxs 1Ps 1psq
β�s 1�
for MAX
s� psq � arg mins 1Ps 1psq
α�s 1�
for MIN
Adversarial Search Alpha Beta Pruning Heuristic
Minimax PerformanceDiscussion
The time and space complexity is the same as DFS. Note thatD � d is the maximum depth of the terminal states.
T � 1 � b � b2 � ...� bd
S � pb � 1q � d
Adversarial Search Alpha Beta Pruning Heuristic
Non-deterministic GameDiscussion
For non-deterministic games in which chance can make amove (dice roll or coin flip), use expected reward or costinstead.
The algorithm is also called expectiminimax.
Adversarial Search Alpha Beta Pruning Heuristic
PruningMotivation
Time complexity is a problem because the computer usuallyhas a limited amount of time to ”think” and make a move.
It is possible to reduce the time complexity by removing thebranches that will not lead the current player to win. It iscalled the Alpha-Beta pruning.
Adversarial Search Alpha Beta Pruning Heuristic
Alpha Beta PruningDescription
During DFS, keep track of both α and β for each vertex.
Prune the subtree with α ¥ β.
Adversarial Search Alpha Beta Pruning Heuristic
Alpha Beta Pruning Simple ExampleDefinition
Adversarial Search Alpha Beta Pruning Heuristic
Alpha Beta Pruning Algorithm, Part IAlgorithm
Input: a game tree pV ,E , cq, and the current state s.
Output: the value of the game at s.
If s is a terminal state, return c psq.
Adversarial Search Alpha Beta Pruning Heuristic
Alpha Beta Pruning Algorithm, Part IIAlgorithm
If the player is MAX, return the maximum value over allsuccessors.
α psq � maxs 1Ps 1psq
β�s 1�
β psq � β p parent psqq
Stop and return β if α ¥ β.
If the player is MIN, return the minimum value over allsuccessors.
β psq � mins 1Ps 1psq
α�s 1�
α psq � α p parent psqq
Stop and return α if α ¥ β.
Adversarial Search Alpha Beta Pruning Heuristic
Alpha Beta PerformanceDiscussion
In the best case, the best action of each player is the leftmostchild.
In the worst case, Alpha Beta is the same as minimax.
Adversarial Search Alpha Beta Pruning Heuristic
Static Evaluation FunctionDefinition
A static board evaluation function is a heuristics to estimatethe value of non-terminal states.
It should reflect the player’s chances of winning from thatvertex.
It should be easy to compute from the board configuration.
Adversarial Search Alpha Beta Pruning Heuristic
Evaluation Function PropertiesDefinition
If the SBE for one player is x , then the SBE for the otherplayer should be �x .
The SBE should agree with the cost or reward at terminalvertices.
Adversarial Search Alpha Beta Pruning Heuristic
Linear Evaluation Function ExampleDefinition
For Chess, an example of an evaluation function can be alinear combination of the following variables.
1 Material.
2 Mobility.
3 King safety.
4 Center control.
These are called the features of the board.
Adversarial Search Alpha Beta Pruning Heuristic
Iterative Deepening SearchDiscussion
IDS could be used with SBE.
In iteration d , the depth is limited to d , and the SBE of thenon-terminal vertices are used as their cost or reward.
Adversarial Search Alpha Beta Pruning Heuristic
Non Linear Evaluation FunctionDiscussion
The SBE can be estimated given the features using a neuralnetwork.
The features are constructed using domain knowledge, or apossibly a convolutional neural network.
The training data are obtained from games betweenprofessional players.
Adversarial Search Alpha Beta Pruning Heuristic
Monte Carlo Tree SearchDiscussion
Simulate random games by selecting random moves for bothplayers.
Exploitation by keeping track of average win rate for eachsuccessor from previous searches and picking the successorsthat lead to more wins.
Exploration by allowing random choices of unvisitedsuccessors.
Adversarial Search Alpha Beta Pruning Heuristic
Monte Carlo Tree Search DiagramDiscussion
Adversarial Search Alpha Beta Pruning Heuristic
Upper Confidence BoundDiscussion
Combine exploitation and exploration by picking successorsusing upper confidence bound for tree.
ws
ns� c
clog t
ns
ws is the number of wins after successor s, and ns the numberof simulations after successor s, and t is the total number ofsimulations.
Similar to the UCB algorithm for MAB.
Adversarial Search Alpha Beta Pruning Heuristic
Alpha GO ExampleDiscussion
MCTS with ¡ 105 play-outs.
Deep neural network to compute SBE.