Adversarial search (game). Transcript.
Source: ce.sharif.edu/courses/97-98/2/ce417-2/resources/root... (2019-03-07)
Soleymani
Adversarial Search
CE417: Introduction to Artificial Intelligence, Sharif University of Technology, Spring 2019
"Artificial Intelligence: A Modern Approach", 3rd Edition, Chapter 5. Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.
Outline
- Game as a search problem
- Minimax algorithm
- α-β pruning: ignoring a portion of the search tree
- Time limit problem
  - Cut-off & evaluation function
Games as search problems
- Games
  - Adversarial search problems (goals are in conflict)
  - Competitive multi-agent environments
- Games in AI are a specialized kind of game (in the game-theory sense)
Adversarial Games
- Many different kinds of games!
- Axes:
  - Deterministic or stochastic?
  - One, two, or more players?
  - Zero-sum?
  - Perfect information (can you see the state)?
- We want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games
Zero-Sum Games
- Zero-sum games
  - Agents have opposite utilities (values on outcomes)
  - Lets us think of a single value that one agent maximizes and the other minimizes
  - Adversarial, pure competition
- General games
  - Agents have independent utilities (values on outcomes)
  - Cooperation, indifference, competition, and more are all possible
  - More later on non-zero-sum games
Primary assumptions
- We start with these games:
  - Two-player
  - Turn-taking: agents act alternately
  - Zero-sum: agents' goals are in conflict; the sum of utility values at the end of the game is zero or constant
  - Perfect information: fully observable
- Examples: tic-tac-toe, chess, checkers
Deterministic Games
- Many possible formalizations, one is:
  - States: S (start at s0)
  - Players: P = {1...N} (usually take turns)
  - Actions: A (may depend on player / state)
  - Transition function: S × A → S
  - Terminal test: S → {t, f}
  - Terminal utilities: S × P → R
- Solution for a player is a policy: S → A
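This formalization can be sketched as a tiny concrete game in Python. The game chosen (misère Nim) and all names (`Nim`, `actions`, `result`, ...) are illustrative assumptions, not part of the slides:

```python
from dataclasses import dataclass

# A tiny concrete instance of the formalization above. The game and all
# names here are illustrative, not from the slides.
@dataclass(frozen=True)
class Nim:
    """Misere Nim: players alternately remove 1 or 2 sticks; whoever
    takes the last stick loses. A state is (sticks, player-to-move)."""
    sticks: int
    player: int  # 1 or 2

    def actions(self):                  # actions A (depend on the state)
        return [n for n in (1, 2) if n <= self.sticks]

    def result(self, a):                # transition function S x A -> S
        return Nim(self.sticks - a, 3 - self.player)

    def is_terminal(self):              # terminal test S -> {t, f}
        return self.sticks == 0

    def utility(self, p):               # terminal utilities S x P -> R
        # The previous player took the last stick and therefore lost,
        # so the player to move at the terminal state is the winner.
        return 1 if p == self.player else -1

# A policy S -> A (a solution for a player) is then just a function
# from states to one of the legal actions:
policy = lambda s: s.actions()[0]
print(policy(Nim(5, 1)))  # 1
```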
Single-Agent Trees
[Figure: a single-agent search tree with terminal utilities 2, 0, 2, 6, 4, 6 and root value 8]

Value of a State
[Figure: the same tree; non-terminal states are labeled with values, terminal states with their utilities]
Value of a state: the best achievable outcome (utility) from that state
Adversarial Search

Adversarial Game Trees
[Figure: a two-player game tree with terminal values -20, -8, -18, -5, -10, +4, ..., -20, +8]

Minimax Values
[Figure: the same tree annotated with minimax values +8, -10, -5, -8; terminal states, states under the agent's control, and states under the opponent's control]
Tic-Tac-Toe Game Tree

Game tree (tic-tac-toe)
- Two players: P1 and P2 (P1 is now searching to find a good move)
- Zero-sum games: P1 gets U(t) and P2 gets C − U(t) for terminal node t
- Utilities are shown from the point of view of P1
- 1 ply = a half move
- P1 plays X, P2 plays O
[Figure: the tic-tac-toe game tree with alternating P1 and P2 levels]
Optimal play
- The opponent is assumed optimal
- The minimax function is used to find the utility of each state
- MAX/MIN wants to maximize/minimize the terminal payoff
  - MAX gets U(t) for terminal node t

Adversarial Search (Minimax)
- Minimax search:
  - A state-space search tree
  - Players alternate turns
  - Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: a MAX root (value 5) over two MIN nodes (values 2 and 5) with terminal values 8, 2, 5, 6; the terminal values are part of the game]
Minimax
- MINIMAX(s) gives the best achievable outcome of being in state s (assumption: optimal opponent), i.e., the utility of being in state s

$$\text{MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$

[Figure: MIN nodes with values 3, 2, 2 under a MAX root with value 3]
Minimax (Cont.)
- Optimal strategy: move to the state with the highest minimax value
  - Best achievable payoff against best play
  - Maximizes the worst-case outcome for MAX
- It works for zero-sum games

Minimax Properties
- Optimal against a perfect player. Otherwise?
[Figure: a MAX root over two MIN nodes with leaves (10, 10) and (9, 100); minimax prefers the left branch (value 10 vs. 9) even though the right branch offers 100 against an imperfect opponent]
Minimax Implementation

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v
Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
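The dispatch pattern can be made runnable on a small hand-built tree. The tuple encoding of the tree is an assumption for illustration; the example reproduces the slide's values (leaves 8, 2, 5, 6; minimax value 5):

```python
import math

# A runnable version of the dispatch pattern above. The game tree is
# hard-coded as nested tuples ("max"|"min", children) with numeric
# leaves; this encoding is an assumption for illustration.
def value(node):
    if isinstance(node, (int, float)):   # terminal state: its utility
        return node
    kind, children = node
    return max_value(children) if kind == "max" else min_value(children)

def min_value(children):
    v = math.inf
    for c in children:
        v = min(v, value(c))
    return v

def max_value(children):
    v = -math.inf
    for c in children:
        v = max(v, value(c))
    return v

# A MAX root over two MIN nodes with terminal values 8, 2 and 5, 6;
# the minimax value of the root is 5.
tree = ("max", [("min", [8, 2]), ("min", [5, 6])])
print(value(tree))  # 5
```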
Minimax algorithm (depth-first search)

function MINIMAX-DECISION(state) returns an action
    return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v
Properties of minimax
- Complete? Yes (when the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity: O(b^m)
- Space complexity: O(bm) (depth-first exploration)
- For chess, b ≈ 35 and m > 50 for reasonable games
  - Finding the exact solution is completely infeasible
Game Tree Pruning

Pruning
- Correct minimax decision without looking at every node in the game tree
- α-β pruning
  - A branch & bound algorithm
  - Prunes away branches that cannot influence the final decision

α-β pruning example
[Figures: five slides stepping through α-β pruning on an example game tree]
α-β pruning
- Assuming depth-first generation of the tree
- We prune node n when the player has a better choice m at the parent or any ancestor of n
- Two types of pruning (cuts):
  - pruning of max nodes (α-cuts)
  - pruning of min nodes (β-cuts)
Alpha-Beta Pruning
- General configuration (MIN version)
  - We are computing the MIN-VALUE at some node n
  - We are looping over n's children
  - Who cares about n's value? MAX
  - Let a be the best value that MAX can get at any choice point along the current path from the root
  - If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it is already bad enough that it won't be played)
- The MAX version is symmetric
[Figure: alternating MAX/MIN levels; the value a is available to MAX at an ancestor of the MIN node n]
α-β pruning (another example)
[Figure: another worked example; subtree bounds such as ≤ 2 and ≥ 5 over leaf values 1, 5, 2, 3 show where branches are pruned]
Why is it called α-β?
- α: value of the best (highest) choice found so far at any choice point along the path for MAX
- β: value of the best (lowest) choice found so far at any choice point along the path for MIN
- α and β are updated during the search process
  - For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.
  - For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.
Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
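A runnable sketch of the pseudocode above, using a toy tuple-based tree encoding (an assumption for illustration). The leaf values follow the classic AIMA three-subtree example, where the second MIN node is abandoned after its first leaf:

```python
import math

# Alpha-beta search over a toy tree: ("max"|"min", children) or a
# numeric leaf. alpha = MAX's best option on the path to the root,
# beta = MIN's best option on the path to the root.
def alphabeta(node, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        v = -math.inf
        for c in children:
            v = max(v, alphabeta(c, alpha, beta))
            if v >= beta:          # MIN above will never allow this node
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for c in children:
            v = min(v, alphabeta(c, alpha, beta))
            if v <= alpha:         # MAX above already has something better
                return v
            beta = min(beta, v)
        return v

# After the first MIN node yields 3, seeing the leaf 2 in the second
# MIN node lets us prune its remaining leaves (their values never matter).
tree = ("max", [("min", [3, 12, 8]),
                ("min", [2, 99, 99]),
                ("min", [14, 5, 2])])
print(alphabeta(tree))  # 3
```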
function ALPHA-BETA-SEARCH(state) returns an action
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
α-β progress

Order of moves
- Good move ordering improves the effectiveness of pruning
- With the best ordering, time complexity is O(b^(m/2))
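To see why ordering matters, here is a small counting experiment: the same set of subtree values, explored with the best MIN subtree first versus last, while counting how many leaves alpha-beta evaluates. The trees and values are made up for illustration:

```python
import math

# Alpha-beta over toy tuple trees ("max"|"min", children) with numeric
# leaves, instrumented to count leaf evaluations.
def alphabeta(node, alpha, beta, stats):
    if isinstance(node, (int, float)):
        stats["leaves"] += 1
        return node
    kind, children = node
    v = -math.inf if kind == "max" else math.inf
    for c in children:
        w = alphabeta(c, alpha, beta, stats)
        if kind == "max":
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def count_leaves(tree):
    stats = {"leaves": 0}
    alphabeta(tree, -math.inf, math.inf, stats)
    return stats["leaves"]

# Same subtrees, different visiting order: with the strongest MIN node
# first, later subtrees are pruned after a single leaf each.
good = ("max", [("min", [9, 8, 7]), ("min", [3, 2, 1]), ("min", [5, 4, 6])])
bad  = ("max", [("min", [3, 2, 1]), ("min", [5, 4, 6]), ("min", [9, 8, 7])])
print(count_leaves(good), count_leaves(bad))  # 5 9
```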
Computational time limit (example)
- 100 seconds are allowed for each move (game rule)
- 10^4 nodes/sec (processor speed)
- So we can explore just 10^6 nodes for each move
- b^m = 10^6, b = 35 ⟹ m ≈ 4
  - 4-ply look-ahead is a hopeless chess player!
- α-β reaches about depth 8: a decent chess program
  - It is still hopeless
Resource Limits
- Problem: in realistic games, we cannot search to the leaves!
- Solution: depth-limited search
  - Instead, search only to a limited depth in the tree
  - Replace terminal utilities with an evaluation function for non-terminal positions
- The guarantee of optimal play is gone
- More plies make a BIG difference
- Use iterative deepening for an anytime algorithm
[Figure: a depth-limited tree; unexamined nodes below the limit are marked "?", evaluation estimates -1, -2, 4, 9 give MIN values -2 and 4 and a MAX root value of 4]
Computational time limit: solution
- We must make a decision even when finding the optimal move is infeasible.
- Cut off the search and apply a heuristic evaluation function
  - Cut-off test: turns non-terminal nodes into terminal leaves
    - A cut-off test instead of the terminal test (e.g., a depth limit)
  - Evaluation function: estimated desirability of a state
    - A heuristic evaluation instead of the utility function
- This approach does not guarantee optimality.
Depth Matters
- Evaluation functions are always imperfect
- The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
- An important example of the tradeoff between complexity of features and complexity of computation
Heuristic minimax

$$H\text{-MINIMAX}(s,d)=\begin{cases}\text{EVAL}(s,\text{MAX}) & \text{if CUTOFF-TEST}(s,d)\\ \max_{a\in\text{ACTIONS}(s)}H\text{-MINIMAX}(\text{RESULT}(s,a),d+1) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}H\text{-MINIMAX}(\text{RESULT}(s,a),d+1) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$
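H-MINIMAX can be sketched over a toy tree; the tuple encoding and the deliberately crude `evaluate()` heuristic below are illustrative assumptions, not from the slides:

```python
# Depth-limited search following H-MINIMAX above: the recursion is cut
# off at depth d_max and a heuristic EVAL is applied at the frontier.
def h_minimax(node, depth, d_max, evaluate):
    kind, payload = node   # ("leaf", utility) or ("max"|"min", children)
    if kind == "leaf":
        return payload                     # TERMINAL-TEST: true utility
    if depth >= d_max:                     # CUTOFF-TEST(s, d)
        return evaluate(node)              # EVAL: heuristic estimate
    vals = [h_minimax(c, depth + 1, d_max, evaluate) for c in payload]
    return max(vals) if kind == "max" else min(vals)

def evaluate(node):
    # Crude stand-in heuristic: the utility of the leftmost descendant.
    kind, payload = node
    while kind != "leaf":
        kind, payload = payload[0]
    return payload

tree = ("max", [("min", [("leaf", 8), ("leaf", 2)]),
                ("min", [("leaf", 5), ("leaf", 6)])])
print(h_minimax(tree, 0, 1, evaluate))  # 8  (cut off early: heuristic value)
print(h_minimax(tree, 0, 9, evaluate))  # 5  (deep enough: true minimax value)
```

Note how the cheap heuristic at a shallow cutoff gives a different (wrong) answer than the full search, which is exactly why optimality is no longer guaranteed.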
Evaluation Functions

Evaluation functions
- For terminal states, an evaluation function should order them in the same way as the true utility function.
- For non-terminal states, it should be strongly correlated with the actual chances of winning.
- It must not require a high computational cost.
Evaluation functions based on features
- Example: features for evaluating chess states
  - Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.
  - King safety
  - Good pawn structure
  - ...

Evaluation Functions
- Evaluation functions score non-terminals in depth-limited search
- Ideal function: returns the actual minimax value of the position
- In practice: typically a weighted linear sum of features:
  - Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
  - e.g. f1(s) = (num white queens − num black queens), etc.
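The weighted linear sum can be sketched directly. The state representation, features, and weights below are illustrative assumptions, not from the slides:

```python
# A weighted linear evaluation function of the form
#   Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
# sketched for chess-like material counting.
def evaluate(state, features, weights):
    return sum(w * f(state) for f, w in zip(features, weights))

# Example state: piece counts (white/black queens and pawns).
state = {"wq": 1, "bq": 0, "wp": 6, "bp": 5}

features = [
    lambda s: s["wq"] - s["bq"],   # f1: queen material difference
    lambda s: s["wp"] - s["bp"],   # f2: pawn material difference
]
weights = [9.0, 1.0]               # conventional material values

print(evaluate(state, features, weights))  # 10.0
```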
Cutting off search: simple depth limit
- Problem 1: non-quiescent positions
  - A few more plies can make a big difference in the evaluation value
- Problem 2: horizon effect
  - Delaying tactics against an opponent's move that causes serious unavoidable damage (the damage is pushed beyond the horizon that the player can see, i.e., the cut-off threshold)
Speeding up the search process
- Table lookup rather than search for some states
  - E.g., for the opening and ending of games (where there are few choices)
- Example: chess
  - For each opening, the best advice of human experts (from books describing good play) can be copied into tables.
  - For endgames, computer analysis is usually used (solving endgames by computer).
Uncertain Outcomes

Probabilities

Reminder: Probabilities
- A random variable represents an event whose outcome is unknown
- A probability distribution is an assignment of weights to outcomes
- Example: traffic on the freeway
  - Random variable: T = whether there's traffic
  - Outcomes: T in {none, light, heavy}
  - Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
- Some laws of probability (more later):
  - Probabilities are always non-negative
  - Probabilities over all possible outcomes sum to one
- As we get more evidence, probabilities may change:
  - P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
  - We'll talk about methods for reasoning about and updating probabilities later
[Figure: bar chart of the distribution 0.25 / 0.50 / 0.25]
Reminder: Expectations
- The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
- Example: how long to get to the airport?
  - Time: 20 min with probability 0.25, 30 min with probability 0.50, 60 min with probability 0.25
  - Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
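The airport example works out as a one-line expected-value computation:

```python
# The expected travel time: the average time weighted by the
# probability of each traffic outcome.
times = {"none": 20, "light": 30, "heavy": 60}      # minutes
probs = {"none": 0.25, "light": 0.50, "heavy": 0.25}

expected = sum(probs[t] * times[t] for t in times)
print(expected)  # 35.0
```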
Mixed Layer Types
- E.g., backgammon
- Expectiminimax
  - The environment is an extra "random agent" player that moves after each min/max agent
  - Each node computes the appropriate combination of its children

Stochastic games: Backgammon

Stochastic games
- Expected utility: chance nodes take the average (expectation) over all possible outcomes.
  - This is consistent with the definition of rational agents trying to maximize expected utility.
[Figure: chance nodes over leaves 2, 3 and 1, 4 with expected values 2.1 and 1.3 (the average of the values weighted by their probabilities)]
Stochastic games

$$\text{EXPECT-MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\\ \sum_{r}P(r)\,\text{EXPECT-MINIMAX}(\text{RESULT}(s,r)) & \text{if PLAYER}(s)=\text{CHANCE}\end{cases}$$
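The chance-node case can be sketched over a toy tree. The tuple encoding is an assumption for illustration, and the probabilities (0.9/0.1) are chosen so the chance nodes come out to the slide's expected values 2.1 and 1.3:

```python
import math

# Expectiminimax over a toy tree. Node forms (illustrative encoding):
#   numeric leaf, ("max", children), ("min", children),
#   or ("chance", [(probability, child), ...]).
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average over all outcomes.
    return sum(p * expectiminimax(c) for p, c in children)

# A MAX root over two chance nodes with leaves 2, 3 and 1, 4; with
# outcome probabilities 0.9/0.1 their expected values are 2.1 and 1.3,
# so MAX picks the first.
tree = ("max", [("chance", [(0.9, 2), (0.1, 3)]),
                ("chance", [(0.9, 1), (0.1, 4)])])
print(round(expectiminimax(tree), 2))  # 2.1
```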
What Utilities to Use?
- For worst-case minimax reasoning, the scale of the terminal function doesn't matter
  - We just want better states to have higher evaluations (get the ordering right)
  - We call this insensitivity to monotonic transformations
- For average-case expectimax reasoning, we need magnitudes to be meaningful
[Figure: leaves 0, 40, 20, 30 and, after applying x², leaves 0, 1600, 400, 900; the ordering is preserved but the expectimax values change]
Evaluation functions for stochastic games
- An order-preserving transformation on leaf values is not a sufficient condition.
- The evaluation function must be a positive linear transformation of the expected utility of a position.
Properties of the search space for stochastic games
- Time complexity: O(b^m n^m), where n is the number of distinct chance outcomes
  - Backgammon: b ≈ 20 (but can be up to 4000 for double dice rolls), n = 21 (number of different possible dice rolls)
  - 3 plies is manageable (≈ 10^8 nodes)
- The probability of reaching a given node decreases enormously with depth (multiplying probabilities)
  - Forming detailed plans of action may be pointless
  - Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other
- But pruning is not straightforward.
Example: Backgammon
- Dice rolls increase b: 21 possible rolls with 2 dice
- Backgammon ≈ 20 legal moves
- Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9
- As depth increases, the probability of reaching a given search node shrinks
  - So the usefulness of search is diminished
  - So limiting depth is less damaging
  - But pruning is trickier...
- Historic AI: TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning
  - 1st AI world champion in any game!
Other Game Types

Multi-Agent Utilities
- What if the game is not zero-sum, or has multiple players?
- Generalization of minimax:
  - Terminals have utility tuples
  - Node values are also utility tuples
  - Each player maximizes its own component
  - Can give rise to cooperation and competition dynamically...
[Figure: a three-player game tree with terminal utility triples (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]
State-of-the-art game programs
- Chess (b ≈ 35)
  - In 1997, Deep Blue defeated Kasparov.
    - It ran on a parallel computer doing alpha-beta search.
    - It reached depth 14 plies routinely, with techniques to extend the effective search depth.
  - Hydra: reaches depth 18 plies using more heuristics.
- Checkers (b < 10)
  - Chinook (which ran on a regular PC and used alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.
  - Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.
State-of-the-art game programs (Cont.)
- Othello (b is usually between 5 and 15)
  - Logistello defeated the human world champion six games to none in 1997.
  - Human champions are no match for computers at Othello.
- Go (b > 300)
  - Human champions long refused to compete against computers, while programs were still at an advanced amateur level.
  - MoGo avoided alpha-beta search and used Monte Carlo rollouts.
  - AlphaGo (2016) has beaten professionals without handicaps.
State-of-the-art game programs (Cont.)
- Backgammon (stochastic)
  - TD-Gammon (1992) was competitive with top human players.
  - Depth 2 or 3 search along with a good evaluation function developed by learning methods
- Bridge (partially observable, multiplayer)
  - In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.
  - In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.
- Scrabble (partially observable & stochastic)
  - In 2006, Quackle defeated the former world champion 3-2.