15-281 Sp20 Lecture 4: Adversarial Search
TRANSCRIPT
Announcements

Assignments:
§ P1: Search and Games
§ Due Thu 2/6, 10 pm
§ Recommended to work in pairs
§ Submit to Gradescope early and often
§ HW2 (written): Search and Heuristics
§ Due tomorrow, Tue 1/28, 10 pm
§ HW3 (online): Adversarial Search and CSPs
§ Out tomorrow
§ Due Tue 2/4, 10 pm
Creating Admissible Heuristics

Most of the work in solving hard search problems optimally is in coming up with admissible heuristics.

Often, admissible heuristics are solutions to relaxed problems, where new actions are available.
Combining Heuristics

Dominance: h_a ≥ h_c if ∀n: h_a(n) ≥ h_c(n)
§ Roughly speaking, larger is better as long as both are admissible
§ The zero heuristic is pretty bad (what does A* do with h = 0?)
§ The exact heuristic is pretty good, but usually too expensive!

What if we have two heuristics, and neither dominates the other?
§ Form a new heuristic by taking the max of both: h(n) = max(h_a(n), h_b(n))
§ The max of admissible heuristics is admissible and dominates both!
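The max-of-heuristics idea can be sketched in a few lines of Python. The heuristics h_a and h_b below are hypothetical, with hand-picked values purely to illustrate the dominance claim; they are not from the slides.

```python
# Two hypothetical admissible heuristics with made-up values for two states.
def h_a(n):
    return {"s1": 4, "s2": 2}[n]

def h_b(n):
    return {"s1": 3, "s2": 5}[n]

def h_max(n):
    # Pointwise max: still admissible, and dominates both h_a and h_b.
    return max(h_a(n), h_b(n))
```

Note that h_max picks whichever heuristic is tighter at each state, so it is at least as informative as either one alone.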
AI: Representation and Problem Solving
Adversarial Search

Instructors: Pat Virtue & Stephanie Rosenthal
Slide credits: CMU AI, http://ai.berkeley.edu
Outline
History / Overview
Zero-Sum Games (Minimax)
Evaluation Functions
Search Efficiency (α-β Pruning)
Games of Chance (Expectimax)
Game Playing State-of-the-Art

Checkers:
§ 1950: First computer player.
§ 1959: Samuel's self-taught program.
§ 1994: First computer world champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame database.
§ 2007: Checkers solved! Endgame database of 39 trillion states.

Chess:
§ 1945-1960: Zuse, Wiener, Shannon, Turing, Newell & Simon, McCarthy.
§ 1960s onward: gradual improvement under the "standard model"
§ 1997: special-purpose chess machine Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second and extended some lines of search up to 40 ply. Current programs running on a PC rate > 3200 (vs 2870 for Magnus Carlsen).

Go:
§ 1968: Zobrist's program plays legal Go, barely (b > 300!)
§ 2005-2014: Monte Carlo tree search enables rapid advances: current programs beat strong amateurs, and professionals with a 3-4 stone handicap.
§ 2015: AlphaGo from DeepMind beats Lee Sedol.
Behavior from Computation
[Demo: mystery pacman (L6D1)]
Types of Games

Many different kinds of games!

Axes:
§ Deterministic or stochastic?
§ Perfect information (fully observable)?
§ One, two, or more players?
§ Turn-taking or simultaneous?
§ Zero sum?

We want algorithms for calculating a contingent plan (a.k.a. strategy or policy) which recommends a move for every possible eventuality.
"Standard" Games

Standard games are deterministic, observable, two-player, turn-taking, and zero-sum.

Game formulation:
§ Initial state: s0
§ Players: Player(s) indicates whose move it is
§ Actions: Actions(s) for the player on move
§ Transition model: Result(s, a)
§ Terminal test: Terminal-Test(s)
§ Terminal values: Utility(s, p) for player p
§ Or just Utility(s) for the player making the decision at the root
Zero-Sum Games

Zero-sum games:
§ Agents have opposite utilities
§ Pure competition: one maximizes, the other minimizes

General games:
§ Agents have independent utilities
§ Cooperation, indifference, competition, shifting alliances, and more are all possible
Adversarial Search
Single-Agent Trees

(Figure: a single-agent game tree with terminal values 8, 2, 0, 2, 6, 4, 6, ...)
Minimax

(Figures: a minimax tree labeled with states, actions, and values; terminal values include +8, -10, -5, -8.)
Piazza Poll 1

What is the minimax value at the root?
A) 2  B) 3  C) 6  D) 12  E) 14

The tree has three MIN nodes over the leaf groups (3, 12, 8), (2, 4, 6), and (14, 5, 2). The MIN nodes back up 3, 2, and 2, so the MAX root takes the value 3. Answer: B.
Minimax Code

(Figures: slides stepping through the max-value code on a subtree with leaf values +8, -10, -8.)
Minimax Notation

Value of a state: V(s) = max_a V(s'), where s' = result(s, a)

Best action: a* = argmax_a V(s'), where s' = result(s, a)
Generic Game Tree Pseudocode

function minimax_decision(state)
    return argmax_{a in state.actions} value(state.result(a))

function value(state)
    if state.is_leaf
        return state.value
    if state.player is MAX
        return max_{a in state.actions} value(state.result(a))
    if state.player is MIN
        return min_{a in state.actions} value(state.result(a))
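The pseudocode above can be sketched in Python. This is a minimal illustration, assuming a tree encoded as nested tuples (a node is either a numeric leaf value or a ("MAX"|"MIN", children) pair); the encoding is my choice, not the course's.

```python
# A node is either a number (leaf value) or a tuple (player, children).
def value(node):
    if isinstance(node, (int, float)):      # leaf
        return node
    player, children = node
    if player == "MAX":
        return max(value(c) for c in children)
    return min(value(c) for c in children)  # MIN

def minimax_decision(children):
    # Root is a MAX node; return the index of the best child (action).
    return max(range(len(children)), key=lambda i: value(children[i]))

# The Piazza Poll 1 tree: a MAX root over three MIN nodes.
tree = ("MAX", [("MIN", [3, 12, 8]), ("MIN", [2, 4, 6]), ("MIN", [14, 5, 2])])
```

Here value(tree) backs up 3, 2, 2 through the MIN nodes and returns 3 at the root, matching the poll.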
Minimax Efficiency

How efficient is minimax?
§ Just like (exhaustive) DFS
§ Time: O(b^m)
§ Space: O(bm)

Example: For chess, b ≈ 35, m ≈ 100
§ An exact solution is completely infeasible
§ Humans can't do this either, so how do we play chess?
§ Bounded rationality (Herbert Simon)
Resource Limits
Problem: In realistic games, we cannot search to the leaves!

Solution 1: Bounded lookahead
§ Search only to a preset depth limit or horizon
§ Use an evaluation function for non-terminal positions
§ Guarantee of optimal play is gone
§ More plies make a BIG difference

Example:
§ Suppose we have 100 seconds and can explore 10K nodes/sec
§ So we can check 1M nodes per move
§ For chess, b ≈ 35, so this reaches only about depth 4: not so good

(Figure: a depth-limited tree with max and min levels and evaluation values -1, -2, 4, 9; the backed-up values are -2 and 4.)
Depth Matters

§ Evaluation functions are always imperfect
§ Deeper search => better play (usually)
§ Or: deeper search gives the same quality of play with a less accurate evaluation function
§ An important example of the tradeoff between complexity of features and complexity of computation

[Demo: depth limited (L6D4, L6D5)]
Demo Limited Depth (2)
Demo Limited Depth (10)
Evaluation Functions
Evaluation Functions

Evaluation functions score non-terminals in depth-limited search.

Ideal function: returns the actual minimax value of the position.
In practice: typically a weighted linear sum of features:
§ EVAL(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
§ E.g., w1 = 9, f1(s) = (num white queens - num black queens), etc.
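A weighted linear evaluation function is easy to sketch. The features and weights below are illustrative inventions (queen and pawn material differences), not a real engine's evaluation.

```python
# Weighted linear sum: EVAL(s) = w1*f1(s) + w2*f2(s) + ...
def eval_fn(state, weights, features):
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical features: material differences for queens and pawns.
features = [
    lambda s: s["white_queens"] - s["black_queens"],
    lambda s: s["white_pawns"] - s["black_pawns"],
]
weights = [9, 1]

s = {"white_queens": 1, "black_queens": 0, "white_pawns": 6, "black_pawns": 8}
score = eval_fn(s, weights, features)   # 9*1 + 1*(-2) = 7
```

In a depth-limited search, eval_fn would replace state.value at the depth cutoff.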
Evaluation for Pacman
Generalized Minimax

What if the game is not zero-sum, or has multiple players?

Generalization of minimax:
§ Terminals have utility tuples
§ Node values are also utility tuples
§ Each player maximizes its own component
§ Can give rise to cooperation and competition dynamically

(Figure: a three-player tree whose leaves include (1,1,6), (0,0,7), (9,9,0), (8,8,1), (7,7,2), (0,0,8); the backed-up values are (0,0,7), (8,8,1), (7,7,2), (0,0,8), then (8,8,1) and (7,7,2), and the root value is (8,8,1).)
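The tuple-valued generalization can be sketched directly: each node's value is a utility tuple, and the player to move picks the child whose tuple is best in that player's own component. The tree below reconstructs the slide's example, assuming players 0, 1, 2 move at successive levels (an inference from the backed-up values, not stated on the slide).

```python
# A node is either a utility tuple of numbers (leaf) or (player_index, children).
def value(node):
    if isinstance(node, tuple) and all(isinstance(x, (int, float)) for x in node):
        return node                          # leaf: utility tuple
    player, children = node
    # The player to move maximizes its own component of the utility tuple.
    return max((value(c) for c in children), key=lambda u: u[player])

tree = (0, [
    (1, [
        (2, [(1, 1, 6), (0, 0, 7)]),
        (2, [(9, 9, 0), (8, 8, 1)]),
    ]),
    (1, [
        (2, [(9, 9, 0), (7, 7, 2)]),
        (2, [(0, 0, 8), (0, 0, 7)]),
    ]),
])
```

Backing up this tree reproduces the slide's root value (8, 8, 1); note that players 0 and 1 effectively cooperate here because their payoffs coincide.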
Bias in Evaluation Functions and Heuristics
Bias is the phenomenon of systematically analyzing results under faulty assumptions.

Evaluation functions and heuristics are human-generated, so they are potentially subject to the same biases as the people who design them.
Bias in Evaluation Functions and Heuristics
Discussion: Road navigation with A*
§ What heuristics could we generate that may be biased?
§ What should we consider to reduce those biases?

Discussion: International negotiations (e.g., climate deals)
§ In bounded lookahead, how could one nation's evaluation function be biased and affect the outcome of the negotiation?
§ What are some strategies for potentially reducing those biases?
Game Tree Pruning
Minimax Example

As in Piazza Poll 1: MIN nodes over the leaves (3, 12, 8), (2, 4, 6), (14, 5, 2) back up 3, 2, and 2, and the MAX root takes the value 3.
Alpha-Beta Example

α = best option so far from any MAX node on this path

Traversing left to right: the first MIN node returns 3, so the root sets α = 3. At each later MIN node, as soon as its running value drops to α = 3 or below, the remaining children are pruned. The root value is still 3.

The order of generation matters: more pruning is possible if good moves come first.
Piazza Poll 2
Which branches are pruned? (Left-to-right traversal; select all that apply.)
Piazza Poll 3
Which branches are pruned? (Left-to-right traversal)
A) e, l  B) g, l  C) g, k, l  D) g, n
Alpha-Beta Quiz 2

(Figure: a tree with leaf values including 10, 100, and 2, with α and β boxes to fill in at each node.)
Alpha-Beta Implementation

α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β:
            return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α:
            return v
        β = min(β, v)
    return v
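The alpha-beta pseudocode above can be made runnable with the same tuple-tree encoding used earlier. This sketch also records which leaves get evaluated, so the pruning is visible; the LEAVES_SEEN list is my instrumentation, not part of the algorithm.

```python
import math

LEAVES_SEEN = []  # instrumentation: which leaves the search actually touches

def value(node, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):
        LEAVES_SEEN.append(node)
        return node
    player, children = node
    if player == "MAX":
        v = -math.inf
        for c in children:
            v = max(v, value(c, alpha, beta))
            if v >= beta:
                return v            # MIN above will never allow this branch
            alpha = max(alpha, v)
        return v
    v = math.inf
    for c in children:
        v = min(v, value(c, alpha, beta))
        if v <= alpha:
            return v                # MAX above already has something better
        beta = min(beta, v)
    return v

tree = ("MAX", [("MIN", [3, 12, 8]), ("MIN", [2, 4, 6]), ("MIN", [14, 5, 2])])
result = value(tree)
```

On this tree the root still gets the minimax value 3, but the leaves 4 and 6 in the middle MIN node are never evaluated: once that node sees 2 ≤ α = 3, it returns immediately.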
Alpha-Beta Quiz 2 (walkthrough)

α: MAX's best option on path to root
β: MIN's best option on path to root

(Figures: the quiz tree traced through the max-value and min-value code above. A MIN node on the left sets β = 10, so a child max-value call that reaches v = 100 returns immediately with v ≥ β. Later, with α = 10 from the left subtree, a min-value call returns as soon as it sees v = 2 ≤ α.)
Alpha-Beta Pruning Properties

Theorem: This pruning has no effect on the minimax value computed for the root!

Good child ordering improves the effectiveness of pruning
§ Iterative deepening helps with this

With "perfect ordering":
§ Time complexity drops to O(b^(m/2))
§ Doubles the solvable depth!
§ Chess: 1M nodes/move => depth 8, respectable

This is a simple example of metareasoning (computing about what to compute)
Modeling Assumptions

Know your opponent: what happens if the other player isn't playing optimally?

(Figure: a tree with leaf values 10, 10, 9, 100.)
Modeling Assumptions

§ Dangerous optimism: assuming chance when the world is adversarial
§ Dangerous pessimism: assuming the worst case when it's not likely
Modeling Assumptions

Chance nodes: Expectimax

(Figure: the same tree with leaf values 10, 10, 9, 100, with chance nodes in place of MIN nodes.)
Why Expectimax?

A pretty great model for an agent in the world: choose the action that has the highest expected value.
Chance Outcomes in Trees

§ Minimax: tic-tac-toe, chess
§ Expectimax: Tetris, investing
§ Expectiminimax: backgammon, Monopoly

(Figures: three trees over leaf values 10, 10, 9, 100, with min nodes, chance nodes, and a mix of both.)
Probabilities
A random variable represents an event whose outcome is unknown.

A probability distribution is an assignment of weights to outcomes.

Example: Traffic on the freeway
§ Random variable: T = whether there's traffic
§ Outcomes: T in {none, light, heavy}
§ Distribution: P(T = none) = 0.25, P(T = light) = 0.50, P(T = heavy) = 0.25

Probabilities over all possible outcomes sum to one.
Expected Value

The expected value of a function of a random variable: average the values of each outcome, weighted by the probability of that outcome.

Example: How long to get to the airport?

E[Time] = 0.25 × 20 min + 0.50 × 30 min + 0.25 × 60 min = 35 min
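The airport example as code: an expectation is just a probability-weighted average, which is exactly what a chance node computes.

```python
# The slide's airport example: E[T] = sum of P(outcome) * time(outcome).
probs = [0.25, 0.50, 0.25]   # none, light, heavy traffic
times = [20, 30, 60]         # minutes for each outcome

expected_time = sum(p * t for p, t in zip(probs, times))  # 5 + 15 + 15 = 35
```

The same one-liner is the chance-node rule used by expectimax below.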
Expectations

Max node notation: V(s) = max_a V(s'), where s' = result(s, a)

Chance node notation: V(s) = Σ_{s'} P(s') V(s')

(Example: with probabilities 0.25, 0.5, 0.25 over times 20, 30, 60 minutes, the chance node's value is 35.)
Piazza Poll 4

Expectimax tree search: which action do we choose?
A: Left  B: Center  C: Right  D: Eight

The root has three chance nodes:
§ Left: outcomes 12, 8, 4 with probabilities 1/4, 1/4, 1/2, so value 3 + 2 + 2 = 7
§ Center: outcomes 8, 6 with probabilities 1/2, 1/2, so value 4 + 3 = 7
§ Right: outcomes 12, 6 with probabilities 1/3, 2/3, so value 4 + 4 = 8

The root takes the max, 8, so we choose Right.
Expectimax Pruning?

(Figure: a tree with chance nodes over leaves 12, 9, 3, 2.)

No! Why? A chance node's value depends on all of its children: without bounds on the leaf values, any unexplored outcome could still raise or lower the expectation, so no branch can be safely skipped.
Expectimax Code

function value(state)
    if state.is_leaf
        return state.value
    if state.player is MAX
        return max_{a in state.actions} value(state.result(a))
    if state.player is MIN
        return min_{a in state.actions} value(state.result(a))
    if state.player is CHANCE
        return sum_{s in state.next_states} P(s) * value(s)
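The expectimax pseudocode above, sketched in Python on the Piazza Poll 4 tree. Chance nodes are encoded as lists of (probability, child) pairs; this encoding is my choice for the illustration.

```python
# A node is a number (leaf), ("MAX"|"MIN", children), or
# ("CHANCE", [(probability, child), ...]).
def value(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "MAX":
        return max(value(c) for c in children)
    if kind == "MIN":
        return min(value(c) for c in children)
    # CHANCE: probability-weighted sum over outcomes
    return sum(p * value(c) for p, c in children)

# The Piazza Poll 4 tree: a MAX root over three chance nodes.
tree = ("MAX", [
    ("CHANCE", [(0.25, 12), (0.25, 8), (0.5, 4)]),   # Left: 7
    ("CHANCE", [(0.5, 8), (0.5, 6)]),                # Center: 7
    ("CHANCE", [(1 / 3, 12), (2 / 3, 6)]),           # Right: 8
])
```

The root value comes out to 8 (up to floating-point rounding), matching the poll's answer of Right.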
Summary

Games require decisions when optimality is impossible
§ Bounded-depth search and approximate evaluation functions

Games force efficient use of computation
§ Alpha-beta pruning

Game playing has produced important research ideas
§ Reinforcement learning (checkers)
§ Iterative deepening (chess)
§ Rational metareasoning (Othello)
§ Monte Carlo tree search (Go)
§ Solution methods for partial-information games in economics (poker)

Video games present much greater challenges: lots to do!
§ b ≈ 10^500, |S| ≈ 10^4000, m ≈ 10,000
Bonus Question

Let's say you know that your opponent is actually running a depth-1 minimax, using the result 80% of the time and moving randomly otherwise.

Question: What tree search should you use?
A: Minimax  B: Expectimax  C: Something completely different
Preview: MDP/Reinforcement Learning Notation

V(s) = max_a Σ_{s'} P(s') V(s')
Preview: MDP/Reinforcement Learning Notation

Standard expectimax: V(s) = max_a Σ_{s'} P(s' | s, a) V(s')

Bellman equations: V(s) = max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V(s')]

Value iteration: V_{k+1}(s) = max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V_k(s')], ∀s

Q-iteration: Q_{k+1}(s, a) = Σ_{s'} P(s' | s, a) [R(s, a, s') + γ max_{a'} Q_k(s', a')], ∀s, a

Policy extraction: π_V(s) = argmax_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V(s')], ∀s

Policy evaluation: V^π_{k+1}(s) = Σ_{s'} P(s' | s, π(s)) [R(s, π(s), s') + γ V^π_k(s')], ∀s

Policy improvement: π_new(s) = argmax_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V^{π_old}(s')], ∀s
![Page 65: 15281 Sp20 Lecture 4 Adversarial Search](https://reader031.vdocuments.mx/reader031/viewer/2022011815/61d50c01c010aa59f25d3d25/html5/thumbnails/65.jpg)
Preview: MDP/Reinforcement Learning Notation
Standard expectimax: 𝑉 𝑠 = max'
67)
𝑃 𝑠) 𝑠, 𝑎)𝑉(𝑠))
𝑉 𝑠 = max'
67)
𝑃 𝑠) 𝑠, 𝑎 𝑅 𝑠, 𝑎, 𝑠) + 𝛾𝑉 𝑠)
𝑉<=> 𝑠 = max'
67)
𝑃 𝑠) 𝑠, 𝑎 𝑅 𝑠, 𝑎, 𝑠) + 𝛾𝑉< 𝑠) , ∀ 𝑠
𝑄<=> 𝑠, 𝑎 = 67)
𝑃 𝑠) 𝑠, 𝑎 [𝑅 𝑠, 𝑎, 𝑠) + 𝛾 max'B
𝑄<(𝑠), 𝑎))] , ∀ 𝑠, 𝑎
𝜋E 𝑠 = argmax'
67)
𝑃 𝑠) 𝑠, 𝑎 [𝑅 𝑠, 𝑎, 𝑠) + 𝛾𝑉 𝑠) ] , ∀ 𝑠
𝑉<=>F 𝑠 = 6
7)
𝑃 𝑠) 𝑠, 𝜋 𝑠 [𝑅 𝑠, 𝜋 𝑠 , 𝑠) + 𝛾𝑉<F 𝑠) ] , ∀ 𝑠
𝜋GHI 𝑠 = argmax'
67)
𝑃 𝑠) 𝑠, 𝑎 𝑅 𝑠, 𝑎, 𝑠) + 𝛾𝑉FJKL 𝑠) , ∀ 𝑠
Bellman equations:
Value iteration:
Q-iteration:
Policy extraction:
Policy evaluation:
Policy improvement: