Adversarial search (game). Transcript.
Source: ce.sharif.edu/courses/97-98/2/ce417-2/resources/root... (2019-03-07)
Soleymani
Adversarial Search
CE417: Introduction to Artificial Intelligence, Sharif University of Technology, Spring 2019
"Artificial Intelligence: A Modern Approach", 3rd Edition, Chapter 5. Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.
Outline
- Game as a search problem
- Minimax algorithm
- α-β pruning: ignoring a portion of the search tree
- Time limit problem
  - Cut-off & evaluation function
Games as search problems
- Games
  - Adversarial search problems (goals are in conflict)
  - Competitive multi-agent environments
- Games in AI are a specialized kind of game (in the game-theory sense)
Adversarial Games
- Many different kinds of games!
- Axes:
  - Deterministic or stochastic?
  - One, two, or more players?
  - Zero-sum?
  - Perfect information (can you see the state)?
- We want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games
Zero-Sum Games
- Zero-sum games
  - Agents have opposite utilities (values on outcomes)
  - Lets us think of a single value that one agent maximizes and the other minimizes
  - Adversarial, pure competition
- General games
  - Agents have independent utilities (values on outcomes)
  - Cooperation, indifference, competition, and more are all possible
  - More later on non-zero-sum games
Primary assumptions
- We start with these games:
  - Two-player
  - Turn-taking: agents act alternately
  - Zero-sum: agents' goals are in conflict; the sum of utility values at the end of the game is zero or constant
  - Perfect information: fully observable
- Examples: tic-tac-toe, chess, checkers
Deterministic Games
- Many possible formalizations, one is:
  - States: S (start at s0)
  - Players: P = {1...N} (usually take turns)
  - Actions: A (may depend on player / state)
  - Transition function: S × A → S
  - Terminal test: S → {t, f}
  - Terminal utilities: S × P → R
- Solution for a player is a policy: S → A
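This formalization can be sketched as a tiny concrete game in Python. The game chosen (misère Nim) and all names (`Nim`, `actions`, `result`, ...) are illustrative assumptions, not part of the slides:

```python
from dataclasses import dataclass

# A tiny concrete instance of the formalization above. The game and all
# names here are illustrative, not from the slides.
@dataclass(frozen=True)
class Nim:
    """Misere Nim: players alternately remove 1 or 2 sticks; whoever
    takes the last stick loses. A state is (sticks, player-to-move)."""
    sticks: int
    player: int  # 1 or 2

    def actions(self):                  # actions A (depend on the state)
        return [n for n in (1, 2) if n <= self.sticks]

    def result(self, a):                # transition function S x A -> S
        return Nim(self.sticks - a, 3 - self.player)

    def is_terminal(self):              # terminal test S -> {t, f}
        return self.sticks == 0

    def utility(self, p):               # terminal utilities S x P -> R
        # The previous player took the last stick and therefore lost,
        # so the player to move at the terminal state is the winner.
        return 1 if p == self.player else -1

# A policy S -> A (a solution for a player) is then just a function
# from states to one of the legal actions:
policy = lambda s: s.actions()[0]
print(policy(Nim(5, 1)))  # 1
```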
Single-Agent Trees
[Figure: a single-agent search tree with terminal utilities 2, 0, 2, 6, 4, 6 and root value 8]

Value of a State
[Figure: the same tree; non-terminal states are labeled with values, terminal states with their utilities]
Value of a state: the best achievable outcome (utility) from that state
Adversarial Search

Adversarial Game Trees
[Figure: a two-player game tree with terminal values -20, -8, -18, -5, -10, +4, ..., -20, +8]

Minimax Values
[Figure: the same tree annotated with minimax values +8, -10, -5, -8; terminal states, states under the agent's control, and states under the opponent's control]
Tic-Tac-Toe Game Tree

Game tree (tic-tac-toe)
- Two players: P1 and P2 (P1 is now searching to find a good move)
- Zero-sum games: P1 gets U(t) and P2 gets C − U(t) for terminal node t
- Utilities are shown from the point of view of P1
- 1 ply = a half move
- P1 plays X, P2 plays O
[Figure: the tic-tac-toe game tree with alternating P1 and P2 levels]
Optimal play
- The opponent is assumed optimal
- The minimax function is used to find the utility of each state
- MAX/MIN wants to maximize/minimize the terminal payoff
  - MAX gets U(t) for terminal node t

Adversarial Search (Minimax)
- Minimax search:
  - A state-space search tree
  - Players alternate turns
  - Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: a MAX root (value 5) over two MIN nodes (values 2 and 5) with terminal values 8, 2, 5, 6; the terminal values are part of the game]
Minimax
- MINIMAX(s) gives the best achievable outcome of being in state s (assumption: optimal opponent), i.e., the utility of being in state s

$$\text{MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$

[Figure: MIN nodes with values 3, 2, 2 under a MAX root with value 3]
Minimax (Cont.)
- Optimal strategy: move to the state with the highest minimax value
  - Best achievable payoff against best play
  - Maximizes the worst-case outcome for MAX
- It works for zero-sum games

Minimax Properties
- Optimal against a perfect player. Otherwise?
[Figure: a MAX root over two MIN nodes with leaves (10, 10) and (9, 100); minimax prefers the left branch (value 10 vs. 9) even though the right branch offers 100 against an imperfect opponent]
Minimax Implementation

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v
Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
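The dispatch pattern can be made runnable on a small hand-built tree. The tuple encoding of the tree is an assumption for illustration; the example reproduces the slide's values (leaves 8, 2, 5, 6; minimax value 5):

```python
import math

# A runnable version of the dispatch pattern above. The game tree is
# hard-coded as nested tuples ("max"|"min", children) with numeric
# leaves; this encoding is an assumption for illustration.
def value(node):
    if isinstance(node, (int, float)):   # terminal state: its utility
        return node
    kind, children = node
    return max_value(children) if kind == "max" else min_value(children)

def min_value(children):
    v = math.inf
    for c in children:
        v = min(v, value(c))
    return v

def max_value(children):
    v = -math.inf
    for c in children:
        v = max(v, value(c))
    return v

# A MAX root over two MIN nodes with terminal values 8, 2 and 5, 6;
# the minimax value of the root is 5.
tree = ("max", [("min", [8, 2]), ("min", [5, 6])])
print(value(tree))  # 5
```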
Minimax algorithm (depth-first search)

function MINIMAX-DECISION(state) returns an action
    return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v
Properties of minimax
- Complete? Yes (when the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity: O(b^m)
- Space complexity: O(bm) (depth-first exploration)
- For chess, b ≈ 35 and m > 50 for reasonable games
  - Finding the exact solution is completely infeasible
Game Tree Pruning

Pruning
- Correct minimax decision without looking at every node in the game tree
- α-β pruning
  - A branch & bound algorithm
  - Prunes away branches that cannot influence the final decision

α-β pruning example
[Figures: five slides stepping through α-β pruning on an example game tree]
α-β pruning
- Assuming depth-first generation of the tree
- We prune node n when the player has a better choice m at the parent or any ancestor of n
- Two types of pruning (cuts):
  - pruning of max nodes (α-cuts)
  - pruning of min nodes (β-cuts)
Alpha-Beta Pruning
- General configuration (MIN version)
  - We are computing the MIN-VALUE at some node n
  - We are looping over n's children
  - Who cares about n's value? MAX
  - Let a be the best value that MAX can get at any choice point along the current path from the root
  - If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it is already bad enough that it won't be played)
- The MAX version is symmetric
[Figure: alternating MAX/MIN levels; the value a is available to MAX at an ancestor of the MIN node n]
α-β pruning (another example)
[Figure: another worked example; subtree bounds such as ≤ 2 and ≥ 5 over leaf values 1, 5, 2, 3 show where branches are pruned]
Why is it called α-β?
- α: value of the best (highest) choice found so far at any choice point along the path for MAX
- β: value of the best (lowest) choice found so far at any choice point along the path for MIN
- α and β are updated during the search process
  - For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.
  - For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.
Alpha-Beta Implementation

α: MAX's best option on the path to the root
β: MIN's best option on the path to the root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
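A runnable sketch of the pseudocode above, using a toy tuple-based tree encoding (an assumption for illustration). The leaf values follow the classic AIMA three-subtree example, where the second MIN node is abandoned after its first leaf:

```python
import math

# Alpha-beta search over a toy tree: ("max"|"min", children) or a
# numeric leaf. alpha = MAX's best option on the path to the root,
# beta = MIN's best option on the path to the root.
def alphabeta(node, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        v = -math.inf
        for c in children:
            v = max(v, alphabeta(c, alpha, beta))
            if v >= beta:          # MIN above will never allow this node
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for c in children:
            v = min(v, alphabeta(c, alpha, beta))
            if v <= alpha:         # MAX above already has something better
                return v
            beta = min(beta, v)
        return v

# After the first MIN node yields 3, seeing the leaf 2 in the second
# MIN node lets us prune its remaining leaves (their values never matter).
tree = ("max", [("min", [3, 12, 8]),
                ("min", [2, 99, 99]),
                ("min", [14, 5, 2])])
print(alphabeta(tree))  # 3
```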
function ALPHA-BETA-SEARCH(state) returns an action
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
α-β progress

Order of moves
- Good move ordering improves the effectiveness of pruning
- With the best ordering, time complexity is O(b^(m/2))
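To see why ordering matters, here is a small counting experiment: the same set of subtree values, explored with the best MIN subtree first versus last, while counting how many leaves alpha-beta evaluates. The trees and values are made up for illustration:

```python
import math

# Alpha-beta over toy tuple trees ("max"|"min", children) with numeric
# leaves, instrumented to count leaf evaluations.
def alphabeta(node, alpha, beta, stats):
    if isinstance(node, (int, float)):
        stats["leaves"] += 1
        return node
    kind, children = node
    v = -math.inf if kind == "max" else math.inf
    for c in children:
        w = alphabeta(c, alpha, beta, stats)
        if kind == "max":
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def count_leaves(tree):
    stats = {"leaves": 0}
    alphabeta(tree, -math.inf, math.inf, stats)
    return stats["leaves"]

# Same subtrees, different visiting order: with the strongest MIN node
# first, later subtrees are pruned after a single leaf each.
good = ("max", [("min", [9, 8, 7]), ("min", [3, 2, 1]), ("min", [5, 4, 6])])
bad  = ("max", [("min", [3, 2, 1]), ("min", [5, 4, 6]), ("min", [9, 8, 7])])
print(count_leaves(good), count_leaves(bad))  # 5 9
```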
Computational time limit (example)
- 100 seconds are allowed for each move (game rule)
- 10^4 nodes/sec (processor speed)
- So we can explore just 10^6 nodes for each move
- b^m = 10^6, b = 35 ⟹ m ≈ 4
  - 4-ply look-ahead is a hopeless chess player!
- α-β reaches about depth 8: a decent chess program
  - It is still hopeless
Resource Limits
- Problem: in realistic games, we cannot search to the leaves!
- Solution: depth-limited search
  - Instead, search only to a limited depth in the tree
  - Replace terminal utilities with an evaluation function for non-terminal positions
- The guarantee of optimal play is gone
- More plies make a BIG difference
- Use iterative deepening for an anytime algorithm
[Figure: a depth-limited tree; unexamined nodes below the limit are marked "?", evaluation estimates -1, -2, 4, 9 give MIN values -2 and 4 and a MAX root value of 4]
Computational time limit: solution
- We must make a decision even when finding the optimal move is infeasible.
- Cut off the search and apply a heuristic evaluation function
  - Cut-off test: turns non-terminal nodes into terminal leaves
    - A cut-off test instead of the terminal test (e.g., a depth limit)
  - Evaluation function: estimated desirability of a state
    - A heuristic evaluation instead of the utility function
- This approach does not guarantee optimality.
Depth Matters
- Evaluation functions are always imperfect
- The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
- An important example of the tradeoff between complexity of features and complexity of computation
Heuristic minimax

$$H\text{-MINIMAX}(s,d)=\begin{cases}\text{EVAL}(s,\text{MAX}) & \text{if CUTOFF-TEST}(s,d)\\ \max_{a\in\text{ACTIONS}(s)}H\text{-MINIMAX}(\text{RESULT}(s,a),d+1) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}H\text{-MINIMAX}(\text{RESULT}(s,a),d+1) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$
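H-MINIMAX can be sketched over a toy tree; the tuple encoding and the deliberately crude `evaluate()` heuristic below are illustrative assumptions, not from the slides:

```python
# Depth-limited search following H-MINIMAX above: the recursion is cut
# off at depth d_max and a heuristic EVAL is applied at the frontier.
def h_minimax(node, depth, d_max, evaluate):
    kind, payload = node   # ("leaf", utility) or ("max"|"min", children)
    if kind == "leaf":
        return payload                     # TERMINAL-TEST: true utility
    if depth >= d_max:                     # CUTOFF-TEST(s, d)
        return evaluate(node)              # EVAL: heuristic estimate
    vals = [h_minimax(c, depth + 1, d_max, evaluate) for c in payload]
    return max(vals) if kind == "max" else min(vals)

def evaluate(node):
    # Crude stand-in heuristic: the utility of the leftmost descendant.
    kind, payload = node
    while kind != "leaf":
        kind, payload = payload[0]
    return payload

tree = ("max", [("min", [("leaf", 8), ("leaf", 2)]),
                ("min", [("leaf", 5), ("leaf", 6)])])
print(h_minimax(tree, 0, 1, evaluate))  # 8  (cut off early: heuristic value)
print(h_minimax(tree, 0, 9, evaluate))  # 5  (deep enough: true minimax value)
```

Note how the cheap heuristic at a shallow cutoff gives a different (wrong) answer than the full search, which is exactly why optimality is no longer guaranteed.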
Evaluation Functions

Evaluation functions
- For terminal states, an evaluation function should order them in the same way as the true utility function.
- For non-terminal states, it should be strongly correlated with the actual chances of winning.
- It must not require a high computational cost.
Evaluation functions based on features
- Example: features for evaluating chess states
  - Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.
  - King safety
  - Good pawn structure
  - ...

Evaluation Functions
- Evaluation functions score non-terminals in depth-limited search
- Ideal function: returns the actual minimax value of the position
- In practice: typically a weighted linear sum of features:
  - Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
  - e.g. f1(s) = (num white queens − num black queens), etc.
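The weighted linear sum can be sketched directly. The state representation, features, and weights below are illustrative assumptions, not from the slides:

```python
# A weighted linear evaluation function of the form
#   Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
# sketched for chess-like material counting.
def evaluate(state, features, weights):
    return sum(w * f(state) for f, w in zip(features, weights))

# Example state: piece counts (white/black queens and pawns).
state = {"wq": 1, "bq": 0, "wp": 6, "bp": 5}

features = [
    lambda s: s["wq"] - s["bq"],   # f1: queen material difference
    lambda s: s["wp"] - s["bp"],   # f2: pawn material difference
]
weights = [9.0, 1.0]               # conventional material values

print(evaluate(state, features, weights))  # 10.0
```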
Cutting off search: simple depth limit
- Problem 1: non-quiescent positions
  - A few more plies can make a big difference in the evaluation value
- Problem 2: horizon effect
  - Delaying tactics against an opponent's move that causes serious unavoidable damage (the damage is pushed beyond the horizon that the player can see, i.e., the cut-off threshold)
Speeding up the search process
- Table lookup rather than search for some states
  - E.g., for the opening and ending of games (where there are few choices)
- Example: chess
  - For each opening, the best advice of human experts (from books describing good play) can be copied into tables.
  - For endgames, computer analysis is usually used (solving endgames by computer).
Uncertain Outcomes

Probabilities

Reminder: Probabilities
- A random variable represents an event whose outcome is unknown
- A probability distribution is an assignment of weights to outcomes
- Example: traffic on the freeway
  - Random variable: T = whether there's traffic
  - Outcomes: T in {none, light, heavy}
  - Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
- Some laws of probability (more later):
  - Probabilities are always non-negative
  - Probabilities over all possible outcomes sum to one
- As we get more evidence, probabilities may change:
  - P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
  - We'll talk about methods for reasoning about and updating probabilities later
[Figure: bar chart of the distribution 0.25 / 0.50 / 0.25]
Reminder: Expectations
- The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
- Example: how long to get to the airport?
  - Time: 20 min with probability 0.25, 30 min with probability 0.50, 60 min with probability 0.25
  - Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
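The airport example works out as a one-line expected-value computation:

```python
# The expected travel time: the average time weighted by the
# probability of each traffic outcome.
times = {"none": 20, "light": 30, "heavy": 60}      # minutes
probs = {"none": 0.25, "light": 0.50, "heavy": 0.25}

expected = sum(probs[t] * times[t] for t in times)
print(expected)  # 35.0
```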
Mixed Layer Types
- E.g., backgammon
- Expectiminimax
  - The environment is an extra "random agent" player that moves after each min/max agent
  - Each node computes the appropriate combination of its children

Stochastic games: Backgammon

Stochastic games
- Expected utility: chance nodes take the average (expectation) over all possible outcomes.
  - This is consistent with the definition of rational agents trying to maximize expected utility.
[Figure: chance nodes over leaves 2, 3 and 1, 4 with expected values 2.1 and 1.3 (the average of the values weighted by their probabilities)]
Stochastic games

$$\text{EXPECT-MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\\ \sum_{r}P(r)\,\text{EXPECT-MINIMAX}(\text{RESULT}(s,r)) & \text{if PLAYER}(s)=\text{CHANCE}\end{cases}$$
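The chance-node case can be sketched over a toy tree. The tuple encoding is an assumption for illustration, and the probabilities (0.9/0.1) are chosen so the chance nodes come out to the slide's expected values 2.1 and 1.3:

```python
import math

# Expectiminimax over a toy tree. Node forms (illustrative encoding):
#   numeric leaf, ("max", children), ("min", children),
#   or ("chance", [(probability, child), ...]).
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average over all outcomes.
    return sum(p * expectiminimax(c) for p, c in children)

# A MAX root over two chance nodes with leaves 2, 3 and 1, 4; with
# outcome probabilities 0.9/0.1 their expected values are 2.1 and 1.3,
# so MAX picks the first.
tree = ("max", [("chance", [(0.9, 2), (0.1, 3)]),
                ("chance", [(0.9, 1), (0.1, 4)])])
print(round(expectiminimax(tree), 2))  # 2.1
```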
What Utilities to Use?
- For worst-case minimax reasoning, the scale of the terminal function doesn't matter
  - We just want better states to have higher evaluations (get the ordering right)
  - We call this insensitivity to monotonic transformations
- For average-case expectimax reasoning, we need magnitudes to be meaningful
[Figure: leaves 0, 40, 20, 30 and, after applying x², leaves 0, 1600, 400, 900; the ordering is preserved but the expectimax values change]
Evaluation functions for stochastic games
- An order-preserving transformation on leaf values is not a sufficient condition.
- The evaluation function must be a positive linear transformation of the expected utility of a position.
Properties of the search space for stochastic games
- Time complexity: O(b^m n^m), where n is the number of distinct chance outcomes
  - Backgammon: b ≈ 20 (but can be up to 4000 for double dice rolls), n = 21 (number of different possible dice rolls)
  - 3 plies is manageable (≈ 10^8 nodes)
- The probability of reaching a given node decreases enormously with depth (multiplying probabilities)
  - Forming detailed plans of action may be pointless
  - Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other
- But pruning is not straightforward.
Example: Backgammon
- Dice rolls increase b: 21 possible rolls with 2 dice
- Backgammon ≈ 20 legal moves
- Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9
- As depth increases, the probability of reaching a given search node shrinks
  - So the usefulness of search is diminished
  - So limiting depth is less damaging
  - But pruning is trickier...
- Historic AI: TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning
  - 1st AI world champion in any game!
Other Game Types

Multi-Agent Utilities
- What if the game is not zero-sum, or has multiple players?
- Generalization of minimax:
  - Terminals have utility tuples
  - Node values are also utility tuples
  - Each player maximizes its own component
  - Can give rise to cooperation and competition dynamically...
[Figure: a three-player game tree with terminal utility triples (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]
State-of-the-art game programs
- Chess (b ≈ 35)
  - In 1997, Deep Blue defeated Kasparov.
    - It ran on a parallel computer doing alpha-beta search.
    - It reached depth 14 plies routinely, with techniques to extend the effective search depth.
  - Hydra: reaches depth 18 plies using more heuristics.
- Checkers (b < 10)
  - Chinook (which ran on a regular PC and used alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.
  - Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.
State-of-the-art game programs (Cont.)
- Othello (b is usually between 5 and 15)
  - Logistello defeated the human world champion six games to none in 1997.
  - Human champions are no match for computers at Othello.
- Go (b > 300)
  - Human champions long refused to compete against computers, while programs were still at an advanced amateur level.
  - MoGo avoided alpha-beta search and used Monte Carlo rollouts.
  - AlphaGo (2016) has beaten professionals without handicaps.
State-of-the-art game programs (Cont.)
- Backgammon (stochastic)
  - TD-Gammon (1992) was competitive with top human players.
  - Depth 2 or 3 search along with a good evaluation function developed by learning methods
- Bridge (partially observable, multiplayer)
  - In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.
  - In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.
- Scrabble (partially observable & stochastic)
  - In 2006, Quackle defeated the former world champion 3-2.