
Course 16:198:520: Introduction to Artificial Intelligence, Lecture 5

Solving Problems by Searching: Adversarial Search

Abdeslam Boularias

Wednesday, September 16, 2015



Outline

We examine the problems that arise when we make decisions in a world where other agents are also acting, possibly against us.

1 The minimax algorithm

2 Alpha-beta pruning

3 Imperfect fast decisions

4 Stochastic games

5 Partially observable games



Games

Multiagent environments are environments where more than one agent is acting, simultaneously or at different times.

Contingency plans are necessary to account for the unpredictability of other agents.

Each agent has its own personal utility function.

The corresponding decision-making problem is called a game.

A game is competitive if the utilities of different agents are maximized in different states.

In zero-sum games, the sum of the utilities of all agents is constant.

Zero-sum games are purely competitive.



Games

The abstract nature of games, such as chess, makes them appealing to study in AI.

The state of a game is easy to represent, and agents typically have a small number of actions to choose from.

Left: computer chess pioneers Herbert Simon and Allen Newell (1958). Right: John McCarthy and the Kotok-McCarthy program on an IBM 7090 (1967).


Physical games, such as soccer, are more difficult to study due to their continuous state and action spaces.

Robot soccer (from ri.cmu.edu).


Games

A game is described by the following components (a code sketch follows the list):

S0: the initial state (how is the game set up at the start?).

PLAYER(s): indicates which player has the move in state s (whose turn is it?).

ACTIONS(s): the set of legal actions in state s.

RESULT(s, a): returns the next state after we play action a in state s.

TERMINAL-TEST(s): indicates whether s is a terminal state.

UTILITY(s, p): (also called the objective or payoff function) defines a numerical value for a game that ends in terminal state s for player p.
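These components map directly onto a small programming interface. A minimal sketch in Python (the class and method names are my own, not from the lecture):

class Game:
    """Abstract game, mirroring the six components listed above."""

    def initial_state(self):        # S0
        raise NotImplementedError

    def player(self, s):            # PLAYER(s): whose turn is it in s?
        raise NotImplementedError

    def actions(self, s):           # ACTIONS(s): legal moves in s
        raise NotImplementedError

    def result(self, s, a):         # RESULT(s, a): state reached by playing a in s
        raise NotImplementedError

    def terminal_test(self, s):     # TERMINAL-TEST(s): is s a terminal state?
        raise NotImplementedError

    def utility(self, s, p):        # UTILITY(s, p): payoff of terminal state s for player p
        raise NotImplementedError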



Example: tic-tac-toe

We suppose there are two players in a zero-sum game.

We call our player MAX; she tries to maximize our utility.

We call our opponent MIN; she tries to minimize our utility (i.e., maximize her own utility).

[Figure: partial game tree for tic-tac-toe. MAX (X) moves at the root, MIN (O) replies at the next ply, and the two alternate until a terminal state is reached; terminal states are labelled with their utility for MAX (-1, 0, or +1).]
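To make the example concrete, here is a possible tic-tac-toe implementation of the interface sketched earlier (illustrative code, not from the lecture). A state is a tuple of 9 cells holding 'X', 'O', or None; 'X' is MAX and moves first; terminal utilities are +1, 0, or -1 from the given player's point of view.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

class TicTacToe:                               # implements the Game interface above
    def initial_state(self):
        return (None,) * 9

    def player(self, s):
        # X moves first, so X is to move whenever the two counts are equal.
        return 'X' if s.count('X') == s.count('O') else 'O'

    def actions(self, s):
        return [i for i in range(9) if s[i] is None]

    def result(self, s, a):
        board = list(s)
        board[a] = self.player(s)
        return tuple(board)

    def winner(self, s):
        for i, j, k in LINES:
            if s[i] is not None and s[i] == s[j] == s[k]:
                return s[i]
        return None

    def terminal_test(self, s):
        return self.winner(s) is not None or all(c is not None for c in s)

    def utility(self, s, p):
        w = self.winner(s)
        if w is None:
            return 0                           # draw
        return 1 if w == p else -1             # win / loss from p's point of view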


Optimal decisions in games

[Figure: a two-ply game tree. MAX moves at the root A with actions a1, a2, a3; MIN moves at nodes B, C, D with actions b1, b2, b3, c1, c2, c3, and d1, d2, d3. The leaf utilities are (3, 12, 8) under B, (2, 4, 6) under C, and (14, 5, 2) under D, so the backed-up minimax values of B, C, D are 3, 2, 2 and the root value is 3.]

∆ Indicates states where MAX should play.

∇ Indicates states where MIN should play.

Which action among {a1, a2, a3} should MAX play?


Minimax strategy

[Figure: the same two-ply game tree as above (minimax values of B, C, D are 3, 2, 2; root value 3).]

We don't take any risks; we assume that MIN will play optimally.

We look for the best action for the worst possible scenario.

What if our opponent is not optimal? Can we learn the opponent's behaviour?

What if our opponent is trying to fool us by behaving in a certain way?


Minimax strategy

[Figure: the same two-ply game tree (root minimax value 3).]

$$
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if TERMINAL-TEST}(s) = \text{true},\\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if PLAYER}(s) = \mathrm{MAX},\\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if PLAYER}(s) = \mathrm{MIN}.
\end{cases}
$$



Minimax algorithm


function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v

Figure 5.3: An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the element a of set S that has the maximum value of f(a).
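A possible Python transcription of the pseudocode above, written against the hypothetical Game interface sketched earlier (a sketch, not the book's code; the two-argument utility(s, p) is used so that all values are measured from the root player's point of view):

def minimax_decision(game, state):
    # Return the action whose worst-case (MIN-optimal) outcome is best for the mover.
    max_player = game.player(state)
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), max_player))

def max_value(game, state, max_player):
    if game.terminal_test(state):
        return game.utility(state, max_player)
    return max(min_value(game, game.result(state, a), max_player)
               for a in game.actions(state))

def min_value(game, state, max_player):
    if game.terminal_test(state):
        return game.utility(state, max_player)
    return min(max_value(game, game.result(state, a), max_player)
               for a in game.actions(state))

For example, with the TicTacToe sketch above, minimax_decision(TicTacToe(), TicTacToe().initial_state()) returns an optimal opening move (slowly, since it expands the entire game tree).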




Minimax strategy in multiplayer games

[Figure: the first three plies of a game tree with three players A, B, and C. Each node is labelled with a utility vector giving the value from the viewpoint of each player; the best move for A is marked at the root and backs up the vector (1, 2, 6).]

We have three players A, B, and C.

The utilities are represented by a 3-dimensional vector (vA, vB, vC).

We apply the same principle: assume that every player is optimal.

If the game is not zero-sum, implicit collaborations may occur.
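The same recursion can be written for n players with vector-valued utilities. A minimal sketch (it assumes a hypothetical game.utility_vector(s) that returns a dictionary mapping each player to a payoff):

def multiplayer_value(game, state):
    # Backed-up utility vector of `state`, assuming each player maximizes its own component.
    if game.terminal_test(state):
        return game.utility_vector(state)      # e.g. {'A': 1, 'B': 2, 'C': 6}
    p = game.player(state)
    values = [multiplayer_value(game, game.result(state, a))
              for a in game.actions(state)]
    return max(values, key=lambda v: v[p])     # the player to move picks what is best for itself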



Alpha-Beta pruning

Time is a major issue in game search trees. Searching the complete tree takes O(b^m) operations, where b is the branching factor and m is the depth of the tree (the horizon).

Do we really need to parse the whole tree to find a minimax strategy?

[Figure: the same two-ply game tree as before.]



Alpha-Beta pruning

Example

Assume that the search tree has been parsed except for actions c2 and c3.

Let us denote the utilities of c2 and c3 by x and y, respectively.

$$
\begin{aligned}
\mathrm{MINIMAX}(root) &= \max\big(\min(3, 12, 8),\ \min(2, x, y),\ \min(14, 5, 2)\big)\\
&= \max\big(3,\ \min(2, x, y),\ 2\big)\\
&= \max(3,\ z,\ 2) \quad \text{where } z = \min(2, x, y) \le 2\\
&= 3.
\end{aligned}
$$

[Figure: the same two-ply game tree as before.]



Alpha-Beta pruning

[Figure: the stages (a) to (f) of an alpha-beta search on the same tree. Each node is annotated with its current range of possible values: the root starts at [−∞, +∞] and ends at [3, 3]; B's value settles at 3; C is abandoned after its first leaf (value 2), since its value can be at most 2 < 3; D's leaves 14, 5, 2 narrow its range to [2, 2]; the root value is 3.]
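A possible alpha-beta implementation over the same hypothetical Game interface (a sketch of the standard algorithm, not the exact code of the figure):

def alphabeta_decision(game, state):
    max_player = game.player(state)
    best_action, best_value = None, float('-inf')
    for a in game.actions(state):
        v = ab_min(game, game.result(state, a), best_value, float('inf'), max_player)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def ab_max(game, state, alpha, beta, max_player):
    if game.terminal_test(state):
        return game.utility(state, max_player)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, ab_min(game, game.result(state, a), alpha, beta, max_player))
        if v >= beta:                  # MIN above already has a better option: prune
            return v
        alpha = max(alpha, v)
    return v

def ab_min(game, state, alpha, beta, max_player):
    if game.terminal_test(state):
        return game.utility(state, max_player)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, ab_max(game, game.result(state, a), alpha, beta, max_player))
        if v <= alpha:                 # MAX above already has a better option: prune
            return v
        beta = min(beta, v)
    return v

On the example tree above, ab_min would return from node C right after seeing the leaf with value 2, exactly as in the trace.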



Imperfect fast decisions

The minimax algorithm generates the entire search tree.

The alpha-beta algorithm allows us to prune large parts of the search tree, but its complexity is still exponential in the depth of the tree.

This is still impractical because moves must be made very quickly.

Cutting off the search

A cutoff test is used to decide when to stop looking further.

A heuristic evaluation function is used to estimate the utility where the search is cut off.

$$
\text{H-MINIMAX}(s, d) =
\begin{cases}
\mathrm{EVAL}(s) & \text{if CUTOFF-TEST}(s, d) = \text{true},\\
\max_{a \in \mathrm{ACTIONS}(s)} \text{H-MINIMAX}(\mathrm{RESULT}(s, a), d+1) & \text{if PLAYER}(s) = \mathrm{MAX},\\
\min_{a \in \mathrm{ACTIONS}(s)} \text{H-MINIMAX}(\mathrm{RESULT}(s, a), d+1) & \text{if PLAYER}(s) = \mathrm{MIN}.
\end{cases}
$$
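A sketch of H-MINIMAX as depth-limited search. Here cutoff_test and eval_fn are assumptions standing in for CUTOFF-TEST and EVAL (for instance, a fixed depth bound combined with a weighted feature sum):

def h_minimax(game, state, depth, max_player, eval_fn, cutoff_test):
    # cutoff_test(s, d) must return True at terminal states (nothing left to expand)
    # as well as at the depth limit, e.g. lambda s, d: game.terminal_test(s) or d >= 4.
    if cutoff_test(state, depth):
        return eval_fn(state, max_player)          # EVAL(s), an estimate of the utility
    values = [h_minimax(game, game.result(state, a), depth + 1,
                        max_player, eval_fn, cutoff_test)
              for a in game.actions(state)]
    return max(values) if game.player(state) == max_player else min(values)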



Evaluation functions

Human chess players have ways of judging the value of a position without imagining all the moves ahead until a checkmate.

A good evaluation function should order the actions correctly according to their true utilities.

Evaluation functions should be computed very quickly.



Example: evaluation functions in chess

Evaluation functions use features of given positions in the game.

Example: the number of pawns in a given position. If we know from experience that 72% of positions with “two pawns vs. one pawn” lead to a win (utility +1), 20% to a loss (0), and 8% to a draw (1/2), then the expected value of these positions is 0.72 × 1 + 0.20 × 0 + 0.08 × 0.5 = 0.76.

Other quantities, such as the advantage in each piece, “good pawn structure”, and “king safety”, can be used as features fi.

The evaluation function can be given using a weighted linear model:

$$
\mathrm{EVAL}(s) = \sum_{i=1}^{n} w_i f_i(s),
$$

where $w_i$ is the importance (weight) of feature $f_i$.
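A minimal sketch of such a weighted linear evaluation; the features and weights below (material differences scored with the traditional piece values) are placeholder choices, not the lecture's:

def linear_eval(features, weights):
    # EVAL(s) = sum_i w_i * f_i(s), given the feature values f_i(s) and the weights w_i.
    return sum(w * f for w, f in zip(weights, features))

# Example: features = (pawn, knight, bishop, rook, queen) count differences (White minus Black).
material_diff = (1, 0, 0, 0, 0)                    # White is one pawn up
piece_values  = (1, 3, 3, 5, 9)                    # commonly used weights
print(linear_eval(material_diff, piece_values))    # -> 1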



Example: evaluation functions in chess

Linear models assume that the features are independent, which is not always true (bishops are more efficient in the endgame).

The values of some features do not increase linearly (two knights are way more useful than one knight).

[Figure: two chess positions, (a) and (b), both with White to move. In the two positions, the two players have the same number of pieces, but the position on the right is much worse than the one on the left for Black.]

What if the search cutoff happens in the left position?


Stochastic games

In some games, the state of the game changes randomly depending on the selected actions.

Backgammon is a typical game that combines luck and skill.

[Figure: a backgammon board, with points numbered 1 to 24 (plus positions 0 and 25).]



Stochastic games

We can use the same minimax strategy, but we need to take the randomness into account by computing expected utilities.

[Figure: a schematic game tree for backgammon. MAX and MIN levels alternate with CHANCE levels representing the dice rolls; a doublet such as 1-1 or 6-6 has probability 1/36, while a roll of two different numbers such as 1-2 or 6-5 has probability 1/18. Terminal states are labelled with utilities such as 2, -1, and 1.]



$$
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if TERMINAL-TEST}(s) = \text{true},\\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, a)) & \text{if PLAYER}(s) = \mathrm{MAX},\\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, a)) & \text{if PLAYER}(s) = \mathrm{MIN},\\
\sum_{r} P(r)\, \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, r)) & \text{if PLAYER}(s) = \mathrm{CHANCE},
\end{cases}
$$

where r is a chance event (e.g., a dice roll).
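A sketch of the recursion in code, assuming the hypothetical Game interface is extended with chance nodes: game.player(s) is assumed to return 'CHANCE' at such nodes, and game.chance_outcomes(s) to yield (probability, event) pairs:

def expectiminimax(game, state, max_player):
    if game.terminal_test(state):
        return game.utility(state, max_player)
    mover = game.player(state)
    if mover == 'CHANCE':
        # Weight each chance outcome r by its probability P(r).
        return sum(p * expectiminimax(game, game.result(state, r), max_player)
                   for p, r in game.chance_outcomes(state))
    values = [expectiminimax(game, game.result(state, a), max_player)
              for a in game.actions(state)]
    return max(values) if mover == max_player else min(values)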



Trouble with evaluation functions in stochastic games

[Figure: two expectiminimax trees for the same game, where the leaf evaluations of the second tree are an order-preserving rescaling of the first. MAX chooses between a1 and a2; each choice leads to a CHANCE node with outcome probabilities 0.9 and 0.1, and each outcome to a MIN node. With MIN values 2 and 3 under a1 and 1 and 4 under a2, the chance nodes evaluate to 2.1 and 1.3, so a1 is best; with the rescaled MIN values 20, 30 and 1, 400, they evaluate to 21 and 40.9, so a2 is best.]

With chance nodes, the decision depends on the scale of the evaluation values, not just on their order.
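The two backed-up values can be checked with a few lines of arithmetic, using the numbers from the figure:

def chance_value(left_min, right_min, p_left=0.9, p_right=0.1):
    # Expected value of a chance node whose two MIN children back up these values.
    return p_left * left_min + p_right * right_min

# First tree: MIN values 2 and 3 under a1, 1 and 4 under a2.
print(chance_value(2, 3), chance_value(1, 4))        # 2.1 1.3   -> a1 looks better
# Rescaled tree: MIN values 20 and 30 under a1, 1 and 400 under a2.
print(chance_value(20, 30), chance_value(1, 400))    # 21.0 40.9 -> a2 looks better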



Partially observable games

In some games, the state of the game is not fully known.

Card games and Kriegspiel are examples of such games.

The state of the game can be tracked by remembering past actionsand observations.

[Figure: a Kriegspiel example on a small board. The player proposes moves such as Kc3 and Rc3 and learns only the referee's announcements (“OK”, “Illegal”, “Check”).]