learning opponent-type probabilities for prom search


Page 1: Learning Opponent-type Probabilities for PrOM search

August 20, 2001, 6th Computer Olympiad

Learning Opponent-type Probabilities for PrOM search

Jeroen Donkers

IKAT Universiteit Maastricht

Page 2: Learning Opponent-type Probabilities for PrOM search


Contents

• OM search and PrOM search

• Learning for PrOM search

• Off-line Learning

• On-line Learning

• Conclusions & Future research

Page 3: Learning Opponent-type Probabilities for PrOM search


OM search

– MAX player uses evaluation function V0

– Opponent uses a different evaluation function, Vop

– At MIN nodes: predict which move the opponent will select (using standard search and Vop)

– At MAX nodes: pick the move that maximizes the search value (based on V0)

– At leaf nodes: use V0
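The rules above can be sketched over an explicit game tree. The `Node` class, function names, and evaluations below are illustrative assumptions, not taken from the slides:

```python
# A minimal OM-search sketch over an explicit game tree (illustrative only).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    v0: float = 0.0   # MAX player's leaf evaluation (V0)
    vop: float = 0.0  # opponent's leaf evaluation (Vop)

def minimax_vop(node, maximizing):
    """Standard minimax using the opponent's evaluation Vop."""
    if not node.children:
        return node.vop
    vals = [minimax_vop(c, not maximizing) for c in node.children]
    return max(vals) if maximizing else min(vals)

def om_search(node, max_to_move):
    """OM search: predict the opponent's move with Vop, value it with V0."""
    if not node.children:
        return node.v0                      # at leaf nodes: use V0
    if max_to_move:
        # MAX nodes: pick the move that maximizes the search value (V0-based)
        return max(om_search(c, False) for c in node.children)
    # MIN nodes: predict the opponent's move by standard search with Vop ...
    predicted = min(node.children, key=lambda c: minimax_vop(c, True))
    # ... and propagate MAX's value of that predicted move
    return om_search(predicted, True)
```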

Page 4: Learning Opponent-type Probabilities for PrOM search


PrOM search

• Extended Opponent Model:
– a set of opponent types (e.g. evaluation functions)
– a probability distribution over this set

• Interpretation: At every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.

Page 5: Learning Opponent-type Probabilities for PrOM search


PrOM search algorithm

• At MIN nodes: determine for every opponent type which move would be selected.

• Compute the MAX player’s value for these moves

• Use opponent-type probabilities to compute the expected value of the MIN node

• At MAX nodes: select the maximum-valued child
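The expected-value step at a MIN node can be written out directly; the function name and the input layout below are assumptions for illustration:

```python
# PrOM value of a MIN node, assuming we already know which child each
# opponent type would select and MAX's (V0-based) value of each child.
def prom_min_value(selected_child, max_value, probs):
    """Expected value of a MIN node under opponent-type probabilities.

    selected_child[i] -- index of the child opponent type i would choose
    max_value[c]      -- MAX player's value of child c
    probs[i]          -- probability of opponent type i
    """
    return sum(p * max_value[selected_child[i]]
               for i, p in enumerate(probs))
```

For instance, with two children valued 3 and 5 and three opponent types, the MIN-node value is just the probability-weighted mix of the values of the moves those types would pick.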

Page 6: Learning Opponent-type Probabilities for PrOM search


Learning in PrOM search

• How do we assess the probabilities on the opponent types?

– Off-line: use games previously played by the opponent to estimate the probabilities (a lot of time and, possibly, data available).

– On-line: use the moves observed during a game to adjust the probabilities (only little time and few observations; prior probabilities are needed).

Page 7: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.

• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent's moves.

Page 8: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• How to obtain P*(opp)?

• Input: a set of positions and the moves that the given opponent and all the given opponent types would select.

• "Algorithm": P*(opp_i) = N_i / N

• But: leave out all ambiguous positions! (e.g. when more than one opponent type agrees with the opponent)
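The frequency estimate with ambiguous positions left out might look like this in outline; the names and the data layout are assumed for illustration:

```python
# Off-line estimate P*(opp_i) = N_i / N over unambiguous positions only.
from collections import Counter

def estimate_probs(observed_moves, type_moves):
    """observed_moves -- list of moves the real opponent played
    type_moves[i][pos] -- move opponent type i would play at position pos

    A position is left out when it is ambiguous, i.e. when more (or fewer)
    than exactly one opponent type agrees with the opponent's move.
    """
    counts = Counter()
    n = 0
    for pos, move in enumerate(observed_moves):
        agreeing = [i for i, moves in enumerate(type_moves)
                    if moves[pos] == move]
        if len(agreeing) == 1:      # keep only unambiguous positions
            counts[agreeing[0]] += 1
            n += 1
    return [counts[i] / n for i in range(len(type_moves))] if n else None
```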

Page 9: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Case 1: The opponent is using a mixed strategy P#(opp) of the given opponent types
– Effective learning is possible (P*(opp) → P#(opp))
– More difficult if the opponent types are not independent

Page 10: Learning Opponent-type Probabilities for PrOM search


Experiment: 5 opponent types, P = (a,b,b,b,b), 20 moves, 100 – 100,000 runs, 100 samples.

Not leaving out ambiguous events.

Page 11: Learning Opponent-type Probabilities for PrOM search


Experiment: 5 opponent types, P = (a,b,b,b,b), 20 moves, 10 – 100,000 runs, 100 samples.

Leaving out ambiguous events.

Page 12: Learning Opponent-type Probabilities for PrOM search


Experiment: 2–20 opponent types, P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples.

Varying number of opponent types.

Page 13: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Case 2: The opponent is using a different strategy.
– Opponent types behave randomly but dependently (the distribution of type i depends on type i-1)
– The real opponent selects a fixed move

Page 14: Learning Opponent-type Probabilities for PrOM search


[Figure: two panels plotted against the opponent's selection (0–18). "Learned probabilities": the probabilities (0%–100%) assigned to opponent types opp0–opp4. "Learning error": –log(error) (0–6) for 10^1 up to 10^5 samples.]

Page 15: Learning Opponent-type Probabilities for PrOM search


Fast On-Line Learning

• At the principal MIN node, only the best moves for every opponent type are needed

• Slightly increase the probability of an opponent type when the observed move matches the move selected by that opponent type alone. Then normalize all probabilities.

• Drift to one opponent type is possible.
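This update rule can be sketched in a few lines; the function name and the step size `delta` are illustrative choices, not values from the slides:

```python
# Fast on-line update: bump the probability of the single agreeing type,
# then renormalize. delta is an arbitrary illustrative learning rate.
def fast_update(probs, selected_moves, observed_move, delta=0.05):
    """probs[i]          -- current probability of opponent type i
    selected_moves[i] -- move opponent type i would select at the
                         principal MIN node
    observed_move     -- move the opponent actually played
    """
    agreeing = [i for i, m in enumerate(selected_moves)
                if m == observed_move]
    new = list(probs)
    if len(agreeing) == 1:          # only when exactly one type agrees
        new[agreeing[0]] += delta
    total = sum(new)
    return [p / total for p in new]
```

Because only agreeing types ever gain mass, repeated agreement by one type makes the distribution drift toward that type, as the slide notes.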

Page 16: Learning Opponent-type Probabilities for PrOM search


Slower On-Line Learning: Naive Bayesian (Duda & Hart '73)

• Compute the value of every move at the principal MIN node for every opponent type.

• Transform these values into conditional probabilities P(move | opp).

• Compute P(opp | move_obs) using P*(opp) as prior (Bayes' rule).

• Update: P*(opp) ← a·P*(opp) + (1 − a)·P(opp | move_obs)
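One step of this naive-Bayesian update might be sketched as follows. The transformation of move values into P(move | opp) is taken as given here, and the names and the mixing value `a` are illustrative:

```python
# One naive-Bayesian update step over opponent types.
def bayes_update(prior, p_move_given_opp, observed_move, a=0.9):
    """prior[i]                   -- current P*(opp_i)
    p_move_given_opp[i][move]  -- P(move | opp_i) at the principal MIN node
    observed_move              -- move the opponent actually played
    a                          -- mixing parameter from the slide's update
    """
    # Bayes' rule: P(opp_i | move_obs) ∝ P(move_obs | opp_i) * P*(opp_i)
    joint = [p_move_given_opp[i][observed_move] * prior[i]
             for i in range(len(prior))]
    z = sum(joint)
    posterior = [j / z for j in joint]
    # Slide's update: P*(opp) <- a * P*(opp) + (1 - a) * P(opp | move_obs)
    return [a * prior[i] + (1 - a) * posterior[i]
            for i in range(len(prior))]
```

Smaller `a` moves the probabilities faster toward the posterior (faster drift), larger `a` keeps them closer to the prior, which is why the slides stress tuning `a`.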

Page 17: Learning Opponent-type Probabilities for PrOM search


Naïve Bayesian Learning

• In the end, drifting to 1–0 probabilities will almost always occur.

• Parameter a is very important for the actual performance:
– amount of change in the probabilities
– convergence
– drifting speed

• It should be tuned in a real setting.

Page 18: Learning Opponent-type Probabilities for PrOM search


Conclusions

• Effective off-line learning of probabilities is possible, when ambiguous events are disregarded.

• Off-line learning also works if the opponent does not use a mixed strategy of known opponent types.

• On-line learning must be tuned precisely to a given situation.

Page 19: Learning Opponent-type Probabilities for PrOM search


Future Research

• PrOM search and learning in real game playing

– Zanzibar Bao (8x4 mancala)
– LOA (some experiments with OM search done)
– Chess endgames