learning opponent-type probabilities for prom search


Page 1: Learning Opponent-type Probabilities for PrOM search

August 20, 2001, 6th Computer Olympiad

Learning Opponent-type Probabilities for PrOM search

Jeroen Donkers

IKAT Universiteit Maastricht

Page 2: Learning Opponent-type Probabilities for PrOM search


Contents

• OM search and PrOM search

• Learning for PrOM search

• Off-line Learning

• On-line Learning

• Conclusions & Future research

Page 3: Learning Opponent-type Probabilities for PrOM search


OM search

– MAX player uses evaluation function V0

– Opponent uses a different evaluation function, Vop

– At MIN nodes: predict which move the opponent will select (using standard search and Vop)

– At MAX nodes: pick the move that maximizes the search value (based on V0)

– At leaf nodes: use V0
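The rules above can be sketched over an explicit game tree. The `Node` class, function names, and evaluations below are illustrative assumptions, not taken from the slides:

```python
# A minimal OM-search sketch over an explicit game tree (illustrative only).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    v0: float = 0.0   # MAX player's leaf evaluation (V0)
    vop: float = 0.0  # opponent's leaf evaluation (Vop)

def minimax_vop(node, maximizing):
    """Standard minimax using the opponent's evaluation Vop."""
    if not node.children:
        return node.vop
    vals = [minimax_vop(c, not maximizing) for c in node.children]
    return max(vals) if maximizing else min(vals)

def om_search(node, max_to_move):
    """OM search: predict the opponent's move with Vop, value it with V0."""
    if not node.children:
        return node.v0                      # at leaf nodes: use V0
    if max_to_move:
        # MAX nodes: pick the move that maximizes the search value (V0-based)
        return max(om_search(c, False) for c in node.children)
    # MIN nodes: predict the opponent's move by standard search with Vop ...
    predicted = min(node.children, key=lambda c: minimax_vop(c, True))
    # ... and propagate MAX's value of that predicted move
    return om_search(predicted, True)
```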

Page 4: Learning Opponent-type Probabilities for PrOM search


PrOM search

• Extended Opponent Model:
– a set of opponent types (e.g. evaluation functions)
– a probability distribution over this set

• Interpretation: At every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.

Page 5: Learning Opponent-type Probabilities for PrOM search


PrOM search algorithm

• At MIN nodes: determine for every opponent type which move would be selected.

• Compute the MAX player’s value for these moves

• Use opponent-type probabilities to compute the expected value of the MIN node

• At MAX nodes: select the maximum-valued child
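The expected-value step at a MIN node can be written out directly; the function name and the input layout below are assumptions for illustration:

```python
# PrOM value of a MIN node, assuming we already know which child each
# opponent type would select and MAX's (V0-based) value of each child.
def prom_min_value(selected_child, max_value, probs):
    """Expected value of a MIN node under opponent-type probabilities.

    selected_child[i] -- index of the child opponent type i would choose
    max_value[c]      -- MAX player's value of child c
    probs[i]          -- probability of opponent type i
    """
    return sum(p * max_value[selected_child[i]]
               for i, p in enumerate(probs))
```

For instance, with two children valued 3 and 5 and three opponent types, the MIN-node value is just the probability-weighted mix of the values of the moves those types would pick.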

Page 6: Learning Opponent-type Probabilities for PrOM search


Learning in PrOM search

• How do we assess the probabilities on the opponent types?

– Off-line: use games previously played by the opponent to estimate the probabilities (a lot of time and, possibly, data available).

– On-line: use the moves observed during a game to adjust the probabilities (only little time and few observations; prior probabilities are needed).

Page 7: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.

• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent's moves.

Page 8: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• How to obtain P*(opp)?

• Input: a set of positions and the moves that the given opponent and all the given opponent types would select.

• "Algorithm": P*(opp_i) = N_i / N

• But: leave out all ambiguous positions! (e.g. when more than one opponent type agrees with the opponent)
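The frequency estimate with ambiguous positions left out might look like this in outline; the names and the data layout are assumed for illustration:

```python
# Off-line estimate P*(opp_i) = N_i / N over unambiguous positions only.
from collections import Counter

def estimate_probs(observed_moves, type_moves):
    """observed_moves -- list of moves the real opponent played
    type_moves[i][pos] -- move opponent type i would play at position pos

    A position is left out when it is ambiguous, i.e. when more (or fewer)
    than exactly one opponent type agrees with the opponent's move.
    """
    counts = Counter()
    n = 0
    for pos, move in enumerate(observed_moves):
        agreeing = [i for i, moves in enumerate(type_moves)
                    if moves[pos] == move]
        if len(agreeing) == 1:      # keep only unambiguous positions
            counts[agreeing[0]] += 1
            n += 1
    return [counts[i] / n for i in range(len(type_moves))] if n else None
```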

Page 9: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Case 1: The opponent is using a mixed strategy P#(opp) of the given opponent types
– Effective learning is possible (P*(opp) → P#(opp))
– More difficult if the opponent types are not independent

Page 10: Learning Opponent-type Probabilities for PrOM search


Experiment: 5 opponent types, P = (a,b,b,b,b), 20 moves, 100 – 100,000 runs, 100 samples.

Not leaving out ambiguous events.

Page 11: Learning Opponent-type Probabilities for PrOM search


Experiment: 5 opponent types, P = (a,b,b,b,b), 20 moves, 10 – 100,000 runs, 100 samples.

Leaving out ambiguous events.

Page 12: Learning Opponent-type Probabilities for PrOM search


Experiment: 2–20 opponent types, P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples.

Varying number of opponent types.

Page 13: Learning Opponent-type Probabilities for PrOM search


Off-Line Learning

• Case 2: The opponent is using a different strategy.
– Opponent types behave randomly but dependently (the distribution of type i depends on type i-1)
– The real opponent selects a fixed move

Page 14: Learning Opponent-type Probabilities for PrOM search


[Figure: two panels plotted against the opponent's selection (0–18). "Learned probabilities": the probabilities (0%–100%) assigned to opponent types opp0–opp4. "Learning error": –log(error) (0–6) for 10^1 up to 10^5 samples.]

Page 15: Learning Opponent-type Probabilities for PrOM search


Fast On-Line Learning

• At the principal MIN node, only the best moves for every opponent type are needed

• Slightly increase the probability of an opponent type when the observed move matches the move selected by that opponent type alone. Then normalize all probabilities.

• Drift to one opponent type is possible.
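This update rule can be sketched in a few lines; the function name and the step size `delta` are illustrative choices, not values from the slides:

```python
# Fast on-line update: bump the probability of the single agreeing type,
# then renormalize. delta is an arbitrary illustrative learning rate.
def fast_update(probs, selected_moves, observed_move, delta=0.05):
    """probs[i]          -- current probability of opponent type i
    selected_moves[i] -- move opponent type i would select at the
                         principal MIN node
    observed_move     -- move the opponent actually played
    """
    agreeing = [i for i, m in enumerate(selected_moves)
                if m == observed_move]
    new = list(probs)
    if len(agreeing) == 1:          # only when exactly one type agrees
        new[agreeing[0]] += delta
    total = sum(new)
    return [p / total for p in new]
```

Because only agreeing types ever gain mass, repeated agreement by one type makes the distribution drift toward that type, as the slide notes.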

Page 16: Learning Opponent-type Probabilities for PrOM search


Slower On-Line Learning: Naive Bayesian (Duda & Hart '73)

• Compute the value of every move at the principal MIN node for every opponent type.

• Transform these values into conditional probabilities P(move | opp).

• Compute P(opp | move_obs) using P*(opp) as prior (Bayes' rule).

• Update: P*(opp) ← a·P*(opp) + (1 − a)·P(opp | move_obs)
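One step of this naive-Bayesian update might be sketched as follows. The transformation of move values into P(move | opp) is taken as given here, and the names and the mixing value `a` are illustrative:

```python
# One naive-Bayesian update step over opponent types.
def bayes_update(prior, p_move_given_opp, observed_move, a=0.9):
    """prior[i]                   -- current P*(opp_i)
    p_move_given_opp[i][move]  -- P(move | opp_i) at the principal MIN node
    observed_move              -- move the opponent actually played
    a                          -- mixing parameter from the slide's update
    """
    # Bayes' rule: P(opp_i | move_obs) ∝ P(move_obs | opp_i) * P*(opp_i)
    joint = [p_move_given_opp[i][observed_move] * prior[i]
             for i in range(len(prior))]
    z = sum(joint)
    posterior = [j / z for j in joint]
    # Slide's update: P*(opp) <- a * P*(opp) + (1 - a) * P(opp | move_obs)
    return [a * prior[i] + (1 - a) * posterior[i]
            for i in range(len(prior))]
```

Smaller `a` moves the probabilities faster toward the posterior (faster drift), larger `a` keeps them closer to the prior, which is why the slides stress tuning `a`.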

Page 17: Learning Opponent-type Probabilities for PrOM search


Naïve Bayesian Learning

• In the end, drifting to 1–0 probabilities will almost always occur.

• Parameter a is very important for the actual performance:
– amount of change in the probabilities
– convergence
– drifting speed

• It should be tuned in a real setting.

Page 18: Learning Opponent-type Probabilities for PrOM search


Conclusions

• Effective off-line learning of probabilities is possible, when ambiguous events are disregarded.

• Off-line learning also works if the opponent does not use a mixed strategy of known opponent types.

• On-line learning must be tuned precisely to a given situation.

Page 19: Learning Opponent-type Probabilities for PrOM search


Future Research

• PrOM search and learning in real game playing

– Zanzibar Bao (8x4 mancala)
– LOA (some experiments with OM search done)
– Chess endgames