combining games artificial intelligences & improving random seeds

Upload: olivier-teytaud

Post on 16-Apr-2017

Portfolios of Artificial Intelligences
+ playing with random seeds

1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments

J.-B. Hoock, D. L. St-Pierre, O. Teytaud

Portfolio

I have K algorithms for solving a given task:

MCTS

Alpha-Beta

Parametric script

Nested MC

I want to choose the best one

Two frameworks

Offline

I do some work before the competition

I combine all my algorithms into 1

Simple version:

Compute some probability vector p

For each game, use Algo(i) with probability p(i)

Online

For each game, use Algo(i) with probability p(i)

Update p when the game is over
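The "simple version" above amounts to sampling an algorithm index from p before each game. A minimal sketch, assuming a hypothetical probability vector over K = 4 algorithms (the values of p and the name pick_algorithm are illustrative, not from the slides):

```python
import random

def pick_algorithm(p, rng):
    """Return index i with probability p[i]."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]

# Hypothetical portfolio over K = 4 algorithms (illustrative values).
p = [0.5, 0.2, 0.2, 0.1]
rng = random.Random(0)

# Simulate 10000 games and record which algorithm was picked each time.
counts = [0] * len(p)
for _ in range(10000):
    counts[pick_algorithm(p, rng)] += 1
frequencies = [c / 10000 for c in counts]
```

In the offline framework p stays fixed during the competition; in the online framework it is recomputed after each game.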

2. Offline portfolio

Offline Nash portfolio

K algorithms for black BAI(1),..., BAI(K)

K' algorithms for white WAI(1),...,WAI(K')

Def: M(i,j) = proba( BAI(i) beats WAI(j) )

Define (p,q) = Nash equilibrium of M

p = best stochastic portfolio for Black (Nash sense)

q = best stochastic portfolio for White (Nash sense)

Portfolio:

Black: Play BAI(i) with probability p(i)

White: Play WAI(j) with probability q(j)
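One way to approximate (p,q) without a linear-programming solver is sketched below. It is not the method used by the authors: Black runs multiplicative weights (Hedge) over its K algorithms and White answers each round with a best response; the time-averaged strategies approach an equilibrium as the number of rounds T grows. The function name approx_nash and the parameter T are assumptions made for this illustration.

```python
import math

def approx_nash(M, T=20000):
    """Approximate Nash portfolios (p, q) for the win-rate matrix M.

    M[i][j] = probability that BAI(i) beats WAI(j).
    Black (rows) maximizes it, White (columns) minimizes it.
    """
    K, K2 = len(M), len(M[0])
    eta = math.sqrt(8.0 * math.log(K) / T)  # standard Hedge learning rate
    gains = [0.0] * K      # cumulative win rate of each BAI(i) so far
    p_sum = [0.0] * K      # running sum of Black's mixed strategies
    q_counts = [0] * K2    # how often each WAI(j) was White's best response
    for _ in range(T):
        # Hedge: weights proportional to exp(eta * cumulative gain).
        top = max(gains)   # subtracted for numerical stability
        w = [math.exp(eta * (g - top)) for g in gains]
        total = sum(w)
        p_t = [x / total for x in w]
        # White's best response: column with the lowest win rate for Black.
        j = min(range(K2),
                key=lambda c: sum(p_t[i] * M[i][c] for i in range(K)))
        for i in range(K):
            p_sum[i] += p_t[i]
            gains[i] += M[i][j]
        q_counts[j] += 1
    p = [s / T for s in p_sum]
    q = [c / T for c in q_counts]
    return p, q

# Toy matrix: each Black algorithm beats exactly one White algorithm.
p, q = approx_nash([[1.0, 0.0], [0.0, 1.0]])
```

On this toy matrix both portfolios come out close to (0.5, 0.5), which is the unique equilibrium.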

Other offline portfolios

K algorithms for black BAI(1),..., BAI(K)

K' algorithms for white WAI(1),...,WAI(K')

Definitions:

Uniform portfolio: p(i) = 1/K, q(j) = 1/K'

Fixed seed: p(i) = 1, q(j) = 1 for some i, j

Best arm: fixed seed with i = best row / j = best column

Portfolio:

Black: Play BAI(i) with probability p(i)

White: Play WAI(j) with probability q(j)
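These baselines can be written down directly from M. The sketch below assumes that "best row" means the row with the highest average win rate for Black and "best column" the column with the lowest average win rate for Black; the slides do not spell out the criterion, so this reading is an assumption, as are the function names.

```python
def uniform_portfolio(K):
    """p(i) = 1/K for every i."""
    return [1.0 / K] * K

def best_arm_portfolios(M):
    """Fixed-seed portfolios putting all mass on the best row / best column.

    Assumption: rows and columns are ranked by their average entry.
    M[i][j] = probability that BAI(i) beats WAI(j).
    """
    K, K2 = len(M), len(M[0])
    row_means = [sum(M[i]) / K2 for i in range(K)]
    col_means = [sum(M[i][j] for i in range(K)) / K for j in range(K2)]
    i_best = max(range(K), key=lambda i: row_means[i])   # best for Black
    j_best = min(range(K2), key=lambda j: col_means[j])  # best for White
    p = [1.0 if i == i_best else 0.0 for i in range(K)]
    q = [1.0 if j == j_best else 0.0 for j in range(K2)]
    return p, q

# Illustrative 2 x 3 matrix (values are made up).
M = [[0.6, 0.4, 0.5],
     [0.7, 0.8, 0.3]]
p, q = best_arm_portfolios(M)
```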

3. Online portfolio

Online portfolio (for Black)

Just apply UCBT (or your favorite bandit)

Before playing a game:

p(i) = frequency of win for BAI(i)

n(i) = number of times BAI(i) was used

N = sum of the n(i)

sc(i) = p(i) + C log(N)/n(i) + C' sqrt( p(i)(1-p(i)) log(N)/n(i) )

Choose i* maximizing sc(i)

Play with BAI(i*)
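The selection rule above can be sketched as follows. The constants C and C' are not given in the slides, so the defaults below are placeholders; treating an algorithm that was never used as having an infinite score is also an assumption, made so that every algorithm gets tried once.

```python
import math

def ucbt_scores(wins, n, C=1.0, C_prime=1.0):
    """sc(i) = p(i) + C log(N)/n(i) + C' sqrt(p(i)(1-p(i)) log(N)/n(i)).

    wins[i] = number of games won with BAI(i), n[i] = games played with it.
    C and C_prime are placeholder constants (not specified in the slides).
    """
    N = sum(n)
    scores = []
    for w_i, n_i in zip(wins, n):
        if n_i == 0:
            scores.append(math.inf)  # assumption: try unused algorithms first
            continue
        p_i = w_i / n_i
        bonus = C * math.log(N) / n_i
        variance_term = C_prime * math.sqrt(p_i * (1 - p_i) * math.log(N) / n_i)
        scores.append(p_i + bonus + variance_term)
    return scores

def choose_algorithm(wins, n, C=1.0, C_prime=1.0):
    """Return the index i* with the highest score."""
    scores = ucbt_scores(wins, n, C, C_prime)
    return max(range(len(scores)), key=lambda i: scores[i])
```

After each game, increment n(i*) and, if the game was won, wins(i*); the next call then uses the updated statistics.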

4. Mathematics (sorry)

Nash

Computed exactly in polynomial time.

With precision e in expected time O( (K+K') log(K+K')^2 / e^2 )

The best portfolio in terms of

Worst case winrate against the WAI(i)

Worst case winrate against WAI(i) for i ~ some probability distribution
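Both criteria can be evaluated directly on M. The sketch below (function names are illustrative) computes the expected win rate of a Black portfolio p against a White distribution q, and its worst case over the WAI(j). Because the expected win rate is linear in q, the worst case over all distributions is reached at a single WAI(j), so the two criteria give the same value.

```python
def expected_winrate(p, M, q):
    """Win rate of Black portfolio p against White distribution q."""
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))

def worst_case_winrate(p, M):
    """Win rate of Black portfolio p against the most harmful WAI(j)."""
    return min(sum(p[i] * M[i][j] for i in range(len(p)))
               for j in range(len(M[0])))

# Toy matrix: each Black algorithm beats exactly one White algorithm.
M = [[1.0, 0.0],
     [0.0, 1.0]]
mixed = worst_case_winrate([0.5, 0.5], M)   # mixed portfolio
fixed = worst_case_winrate([1.0, 0.0], M)   # single fixed algorithm
```

Here the mixed portfolio guarantees 0.5 while the fixed one guarantees nothing, which is the sense in which a deterministic choice can be exploited.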

UCBT for Black

Nearly zero computational overhead

Asymptotically optimal winning rate against a stationary opponent, among the BAI(i)

We did not try discounted UCB

5. Experiments on 9x9 Go

First portfolio: random seeds

Pick a stochastic algorithm

Choose K random seeds

You get K algorithms
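A minimal sketch of this construction, assuming a toy stochastic policy that picks a legal move at random (a stand-in for a real stochastic player such as an MCTS program; all names are hypothetical). Fixing the seed makes each of the K players deterministic.

```python
import random

def stochastic_policy(legal_moves, rng):
    """Toy stochastic algorithm: pick a legal move at random."""
    return rng.choice(legal_moves)

def make_seeded_player(seed):
    """Turn the stochastic policy into a deterministic one by fixing its seed."""
    def play_game(positions):
        rng = random.Random(seed)  # fresh generator per game: same choices every game
        return [stochastic_policy(moves, rng) for moves in positions]
    return play_game

K = 4
players = [make_seeded_player(seed) for seed in range(K)]

# Hypothetical game: five positions with three legal moves each.
positions = [["a", "b", "c"]] * 5
first = players[2](positions)
second = players[2](positions)   # identical to `first`
```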

Hint: the random seed has a significant impact. Yes, it's by rote learning (kind of opening book).

Performance of Nash portfolio (learnt offline), in generalization

Against new seeds

Vs uniform ==> this means we outperform the default version (which is randomized seeds).

Portfolios are here a distribution on random seeds.

We get an improved algorithm (winning rate 66%) just with that.

X-axis = K = K'

Remarks

Nash portfolio good

Best Arm seed very good

But we will see that best arm has weaknesses ==> it can be overfitted, i.e. easily beaten by a learning opponent.

UCBT crushes fixedSeed and wins against uniform

X-axis = log2 (nb of games) (max. 512 games). Dots decreasing to 0.

Fixed seeds (deterministic algorithms) are overfitted after 64 games

Other experiments: variants of some algorithm

GNU Go with options (32 variants)

Nash-portfolio or UCBT portfolio: only a few percent of improvement over a single ad hoc variant.

==> less impressive than with random seeds

Conclusions

Nice application for Nash-portfolio:

Choose a stochastic algorithm

Build a matrix M of games randomSeed vs randomSeed

Compute the Nash equilibrium

You get a new probability distribution on random seeds

It should be stronger than the original algorithm.
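The recipe above can be sketched end to end on a toy problem. Everything here is illustrative: play_game is a hypothetical stand-in for a real game between two seeded players, and the equilibrium is approximated with multiplicative weights rather than with the method used by the authors. The sketch only checks the portfolio on the matrix it was computed from, not against new seeds.

```python
import math
import random

def play_game(black_seed, white_seed):
    """Hypothetical stand-in for one game between two seeded players.

    Returns 1 if Black wins, 0 otherwise; deterministic given both seeds.
    """
    rng = random.Random(1000 * black_seed + white_seed)
    return 1 if rng.random() < 0.5 else 0

def build_matrix(K):
    """M[i][j] = result of Black seed i against White seed j."""
    return [[play_game(i, j) for j in range(K)] for i in range(K)]

def nash_for_black(M, T=5000):
    """Approximate Black's Nash portfolio with multiplicative weights."""
    K = len(M)
    eta = math.sqrt(8.0 * math.log(K) / T)
    gains = [0.0] * K
    p_sum = [0.0] * K
    for _ in range(T):
        top = max(gains)
        w = [math.exp(eta * (g - top)) for g in gains]
        total = sum(w)
        p_t = [x / total for x in w]
        # White answers with the seed that is worst for Black.
        j = min(range(len(M[0])),
                key=lambda c: sum(p_t[i] * M[i][c] for i in range(K)))
        for i in range(K):
            p_sum[i] += p_t[i]
            gains[i] += M[i][j]
    return [s / T for s in p_sum]

def worst_case(p, M):
    """Win rate of portfolio p against the most harmful White seed."""
    return min(sum(p[i] * M[i][j] for i in range(len(p)))
               for j in range(len(M[0])))

K = 8
M = build_matrix(K)
p = nash_for_black(M)            # new distribution on random seeds
uniform = [1.0 / K] * K          # the original algorithm: seed drawn uniformly
```

Comparing worst_case(p, M) with worst_case(uniform, M) shows how much the reweighted seeds gain over the default on the training matrix.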

Nice application for UCBT-portfolio:

Play against it

As long as you lose, it will keep the same line of play

Conclusions

Further work

Better Nash approximation

Increase fun (should UCBT explore more or less? discount?)

Bigger experiments (bigger games? 19x19?)

Comments? We forgot to cite your paper? We did not try on your favorite game? Our results are bullshit? Please tell us :-)