Disappointing Results & Open Problems in Monte-Carlo Tree Search



Disappointing and Unexpected Results in Monte-Carlo Tree Search

O. Teytaud & colleagues, Silver Workshop, ECML 2012

In a nutshell:
- the game of Go, a great AI-complete challenge
- MCTS, a great recent tool for MDP-solving
- negative results on MCTS are the most important stuff
- considerations on academic publications (pros and cons)

If you solve these weaknesses, even if it takes all your time in all your research during 30 years, it is worth being done.

Part I. A success story on Computer Games

Part II. Two unsolved problems in Computer Games

Part III. Some algorithms which do not solve them

Part IV. Conclusion (technical)

Part V. Meta-conclusion (non-technical)


Part I: The Success Story (less showing off in Part II :-) )

The game of Go is a beautiful challenge.

We achieved the first wins against professional players in the game of Go.

Game of Go (9x9 here)


Game of Go: counting territories
(white has 7.5 bonus as black starts)

Game of Go: the rules

Black plays at the blue circle: the white group dies (it is removed)

It's impossible to kill white (two eyes).

Superko rule: we don't come back to the same situation.

(without superko: PSPACE-hard; with superko: EXPTIME-hard)

At the end, we count territories ==> black starts, so +7.5 for white.
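A tiny sketch of this counting step (my illustration, not from the talk): territories are assumed to be already determined, and captures / rule-set subtleties are ignored.

```python
# Toy sketch of the final count with the 7.5 komi mentioned above.
KOMI = 7.5  # White's compensation, since Black plays first

def result(black_territory, white_territory, komi=KOMI):
    margin = black_territory - (white_territory + komi)
    return f"Black wins by {margin}" if margin > 0 else f"White wins by {-margin}"

print(result(40, 35))   # Black 40, White 35 -> "White wins by 2.5"
```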

Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06)
UCT (Upper Confidence Trees), a variant of MCTS


Kocsis & Szepesvari (06)

Exploitation ...
SCORE = 5/7 + k.sqrt( log(10)/7 )

... or exploration ?
SCORE = 0/2 + k.sqrt( log(10)/2 )
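A quick numeric check of the two scores above (a sketch; I assume the natural logarithm, as in the usual UCB1 formula, and k is the exploration constant):

```python
import math

def score(wins, visits, parent_visits, k):
    """Empirical win rate plus exploration bonus, as in the SCORE above."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

k = 1.0
well_explored = score(5, 7, 10, k)   # 5/7 + k*sqrt(log(10)/7) ~ 1.29
rarely_tried  = score(0, 2, 10, k)   # 0/2 + k*sqrt(log(10)/2) ~ 1.07
print(well_explored, rarely_tried)
# With k = 1 the well-explored move (5 wins out of 7) still wins;
# a larger k (say k = 3) tips the balance toward the rarely tried move.
```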


UCB ?

I have shown the UCB formula (Lai, Robbins), which is the difference between MCTS and UCT

The UCB formula has deep mathematical principles.

But very far from the MCTS context.

Contrary to what has often been claimed, UCB is not central in MCTS.

But for publishing papers, relating MCTS to UCB is so beautiful, with plenty of maths papers in the bibliography :-)
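To make the "UCB is not central" point concrete, here is a minimal sketch (my illustration, not code from the talk) where the selection rule is just a parameter of a generic MCTS loop. The `game` interface (`legal_moves`, `play`, `is_terminal`, `reward`) is assumed, and the backup is written for the one-player (MDP) case to keep it short; swapping `ucb_select` for `greedy_select` changes nothing else.

```python
import math, random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.wins, self.visits = [], 0.0, 0

def ucb_select(node, k=1.0):
    # UCB: empirical mean + exploration bonus (the formula shown above).
    return max(node.children,
               key=lambda c: c.wins / c.visits
                             + k * math.sqrt(math.log(node.visits) / c.visits))

def greedy_select(node, eps=0.1):
    # A deliberately simple alternative: epsilon-greedy on the empirical mean.
    if random.random() < eps:
        return random.choice(node.children)
    return max(node.children, key=lambda c: c.wins / c.visits)

def mcts(root_state, game, n_sims, select=ucb_select):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while node.children and len(node.children) == len(game.legal_moves(node.state)):
            node = select(node)
        # 2. Expansion: add one untried child (unless the state is terminal).
        if not game.is_terminal(node.state):
            tried = {c.move for c in node.children}
            move = random.choice([m for m in game.legal_moves(node.state) if m not in tried])
            child = Node(game.play(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: a random playout (the "MC part") to the end.
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        reward = game.reward(state)  # assumed in [0, 1]
        # 4. Backpropagation (one-player backup; a two-player game would flip the reward).
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move
```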


The great news:

Not related to classical algorithms (no alpha-beta)

Recent tools (Rémi Coulom's paper in 2006)

Not at all specific to Go (now widely used in games, and beyond)

But great performance in Go needs adaptations (of the MC part)...

We all have to write reports:

Showing that we are very strong

Showing that our research has breakthroughs, which destroy bottlenecks

So, OK, the previous slide is perfect for that.

Part II: challenges

Two main challenges:
- Situations which require abstract thinking (cf. Cazenave)
- Situations which involve divide & conquer (cf. Müller)

Part I. A success story on Computer Games

Part II. Two unsolved problems in Computer Games

Part III. Some algorithms which do not solve them

Part IV. Conclusion (technical)

Part V. Meta-conclusion (non-technical)

A trivial semeai (= liberty race)

Plenty of equivalent situations!
They are randomly sampled, with no generalization.
50% of estimated win probability!


This is very easy. Children can solve that.
But it is too abstract for computers.
Computers play semeais very badly.

It does not work. Why?
50% of estimated win probability!

In the first node: The first simulations give ~ 50%

The next simulations go to 100% or 0% (depending on the chosen move)

But, then, we switch to another node

(~ 8! x 8! such nodes)
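A quick back-of-the-envelope computation of the "~ 8! x 8!" figure (sketch only; the interpretation as orderings of the local moves is mine):

```python
import math

# The slide counts roughly 8! x 8! interchangeable nodes for this one semeai
# (one ordering of the forced local moves per side).
equivalent_nodes = math.factorial(8) ** 2
print(equivalent_nodes)   # 1625702400, i.e. about 1.6e9
# Simulations are spread over ~1.6 billion interchangeable nodes, so each
# node keeps a near-50% estimate instead of converging to 0% or 100%.
```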

And the humans ?

50% of estimated win probability!

In the first node: The first simulations give ~ 50%

The next simulations go to 100% or 0% (depending on the chosen move)

But, then, we DON'T switch to another node

Requires more than local fighting.
Requires combining several local fights.
Children usually not so good at this.
But strong adults really good.
And computers very childish.

Looks like a bad move, locally.
(Lee Sedol (black) vs Hang Jansik (white))

Part I. A success story on Computer Games

Part II. Two unsolved problems in Computer Games

Part III. Some algorithms which do not solve them (negative results show that the important stuff is really in Part II...)

Part IV. Conclusion (technical)

Part V. Meta-conclusion (non-technical)

Part III: techniques for addressing these challenges

1. Parallelization

2. Machine Learning

3. Genetic Programming

4. Nested MCTS

Parallelizing MCTS

On a parallel machine with shared memory: just many simulations in parallel, the same memory for all.

On a parallel machine with no shared memory: one MCTS per comp. node, and 3 times per second:
- select nodes with at least 5% of total sims (depth at most 3);
- average all statistics on these nodes.
==> comp. cost = log(nb of comp. nodes)
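A minimal sketch of this synchronization step (my illustration; the `allreduce_sum` primitive, the data layout, and the assumption that every compute node selects the same set of tree nodes are all simplifications; the tree nodes reuse the visits/wins/children fields from the MCTS sketch earlier):

```python
# Hypothetical sketch: each compute node runs its own MCTS tree and, about
# 3 times per second, averages the statistics of "heavy" shallow tree nodes.

def nodes_to_share(root, total_sims, max_depth=3, min_fraction=0.05):
    """Tree nodes with at least 5% of all simulations, at depth at most 3."""
    selected, stack = [], [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > max_depth or node.visits < min_fraction * total_sims:
            continue
        selected.append(node)
        stack.extend((child, depth + 1) for child in node.children)
    return selected

def synchronize(root, total_sims, n_compute_nodes, allreduce_sum):
    """Average the selected statistics across all compute nodes.

    `allreduce_sum(values)` is an assumed primitive returning the element-wise
    sum of `values` over all compute nodes (e.g. an MPI-style all-reduce, whose
    cost grows like log(number of compute nodes), as stated on the slide).
    Simplification: every compute node is assumed to select the same tree nodes.
    """
    shared = nodes_to_share(root, total_sims)
    local = [x for n in shared for x in (n.visits, n.wins)]
    summed = allreduce_sum(local)
    for node, (visits, wins) in zip(shared, zip(summed[0::2], summed[1::2])):
        node.visits = visits / n_compute_nodes   # replace by the average
        node.wins = wins / n_compute_nodes
```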


Good news: it works

So misleading numbers...

Much better than voting schemes

But little difference with T. Cazenave (depth 0).

Every month, someone tells us:

Try with a bigger machine! And win against top pros! (I have believed that, at some point...)

In fact, 32 machines and 1 machine have almost the same level...
(against humans...)

Being faster is not the solution

The same in Havannah
(F. Teytaud)

More deeply (R. Coulom): the improvement in terms of performance against humans is significant (statistically), but far less efficient than human expertise.

Part III: techniques for addressing these challenges

1. Parallelization

2. Machine Learning

3. Genetic Programming

4. Nested MCTS

We don't want to use expert knowledge. We want automated solutions.
Developing a MC (or biases) by Genetic Programming? Looks like a good idea.

But importantly:

A strong MC part (in terms of playing strength of the MC part) does not imply (by far!) a stronger MCTS.

(except in 1P cases...)


Hoock et al.; Cazenave et al.

Part III: techniques for addressing these challenges

1. Parallelization

2. Machine Learning

3. Genetic Programming

4. Nested MCTS


Nested MCTS in one slide (Cazenave, F. Teytaud, etc.)

1) To a strategy π, you can associate a value function:
Value_π(s) = expected reward when simulating with strategy π from state s

2) Then define:
NestedMC0(state) = MC(state)
NestedMC1(state) = decision maximizing NestedMC0-value(state + decision)
...
NestedMC42(state) = decision maximizing NestedMC41-value(state + decision)

==> looks like a great idea
==> not good in Go
==> good on some less widely known testbeds (morpion solitaire, some hard scheduling problems)
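A minimal sketch of the recursion just defined, for a one-player game, which is the setting where it reportedly works (morpion solitaire, scheduling); the `game` interface (`legal_moves`, `play`, `is_terminal`, `score`) is assumed, and the usual bookkeeping of the best sequence found so far is omitted:

```python
import random

def playout_value(state, game):
    """Level 0: the MC strategy, i.e. the score of one random playout."""
    while not game.is_terminal(state):
        state = game.play(state, random.choice(game.legal_moves(state)))
    return game.score(state)

def nested_value(state, game, level):
    """Reward obtained when playing the NestedMC(level) strategy from `state` to the end."""
    if level == 0:
        return playout_value(state, game)
    while not game.is_terminal(state):
        # NestedMC(level): pick the decision maximizing the NestedMC(level-1)
        # value of state + decision, exactly as in the definition above.
        state = max((game.play(state, m) for m in game.legal_moves(state)),
                    key=lambda s: nested_value(s, game, level - 1))
    return game.score(state)

def nested_mc_decision(state, game, level):
    """NestedMC(level)(state): the move whose successor scores best under NestedMC(level-1)."""
    return max(game.legal_moves(state),
               key=lambda m: nested_value(game.play(state, m), game, level - 1))
```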

Part I. A success story on Computer Games

Part II. Two unsolved problems in Computer Games

Part III. Some algorithms which do not solve them

Part IV. Conclusion (technical)

Part V. Meta-conclusion (non-technical)

Part IV: Conclusions

Game of Go:

1. Disappointingly, most recent progress = human expertise
==> we understood a lot by methods which do not work or work little
==> we understood a lot by counter-examples, not by impressive performance

2. UCB is not that much involved in MCTS (simple rules perform similarly) ==> publication bias

Part IV: Conclusions

Recent generic progress in MCTS:

1. Application to GGP (general game playing): the program learns the rules of the game just before the competition, no last-minute development (fully automated) ==> not so well known, but really interesting

2. One-player games: great ideas which do not work in 2P games sometimes work in 1P games (e.g. optimizing the MC in a DPS sense)

Part IV: Conclusions

Techniques which outperformed the state of the art in Minesweeper were (negatively) tested on Go, and (positively) on industrial problems.

Part V: Meta-Conclusion

Huge publication bias. People report only experiments which are sooooo great breakthroughs. But when you discuss with them, they tell you that there is publication and there is reality.

At the end, we trust our friends, or published theorems, but we don't trust experiments.

The most interesting MCTS results are negative results:
- Current main ML techniques for MCTS do not work on this
- Abstract thinking (looks like theorem proving)
- Understanding this combination of local stuff is impossible for computers

There are several examples of MCTS papers in which problems were swept under the carpet, for the sake of publication, whereas the dust was the interesting stuff.

Results are often difficult to reproduce, or unstable w.r.t. experimental conditions.


Examples:

- "I have truncated results to ..... because it was unstable otherwise." (cheat by using the new version only for openings)

- "I could make it work after a lot of tuning in 9x9, but I could not get positive results in 19x19." (cheat by heavy tuning)

==> for any method, with enough tuning, you get positive results
==> you are more likely to publish "I used sophisticated method XXX and got positive results" than "I used plenty of dirty tuning and got positive results"
==> if method XXX has plenty of free parameters, it's OK: at some point you will validate it

For mathematical works, sometimes people lie about motivations, trying to justify that there is a real-world application.

Sometimes it's true, but it's also often a lie.

A memory from a long time ago: I was working on pure theory stuff, and I asked: "I have read in the abstract that this can be applied to biological problems. Can you explain?"

Answer: "Wahaha, he believed it!"

In experiments, it's different: people often use experimental setups for hiding the problems under the carpet. Mathematicians cannot do that.


My conclusions:
- don't trust publications too much
- I want to publish less
- I want to publish (try to publish...) failures and disappointing results

We could apply in MineSweeper (1P) ideas which do not work in Go (2P).
We could apply in Energy Management (1P) ideas which do not work in Go (2P).

Part V: Meta-Conclusion

People in computer games look much more clever since they have been working on Go.
Much easier to write reports :-) Lucky: right place, right moment.

The progress in the game of Go does not cure cancer. The important challenges are still in front of us (don't trust published solutions too much...).

Failed experiments on Go provide more insights than the success story (in which the tuning part, which is not so generic, is not visible...).

Yet games are great challenges. When you play Go, you look clever & wise.

When you play StarCraft, you look like a geeky teenager.

Yet, StarCraft, Doom, Table Tennis, MineSweeper are great challenges.

Difficult games: Havannah

Very difficult for computers.

What else ? First Person Shooting
(UCT for partially observable MDP)

What else ? Real Time Strategy Game
(multiple actors, partially obs.)

What else ? Sports (continuous control)

Real games

Assumption: if a computer understands and guesses spins, then this robot will be efficient for something other than just games.
(This held true for Go.)

What else ? Collaborative sports

Experimental works:
- difficult to reproduce (except games...)
- dust swept under the carpet / aesthetic bias
- statistical cheating, conscious or unconscious
- negative results unpublished

==> Moderately reliable publication

Yet, academic papers are, I think, more reliable than reports for billion-$ contracts ==> pressure by money does not work :-(

Funding based on publication records: for me, this is the source of all evil.

Academics are, and should remain, the most independent and reliable people.
They should be referees for all important industrial contracts.
