Machine Learning in Computer Game Players

Chikayama & Taura Lab. M1 Ayato Miki

Upload: dewey

Post on 25-Feb-2016



Page 1: Machine Learning in Computer Game Players

Machine Learning in Computer Game Players

Chikayama & Taura Lab.

M1 Ayato Miki

Page 2: Machine Learning in Computer Game Players

Outline

1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
5. Conclusion

Page 3: Machine Learning in Computer Game Players

1. Introduction

Improvements in Computer Game Players
◦ DEEP BLUE defeated Kasparov in 1997
◦ GEKISASHI and TANASE SHOGI at WCSC 2008

Strong Computer Game Players are usually developed by strong human players
◦ They input heuristics manually
◦ They devote a lot of time and energy to tuning

Page 4: Machine Learning in Computer Game Players

Machine Learning for Games

Machine Learning enables automatic tuning using a large amount of data

The developer does not need to be an expert at the game


Page 6: Machine Learning in Computer Game Players

2. Computer Game Players

Games

Game Trees

Game Tree Search

Evaluation Function

Page 7: Machine Learning in Computer Game Players

Games

Turn-based games
◦ e.g. tic-tac-toe, chess, shogi, poker, mahjong…

Additional Classification
◦ two-player or otherwise
◦ zero-sum or otherwise
◦ deterministic or non-deterministic
◦ perfect or imperfect information

Game Tree Model

Page 8: Machine Learning in Computer Game Players

Game Trees

[Figure: game tree. The root node is the player's turn; its edges (move 1, move 2) lead to nodes where it is the opponent's turn.]

Page 9: Machine Learning in Computer Game Players

Game Tree Search

e.g. Minimax search algorithm

[Figure: minimax search on a four-level game tree]
Max (root): 5
Min: 5, 3
Max: 5, 8 | 3, 6
Leaves: (3, 5, 1) (4, 2, 8) | (3, 1, 0) (6, 2, 4)
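As a rough illustration of the minimax example, the sketch below searches a small nested-list tree. The leaf grouping (3, 5, 1), (4, 2, 8), (3, 1, 0), (6, 2, 4) is an assumption read off the values in the slide's figure, and the names `minimax` and `slide_tree` are made up for this sketch.

```python
from typing import List, Union

# A tree is either a leaf value (int) or a list of child trees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the minimax value of `node`; levels alternate between Max and Min."""
    if isinstance(node, int):  # leaf: its value is given
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Assumed reading of the slide's tree: Max root, two Min nodes,
# four Max nodes, twelve leaves.
slide_tree = [
    [[3, 5, 1], [4, 2, 8]],
    [[3, 1, 0], [6, 2, 4]],
]

root_value = minimax(slide_tree)
print(root_value)
```

Under that reading the four lower Max nodes evaluate to 5, 8, 3 and 6, the Min nodes to 5 and 3, and the root to 5, which matches the numbers shown on the slide.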

Page 10: Machine Learning in Computer Game Players

Game Tree Search

It is difficult to search all the way to the leaf nodes
◦ The shogi game tree has about 10^220 nodes

Stop the search at a practicable depth and "evaluate" the nodes there
◦ Using an Evaluation Function
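A minimal sketch of stopping the search at a fixed depth and evaluating the frontier nodes might look like the following. The toy counting game (`children`, `evaluate`) is purely hypothetical and only serves to make the example runnable; the evaluation is assumed to be from the maximizing player's point of view.

```python
from typing import Callable, List


def depth_limited_minimax(
    state: int,
    depth: int,
    maximizing: bool,
    children: Callable[[int], List[int]],
    evaluate: Callable[[int], float],
) -> float:
    """Minimax that calls `evaluate` once the depth limit (or a terminal state) is reached."""
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)  # estimate instead of searching to the leaves
    values = [
        depth_limited_minimax(c, depth - 1, not maximizing, children, evaluate)
        for c in successors
    ]
    return max(values) if maximizing else min(values)


# Hypothetical toy game: a counter that each move increases by 1 or 2,
# ending once it reaches 10 or more.
def children(n: int) -> List[int]:
    return [n + 1, n + 2] if n < 10 else []


def evaluate(n: int) -> float:
    return float(n)  # stand-in evaluation function


value = depth_limited_minimax(0, 2, True, children, evaluate)
print(value)
```

The quality of the result then depends on how well `evaluate` approximates the true outcome, which is why tuning it matters.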

Page 11: Machine Learning in Computer Game Players

Evaluation Function

Estimates how favorable the position is

Elements
◦ feature vector of the position
◦ parameter vector

$V(s) = f(\phi(s), \omega)$

$\phi(s)$: feature vector of position $s$
$\omega$: parameter vector
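One common choice for such a function, assumed here only for illustration, is a weighted sum of features. The piece list, the material-difference features and the weight values below are all hypothetical and do not come from the slides.

```python
from typing import Dict, List

PIECES = ["pawn", "knight", "rook"]  # hypothetical feature set

Position = Dict[str, Dict[str, int]]


def features(position: Position) -> List[float]:
    """Feature vector: material difference (own minus opponent) per piece type."""
    return [float(position["own"][p] - position["opp"][p]) for p in PIECES]


def evaluate(position: Position, weights: List[float]) -> float:
    """Linear evaluation: dot product of the parameter vector and the feature vector."""
    return sum(w * x for w, x in zip(weights, features(position)))


position = {
    "own": {"pawn": 8, "knight": 2, "rook": 1},
    "opp": {"pawn": 7, "knight": 2, "rook": 2},
}
weights = [1.0, 3.0, 5.0]  # assumed values for pawn, knight, rook

value = evaluate(position, weights)
print(value)
```

Here the features are [1, 0, -1], so the position is worth 1 - 5 = -4 for the side to evaluate. Tuning the evaluation function then means adjusting `weights`.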


Page 13: Machine Learning in Computer Game Players

3. Machine Learning in Computer Game Players

Initial work
◦ Samuel's research [1959]

Learning objective
◦ What do Computer Game Players learn?

Page 14: Machine Learning in Computer Game Players

Samuel's Checkers Player [1959]

Many useful techniques
◦ Rote learning
◦ Quiescence search
◦ 3-layer neural network evaluation function

And some machine learning techniques
◦ Learning through self-play
◦ Temporal-difference learning
◦ Comparison training

Page 15: Machine Learning in Computer Game Players

Learning Objective

Opening Book

Search Control

Evaluation Function

Page 16: Machine Learning in Computer Game Players

Learning Evaluation Functions

Automatic construction of the evaluation function
◦ Construct and select a feature vector automatically
◦ e.g. GLEM [Buro, 1998]
◦ Difficult

Tuning evaluation function parameters
◦ Make a feature vector manually and tune its parameters automatically
◦ Easy and effective


Page 18: Machine Learning in Computer Game Players

4. Tuning Evaluation Functions

Supervised Learning

Reinforcement Learning

Evolutionary Algorithm

Page 19: Machine Learning in Computer Game Players

Supervised Learning

Provide the program with example positions and their exact evaluation values

Adjust the parameters so as to minimize the error between the evaluation function outputs V(s) and the exact values

[Figure: example positions labeled with exact evaluation values (20, 50, …, 10, 40, 50); the error is measured against V(s)]
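As a sketch of this kind of tuning, the code below fits a linear evaluation function to labeled positions by gradient descent on the squared error. The linear form, the learning rate, and the three training examples are assumptions chosen for illustration, not data from the slides.

```python
from typing import List, Tuple

Example = Tuple[List[float], float]  # (feature vector, exact evaluation value)


def predict(weights: List[float], x: List[float]) -> float:
    """Linear evaluation V(s) = w . x for a feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def train(examples: List[Example], n_features: int,
          lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Stochastic gradient descent on 0.5 * (V(s) - target)^2."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for x, target in examples:
            error = predict(weights, x) - target
            # The gradient of the squared error w.r.t. each weight is error * x_i.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Hypothetical labeled positions, already converted to feature vectors.
examples = [([1.0, 0.0], 20.0), ([0.0, 1.0], 50.0), ([1.0, 1.0], 70.0)]
weights = train(examples, n_features=2)
print(weights)
```

Because these labels are exactly consistent with the weights (20, 50), the procedure converges to them; real labeled positions would be noisy and leave a residual error.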

Page 20: Machine Learning in Computer Game Players

Difficulty of Hard Supervised Training

Positions must be labeled manually

Each label must be a quantitative evaluation value

Consider a softer approach

Page 21: Machine Learning in Computer Game Players

Comparison Training

Soft Supervised Training

Requires only a relative order of the possible moves
◦ Easier and more intuitive

[Figure: two candidate moves compared with ">"]

Page 22: Machine Learning in Computer Game Players

Bonanza [Hoki, 2006]

Comparison training using records of expert games

Simple relative order

The expert move > other moves

Page 23: Machine Learning in Computer Game Players

Bonanza Method

Based on optimal control theory

Minimize the cost function J

$J(s_0, s_1, \ldots, s_{N-1}, \omega) = \sum_{i=0}^{N-1} l(s_i, \omega)$

$s_i$: example positions in the records
$l(s_i, \omega)$: error function
$N$: total number of example positions

Page 24: Machine Learning in Computer Game Players

Bonanza Method

Error Function

$l(s, \omega) = \sum_{m=1}^{M-1} T[\xi(s'_m, \omega) - \xi(s'_0, \omega)]$

$s'_m$: child position with move $m$
$M$: total number of possible moves
$m = 0$: the move played in the record
$\xi(s'_m, \omega)$: minimax search value
$T(x)$: order discriminant function

Page 25: Machine Learning in Computer Game Players

Order Discriminant Function

Sigmoid Function

$T(x) = \frac{1}{1 + e^{-kx}}$

◦ k is the parameter that controls the gradient
◦ When $k \to \infty$, T(x) is a step function
◦ In this case, the error function means "the number of moves that were considered to be better than the move in the record"
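The error function and its sigmoid can be sketched numerically as follows. This is only an illustration: the search values are plain made-up numbers rather than results of a real minimax search, and the convention that index 0 holds the record move is an assumption taken from the slide's legend.

```python
import math
from typing import List


def order_discriminant(x: float, k: float) -> float:
    """Sigmoid T(x) = 1 / (1 + exp(-k * x))."""
    return 1.0 / (1.0 + math.exp(-k * x))


def move_error(search_values: List[float], k: float) -> float:
    """Error for one position: sum of T(value of move m - value of record move).

    search_values[0] is the search value after the move played in the record,
    the remaining entries are the values after the other legal moves.
    """
    record = search_values[0]
    return sum(order_discriminant(v - record, k) for v in search_values[1:])


def cost(positions: List[List[float]], k: float) -> float:
    """Cost J: the error summed over all example positions."""
    return sum(move_error(values, k) for values in positions)


# Hypothetical search values: record move 10, alternatives 12, 7 and 11.
values = [10.0, 12.0, 7.0, 11.0]
sharp = move_error(values, k=50.0)  # close to a step function
flat = move_error(values, k=0.0)    # every term is exactly 0.5
print(sharp, flat)
```

With a large k the error is very close to 2, the number of alternatives rated above the record move, while a smaller k gives a smooth quantity that can be minimized with gradient methods.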

Page 26: Machine Learning in Computer Game Players

Bonanza

30,000 professional game records and 30,000 records of high-rated games from SHOGI CLUB 24 were used

The weight parameters of about 10,000 feature elements were tuned

Bonanza won the World Computer Shogi Championship 2006

Page 27: Machine Learning in Computer Game Players

Problem of Supervised Learning

It is costly to accumulate a training data set
◦ It takes a lot of time to label manually
◦ Using expert records has been successful

But what if there are not enough expert records?
◦ New games
◦ Minor games

Other approaches work without a training set
◦ e.g. Reinforcement Learning (next)


Page 29: Machine Learning in Computer Game Players

Reinforcement Learning

The learner gets "a reward" from the environment

In the domain of games, the reward is the final outcome (win/lose)

Reinforcement learning requires only the objective information of the game

Page 30: Machine Learning in Computer Game Players

Reinforcement Learning

[Figure: reward values shown along the positions of example games: +100, +60, +30, +10; +200, +120, +60, +20; -100, -60, -30, -10]

Inefficient in Games…

Page 31: Machine Learning in Computer Game Players

Temporal-Difference Learning

[Figure: two sequences of position values along a game: +100, +60, +30, +10 and +80, +15, +10, +10]

$\text{TD error} = r + V(s_{t+1}) - V(s_t)$
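A minimal tabular sketch of learning from this TD error is shown below. The table-based value function, the step size `alpha`, the three-position episode, and the convention that the reward arrives only with the last position are all assumptions made for illustration; a game program would update evaluation function parameters instead of a table.

```python
from typing import Dict, List


def td_error(reward: float, v_next: float, v_current: float) -> float:
    """TD error = r + V(s_{t+1}) - V(s_t), without discounting."""
    return reward + v_next - v_current


def td0_update(values: Dict[str, float], episode: List[str],
               final_reward: float, alpha: float = 0.5) -> Dict[str, float]:
    """One pass of TD(0) over an episode; only the last position receives a reward."""
    for t, state in enumerate(episode):
        current = values.get(state, 0.0)
        if t + 1 < len(episode):
            reward, v_next = 0.0, values.get(episode[t + 1], 0.0)
        else:
            reward, v_next = final_reward, 0.0  # nothing follows the final position
        values[state] = current + alpha * td_error(reward, v_next, current)
    return values


values: Dict[str, float] = {}
episode = ["a", "b", "c"]  # hypothetical positions of one game
td0_update(values, episode, final_reward=100.0)
td0_update(values, episode, final_reward=100.0)
print(values)
```

After two passes the final position has moved toward the reward (75) and its predecessor has started to follow (25), which shows how the outcome propagates backwards one step per game.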

Page 32: Machine Learning in Computer Game Players

TD-Gammon [Tesauro, 1992]

Trained through self-play

Version          Features                     Strength
TD-Gammon 0.0    Raw Board Information        Top of Computer Players
TD-Gammon 1.0    Plus Additional Heuristics   World-championship level

Page 33: Machine Learning in Computer Game Players

Problems of Reinforcement Learning

Falling into a local optimum
◦ Lack of playing variation

Solutions
◦ Add intentional randomness
◦ Play against various players (computer/human)

Credit Assignment Problem (CAP)
◦ Not clear which action was effective
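One simple way to add intentional randomness, given here only as an assumed example, is epsilon-greedy move selection: with probability `epsilon` the player picks a random move instead of the best-valued one. The function name and the move values are hypothetical.

```python
import random
from typing import Dict


def choose_move(move_values: Dict[str, float], epsilon: float,
                rng: random.Random) -> str:
    """Pick a random move with probability epsilon, otherwise the best-valued move."""
    moves = list(move_values)
    if rng.random() < epsilon:
        return rng.choice(moves)  # exploration: adds playing variation
    return max(moves, key=lambda m: move_values[m])  # exploitation


rng = random.Random(0)
move_values = {"a": 1.0, "b": 3.0, "c": 2.0}  # hypothetical evaluations
greedy = choose_move(move_values, epsilon=0.0, rng=rng)
explored = choose_move(move_values, epsilon=1.0, rng=rng)
print(greedy, explored)
```

A small `epsilon` keeps most of the playing strength during self-play while still visiting positions the current evaluation function would otherwise never reach.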


Page 35: Machine Learning in Computer Game Players

Evolutionary Algorithm

1. Initialize Population
2. Randomly Vary Individuals
3. Evaluate "Fitness"
4. Apply Selection

Page 36: Machine Learning in Computer Game Players

Research of Fogel et al. [2004]

Evolutionary algorithm for a chess player

Using an open-source chess program
◦ Attempt to tune its parameters

Page 37: Machine Learning in Computer Game Players

Initialization

Make 10 initial parents
◦ Initialize parameters with random values

Page 38: Machine Learning in Computer Game Players

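Variation

Create 10 offspring from each surviving parent by mutating parental parameters

$\omega'_i = \omega_i + N(0, s'_i)$

$\omega_i$, $\omega'_i$: i-th parameter of the parent and of the offspring
$N(\mu, \sigma)$: Gaussian random variable
$s'_i$: strategy parameter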

Page 39: Machine Learning in Computer Game Players

Evaluate Fitness and Selection

Each player plays ten games against randomly selected opponents

The ten best players become parents of the next generation

[Figure: select 10 opponents randomly]
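The initialize, vary, evaluate and select loop can be sketched as below. Several things are assumptions made to keep the example small and runnable: the "game" is a toy stand-in in which the player whose parameters are closer to a hidden `TARGET` wins, each parent creates one offspring per generation, a win scores +1 and a loss -1, and the strategy parameters are self-adapted with a log-normal factor, which is a common evolutionary programming scheme but is not shown on the slides.

```python
import math
import random
from typing import Dict, List

Player = Dict[str, List[float]]  # "w": tuned parameters, "s": strategy parameters

TARGET = [0.3, -1.2, 2.0]  # hypothetical ideal parameters of the toy game


def play(a: Player, b: Player) -> int:
    """Toy stand-in for a game: +1 if a wins, -1 if a loses, 0 for a draw."""
    da = sum((x - t) ** 2 for x, t in zip(a["w"], TARGET))
    db = sum((x - t) ** 2 for x, t in zip(b["w"], TARGET))
    return (da < db) - (da > db)


def mutate(parent: Player, rng: random.Random, tau: float) -> Player:
    """Offspring: s'_i = s_i * exp(tau * N(0, 1)), then w'_i = w_i + N(0, s'_i)."""
    sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in parent["s"]]
    weights = [w + rng.gauss(0.0, s) for w, s in zip(parent["w"], sigmas)]
    return {"w": weights, "s": sigmas}


def evolve(generations: int = 20, n_parents: int = 10,
           games: int = 10, seed: int = 0) -> List[Player]:
    rng = random.Random(seed)
    dim = len(TARGET)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(dim))  # assumed self-adaptation rate
    parents = [{"w": [rng.uniform(-3.0, 3.0) for _ in range(dim)],
                "s": [0.5] * dim} for _ in range(n_parents)]
    for _ in range(generations):
        # Variation: one mutated offspring per parent (assumption).
        population = parents + [mutate(p, rng, tau) for p in parents]
        # Fitness: total score over games against randomly chosen opponents.
        scores = []
        for i, player in enumerate(population):
            others = population[:i] + population[i + 1:]
            scores.append(sum(play(player, rng.choice(others))
                              for _ in range(games)))
        # Selection: the best-scoring players become the next parents.
        ranked = sorted(range(len(population)),
                        key=lambda i: scores[i], reverse=True)
        parents = [population[i] for i in ranked[:n_parents]]
    return parents


parents = evolve(generations=5)
print(parents[0]["w"])
```

Because fitness comes from a small number of games against random opponents, the ranking is noisy, so a strong individual can occasionally be dropped.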

Page 40: Machine Learning in Computer Game Players

Tuned Parameters

Material value

Positional value

Weights and biases of three neural networks

Page 41: Machine Learning in Computer Game Players

Three Neural Networks

Each network has 3 layers
◦ Input = arrangement of a specific area (front 2 rows, back 2 rows, or center 4x4 square)
◦ Hidden = 10 units
◦ Output = worth of the area arrangement

[Figure: network with 16 inputs, 10 hidden units, 1 output]
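To make the size of one such network concrete, here is a small forward-pass sketch for a 16-10-1 network. The tanh hidden activation, the linear output unit, and the parameter layout are assumptions; the slides only give the layer sizes.

```python
import math
from typing import Dict, List

N_INPUT, N_HIDDEN = 16, 10

Params = Dict[str, list]


def zero_params() -> Params:
    """Parameter container with all weights and biases set to zero."""
    return {
        "w_hidden": [[0.0] * N_INPUT for _ in range(N_HIDDEN)],
        "b_hidden": [0.0] * N_HIDDEN,
        "w_out": [0.0] * N_HIDDEN,
        "b_out": 0.0,
    }


def count_params(params: Params) -> int:
    """Number of tunable weights and biases in one network."""
    return (sum(len(ws) for ws in params["w_hidden"])
            + len(params["b_hidden"]) + len(params["w_out"]) + 1)


def forward(inputs: List[float], params: Params) -> float:
    """Worth of one 16-square area: tanh hidden layer, linear output (assumed)."""
    hidden = [
        math.tanh(b + sum(w * x for w, x in zip(ws, inputs)))
        for ws, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    return params["b_out"] + sum(w * h for w, h in zip(params["w_out"], hidden))


params = zero_params()
n_params = count_params(params)
output = forward([1.0] * N_INPUT, params)
print(n_params, output)
```

Under these assumptions each network contributes 16 x 10 + 10 + 10 + 1 = 181 parameters, so the three networks together add 543 values for the evolutionary algorithm to tune.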

Page 42: Machine Learning in Computer Game Players

Result

10 independent trials (each with 50 generations)

Initial Rating = 2066 (Expert)
◦ Rating of the open-source player

Best Rating = 2437 (Senior Master)

But the program cannot yet compete with the strongest chess programs (rated 2800 and above)


Page 44: Machine Learning in Computer Game Players

Characteristics

                         Advantages                 Disadvantages
Supervised Learning      Direct and Effective       Manual Labeling Cost
Reinforcement Learning   Wide Application           Local Optimum, CAP
Evolutionary Algorithm   Wide Application, No CAP   Indirect, Random Dispersion

Page 45: Machine Learning in Computer Game Players

Future Work

Automatic position labeling
◦ Using records or computer play

More sophisticated rewards
◦ Consider opponent's strength
◦ Move analysis for credit assignment

Experiments in other games