Machine Learning in Computer Game Players


Machine Learning in Computer Game Players

Chikayama & Taura Lab.

M1 Ayato Miki

1

1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
5. Conclusion

Outline

2

Improvements in Computer Game Players
◦ DEEP BLUE defeated Kasparov in 1997
◦ GEKISASHI and TANASE SHOGI at WCSC 2008

Strong computer game players are usually developed by strong human players
◦ Input heuristics manually
◦ Devote a lot of time and energy to tuning

1. Introduction

3

Machine Learning enables automatic tuning using a large amount of data

It is not necessary for the developer to be an expert in the game

Machine Learning for Games

4

1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
5. Conclusion

Outline

5

Games

Game Trees

Game Tree Search

Evaluation Function

2. Computer Game Players

6

Turn-based games
◦ ex. tic-tac-toe, chess, shogi, poker, mah-jong…

Additional classification
◦ two-player or otherwise
◦ zero-sum or otherwise
◦ deterministic or non-deterministic
◦ perfect or imperfect information

Game Tree Model

Games

7

Game Trees

8

[Figure: a game tree — levels alternate between the player’s turn and the opponent’s turn; edges are moves (move 1, move 2, …).]

ex. Minimax search algorithm

Game Tree Search

9

[Figure: a minimax search tree over leaf values — Max nodes take the maximum of their children’s values, Min nodes the minimum; here the values propagate up to a root value of 5.]

Difficult to search all the way to the leaf nodes
◦ ~10^220 possible positions in shogi

Stop the search at a practicable depth and “evaluate” the frontier nodes
◦ using an evaluation function

10

Game Tree Search
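A minimal Python sketch of this depth-limited search, assuming hypothetical helpers moves(position), apply(position, move), and evaluate(position) (the evaluation function introduced on the next slide):

```python
def minimax(position, depth, maximizing):
    """Depth-limited minimax: search to a fixed depth, then fall
    back on the evaluation function at the frontier nodes."""
    legal = moves(position)                    # hypothetical move generator
    if depth == 0 or not legal:                # frontier or terminal position
        return evaluate(position)              # heuristic value V(s)
    values = [minimax(apply(position, m), depth - 1, not maximizing)
              for m in legal]
    return max(values) if maximizing else min(values)
```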

Estimate the superiority of the position

Elements
◦ feature vector of the position
◦ parameter vector

Evaluation Function

V(s) = f(\vec{s}, \vec{\omega})

\vec{s}: feature vector of position s
\vec{\omega}: parameter vector

11
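As a concrete instance, f is often taken to be linear, so the evaluation is just the dot product of the two vectors. A sketch under that assumption, with features(position) a hypothetical hand-crafted feature extractor:

```python
import numpy as np

def evaluate(position, omega):
    """Linear evaluation function: V(s) = f(s_vec, omega) = omega . s_vec."""
    s_vec = features(position)    # hypothetical feature extractor for s
    return float(np.dot(omega, s_vec))
```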

Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion

Outline

12

Initial work
◦ Samuel’s research [1959]

Learning objective
◦ What do computer game players learn?

3. Machine Learning in Computer Game Players

13

Many useful techniques
◦ Rote learning
◦ Quiescence search
◦ 3-layer neural network evaluation function

And some machine learning techniques
◦ Learning through self-play
◦ Temporal-difference learning
◦ Comparison training

Samuel’s Checker Player [1959]

14

Opening Book

Search Control

Evaluation Function

15

Learning Objective

Automatic construction of evaluation functions
◦ Construct and select a feature vector automatically
◦ ex. GLEM [Buro, 1998]
◦ Difficult

Tuning evaluation function parameters
◦ Make a feature vector manually and tune its parameters automatically
◦ Easy and effective

Learning Evaluation Functions

18

Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion

Outline

19

Supervised Learning

Reinforcement Learning

Evolutionary Algorithm

4. Tuning Evaluation Functions

20

Provide the program with example positions and their exact evaluation values

Adjust the parameters so as to minimize the error between the evaluation function’s outputs and the exact values

Supervised Learning

[Figure: example positions with exact labels (50, 20, 10, 40, …); training minimizes the error between V(s) and each label.]

21
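A sketch of this tuning as stochastic gradient descent on the squared error, reusing the linear V(s) and the hypothetical features extractor from above:

```python
import numpy as np

def supervised_step(omega, examples, lr=1e-3):
    """One pass of gradient descent on sum_i (V(s_i) - y_i)^2, where
    each example pairs a position with its exact evaluation value y_i."""
    for position, y in examples:
        s_vec = features(position)        # hypothetical feature extractor
        err = np.dot(omega, s_vec) - y    # V(s) minus the exact value
        omega = omega - lr * err * s_vec  # gradient of err^2 / 2
    return omega
```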

Manual labeling of positions

Quantitative evaluation

22

Difficulty of Hard Supervised Training

Consider a softer approach

Soft Supervised Training

Requires only the relative order of the possible moves
◦ Easier and more intuitive

Comparison Training


23

Comparison training using records of expert games

Simple relative order

Bonanza [Hoki, 2006]

The expert move > other moves

24

Based on optimal control theory: minimize the cost function J

Bonanza Method

J(s_0, \ldots, s_{N-1}, \vec{\omega}) = \sum_{i=0}^{N-1} l(s_i, \vec{\omega})

l(s_i, \vec{\omega}): error function
N: total number of example positions
s_i: example positions in the records

25

Bonanza Method

Error Function

l(s, \vec{\omega}) = \sum_{m=1}^{M-1} T\left[ \xi(s'_m, \vec{\omega}) - \xi(s'_0, \vec{\omega}) \right]

s'_m: child position after move m
s'_0: child position after the move played in the record
M: total number of possible moves
\xi: minimax search value
T: order discriminant function

26

Sigmoid Function

T(x) = \frac{1}{1 + e^{-kx}}

◦ k is the parameter that controls the gradient
◦ When k \to \infty, T(x) is a step function
◦ In this case, the error function means “the number of moves that were considered to be better than the move in the record”

Order Discriminant Function

27
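Putting the last three slides together, a sketch of the comparison-training loss. Here children(position) (returning the child after the expert move first) and search_value(position, omega) (the minimax search value \xi) are hypothetical helpers:

```python
import math

def T(x, k=1.0):
    """Order discriminant: a sigmoid whose gradient is controlled by k;
    as k grows it approaches a step function."""
    return 1.0 / (1.0 + math.exp(-k * x))

def bonanza_loss(examples, omega, k=1.0):
    """J = sum_i l(s_i, omega): for each example position, softly count the
    moves whose search value beats that of the move played in the record."""
    total = 0.0
    for position in examples:
        kids = children(position)               # kids[0]: expert move's child
        xi_expert = search_value(kids[0], omega)
        total += sum(T(search_value(c, omega) - xi_expert, k)
                     for c in kids[1:])         # other moves vs. expert move
    return total
```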

30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used

The weight parameters of about 10,000 feature elements were tuned

And it won the World Computer Shogi Championship 2006

Bonanza

29

It is costly to accumulate a training data set
◦ It takes a lot of time to label positions manually
◦ Using expert records has been successful

But what if there are not enough expert records?
◦ New games
◦ Minor games

Problem of Supervised Learning

30

Other approaches work without a training set
◦ ex. Reinforcement Learning (next)

Supervised Learning

Reinforcement Learning

Evolutionary Algorithm

4. Tuning Evaluation Functions

31

The learner gets “a reward” from the environment

In the domain of games, the reward is the final outcome (win/lose)

Reinforcement learning requires only objective information about the game

Reinforcement Learning

32

33

Reinforcement Learning

[Figure: three self-play trajectories; only the final outcome (+100, +200, −100) is observed as a reward, and it must be propagated back over every position in the game.]

Inefficient in Games…

34

Temporal-Difference Learning

[Figure: temporal-difference learning on a trajectory — each position’s value is updated toward the value of the following position instead of waiting for the final outcome.]

TD error = r + V(s_{t+1}) - V(s_t)

Trained through self-play

TD-Gammon [Tesauro, 1992]

35

Version | Features | Strength
TD-Gammon 0.0 | Raw board information | Top of computer players
TD-Gammon 1.0 | Plus additional heuristics | World-championship level
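A sketch of the TD(0) update behind the error formula above, again assuming the linear V(s) and hypothetical features extractor from earlier; r is the reward, i.e. the final outcome on the last transition and zero otherwise:

```python
import numpy as np

def td_update(omega, s_t, s_next, r, lr=1e-3):
    """TD(0): move V(s_t) toward r + V(s_{t+1}), using
    TD error = r + V(s_{t+1}) - V(s_t)."""
    phi = features(s_t)                    # hypothetical feature extractor
    td_error = r + np.dot(omega, features(s_next)) - np.dot(omega, phi)
    return omega + lr * td_error * phi     # gradient step for a linear V
```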

Falling into a local optimum
◦ Lack of playing variation

Solutions
◦ Add intentional randomness
◦ Play against various players (computer/human)

Credit Assignment Problem (CAP)
◦ Not clear which action was effective

Problems of Reinforcement Learning

36

Supervised Learning

Reinforcement Learning

Evolutionary Algorithm

4. Tuning Evaluation Functions

37

Evolutionary Algorithm

Initialize Population

Randomly Vary Individuals

Evaluate “Fitness”

Apply Selection

38
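A sketch of this loop, with random_params(), mutate(parent), and fitness(player) as hypothetical helpers; the sizes follow the Fogel et al. setup described next, and fitness is reduced here to a single score (they instead counted wins in ten games against random opponents):

```python
def evolve(pop_size=10, offspring_per_parent=10, generations=50):
    """Generic evolutionary loop: initialize, vary, evaluate, select."""
    population = [random_params() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(parent) for parent in population
                     for _ in range(offspring_per_parent)]
        candidates = population + offspring
        candidates.sort(key=fitness, reverse=True)  # evaluate "fitness"
        population = candidates[:pop_size]          # apply selection
    return population[0]                            # best individual found
```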

Evolutionary algorithm for a chess player

Using an open-source chess program
◦ Attempt to tune its parameters

Research of Fogel et al. [2004]

39

Make 10 initial parents
◦ Initialize parameters with random values

Initialization

40

Create 10 offspring from each surviving parent by mutating the parental parameters

Variation

\theta'_i = \theta_i + N(0, \sigma'_i)

N(\mu, \sigma): Gaussian random variable
\sigma'_i: strategy parameter

41
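A sketch of this Gaussian mutation, treating the strategy parameters \sigma'_i as given (in evolutionary programming they are typically self-adapted as well):

```python
import numpy as np

def mutate(theta, sigma):
    """Gaussian mutation, element-wise: theta'_i = theta_i + N(0, sigma'_i),
    where sigma holds one strategy parameter per tuned parameter."""
    return theta + np.random.normal(0.0, sigma)
```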

Each player plays ten games against randomly selected opponents

Ten best players become parents of the next generation

Evaluate Fitness and Selection

42

Select 10 opponents randomly

Material value

Positional value

Weights and biases of three neural networks

43

Tuned Parameters

Each network has 3 layers
◦ Input = arrangement of specific areas (front 2 rows, back 2 rows, and center 4×4 square)
◦ Hidden = 10 units
◦ Output = worth of the area arrangement

44

Three Neural Networks

16 inputs, 10 hidden units, 1 output
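A sketch of one such 16-10-1 network’s forward pass; the tanh activation is an assumption, since the slide does not specify one:

```python
import numpy as np

def area_value(x, W1, b1, W2, b2):
    """3-layer network: 16 inputs (one area's arrangement) ->
    10 hidden units -> 1 scalar output (the worth of the arrangement)."""
    hidden = np.tanh(W1 @ x + b1)          # W1: (10, 16), b1: (10,)
    return float(np.dot(W2, hidden) + b2)  # W2: (10,), b2: scalar
```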

Initial rating = 2066 (Expert)
◦ the rating of the open-source player

Best rating = 2437 (Senior Master)

But the program cannot yet compete with the strongest chess programs (rating ~2800)

Result

45

10 independent trials (each with 50 generations)

Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion

Outline

47

Method | Advantages | Disadvantages
Supervised Learning | Direct and effective | Manual labeling cost
Reinforcement Learning | Wide application | Local optima, CAP
Evolutionary Algorithm | Wide application, no CAP | Indirect, random dispersion

Characteristics

48

Automatic position labeling
◦ Using records or computer play

Sophisticated rewards
◦ Consider the opponent’s strength
◦ Move analysis for credit assignment

Experiments in other games

Future Work

49
