Artificial Intelligence in Game Design

Off-Line and Neural Network Learning in Games

Types of Learning in Games

• On-line learning

– Takes place during the game

– Changes game parameters for this particular play of the game

– Must be fast and efficient

– Examples: simple hill climbing, N-grams, ID3 decision tree learning

• Off-line learning

– Done during the development stage of the game

– Used to set game parameters for the final release of the game

– Can use complex forms of learning

– Examples: neural networks, reinforcement learning, Bayesian learning, etc.

Game Parameters

• Most complex games use continuous-valued parameters of some sort in decision making

– Probabilities/fuzzy measures

– Coefficients in MinMax heuristics

– How do we know what the best values are?

                Confident   Angry   Frightened
Attack Left        40%       60%       30%
Attack Right       40%       35%       20%
Defend             20%        5%       50%
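A table like this can drive decisions directly. The sketch below is one possible way to do it, assuming each column is a probability distribution over actions for one mood; the names `ACTION_PROBS` and `choose_action` are made up for illustration.

```python
import random

# Probabilities copied from the table above: mood -> {action: probability}.
ACTION_PROBS = {
    "confident":  {"attack_left": 0.40, "attack_right": 0.40, "defend": 0.20},
    "angry":      {"attack_left": 0.60, "attack_right": 0.35, "defend": 0.05},
    "frightened": {"attack_left": 0.30, "attack_right": 0.20, "defend": 0.50},
}

def choose_action(mood, rng=random):
    """Pick an action at random, weighted by the probabilities for this mood."""
    actions = list(ACTION_PROBS[mood])
    weights = [ACTION_PROBS[mood][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Example: an angry character attacks left about 60% of the time.
print(choose_action("angry"))
```

These nine numbers are exactly the kind of parameters that learning would tune.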

[Figure: chess pieces with material values 9, 5, 3, 3, 1, as an example of coefficients in a MinMax heuristic]

Learning Game Parameters

• Learn game parameters from examples

– Database of environmental inputs and desired character actions

[Diagram: each example in the database pairs inputs with a desired action. The current parameters produce the actions indicated by the rules; the degree of error between the actual and desired action feeds the learning element, which determines how to change the parameters in order to decrease the error.]


• Main difficulty: acquiring examples of desired behavior to learn from

– Most learning algorithms require thousands of examples

– Must know the desired action for each one

• One solution: let customers provide examples

– Neural network learning in “twenty questions”

– http://www.20q.net/

[Diagram: users interact with the system online; the server contains a database of examples submitted by users; the learning element adjusts its knowledge to reduce the error between its actions and user input; a “console version” is marketed with the final rules learned.]


• Can learn by repeatedly playing the game against itself

• Samuel’s checkers program

– Early application of MinMax

– Used up to 15 different weight values for heuristic evaluation of the board

• Basic idea:

– Multiple versions of the weights are generated

– All versions are played against one another

– The weights with the most wins are considered best
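The basic idea above can be sketched as a round-robin tournament. This is only an illustration of the three steps listed, not a reconstruction of Samuel's program; `play_match` is a hypothetical callback that plays one game between two weight vectors, and the stand-in used in the example is not a real game.

```python
import itertools
import random

def random_candidates(n_versions, n_weights, rng):
    """Generate several random versions of the evaluation weights."""
    return [[rng.uniform(-1, 1) for _ in range(n_weights)]
            for _ in range(n_versions)]

def best_weights_by_tournament(candidates, play_match):
    """Every version plays every other once; the version with most wins is kept.

    play_match(a, b) is a hypothetical callback that returns 0 if weight
    vector `a` wins the game and 1 if weight vector `b` wins.
    """
    wins = [0] * len(candidates)
    for i, j in itertools.combinations(range(len(candidates)), 2):
        winner = (i, j)[play_match(candidates[i], candidates[j])]
        wins[winner] += 1
    best = max(range(len(candidates)), key=wins.__getitem__)
    return candidates[best]

# Stand-in for a real game: the larger weight sum "wins".
stub_match = lambda a, b: 0 if sum(a) > sum(b) else 1
versions = random_candidates(n_versions=5, n_weights=15, rng=random.Random(0))
print(best_weights_by_tournament(versions, stub_match))
```

In a real setting `play_match` would run a full MinMax game using each side's weights in the board evaluation, which is what makes this an off-line method.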

Artificial Neural Networks

• Based on the “structure” of the brain

– Only known working model of intelligence

– Note that an ANN is a very rough approximation of the little we know about the brain

• Main components:

– Neural units

  • Approximation of “neurons”

  • Roughly binary states: “on” or “off”, “1” or “0”, “active” or “inactive”

– Connections

  • Approximation of “synapses”

  • Each connects 2 neural units

[Diagram: Unit → Connection → Unit]

Artificial Neural Networks

• Connections have weights

– Positive or negative numbers

– Positive weight causes unit A to activate unit B

– Negative weight causes unit A to deactivate unit B

• Example: “mouse behavior”

[Diagram: Unit A → connection with weight W → Unit B. “Smell cheese” → “Run towards smell” has a positive weight; “Smell cat” → “Run towards smell” has a negative weight.]

Network Layers

• Input units

– Correspond to sensory input (such as “smell cheese”)

• Intermediate units

– Used for complex interactions

• Output units

– Correspond to actions taken (such as “run towards smell”)

The human brain contains billions of neurons, each connected to between 1,000 and 10,000 other units via synapses.

Perceptron Networks

• Early model (1960s)

• Very simple representation and learning

– Fast and easy to implement

– Too simple for some domains

• Single layer of inputs and outputs

– All units binary (value 0 or 1)

– No intermediate units

– Input and output layers interconnected with each other

• Example: simple orc

[Diagram: input units “Hit points”, “Have weapon”, and “Player strength”, each connected to output units “Run” and “Attack”]

Net Input Value

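• Step 1: Compute the net input to output unit $S_j$ from input units $S_1 \ldots S_n$

– Based on the activation of input units $S_1 \ldots S_n$ (0 or 1)

– Based on the weights $W_{ij}$ from each input $S_i$ to $S_j$

• $\mathrm{net}_j = \sum_i S_i W_{ij}$

[Diagram: input units $S_1, \ldots, S_i, \ldots, S_n$ connected to output unit $S_j$ by weights $W_{1j}, \ldots, W_{ij}, \ldots, W_{nj}$]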

Ideas:

Inactive units (with value 0) should have no effect on output

Effect of active units should be proportional to the weight between that input and output
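The net input is a plain weighted sum, so both ideas fall out of the arithmetic. A minimal sketch, with `net_input` as an illustrative name:

```python
def net_input(activations, weights):
    """Weighted sum of input activations: net_j = sum_i S_i * W_ij."""
    if len(activations) != len(weights):
        raise ValueError("one weight is needed per input unit")
    return sum(s * w for s, w in zip(activations, weights))

# The middle unit is inactive, so its weight does not matter.
print(net_input([1, 0, 1], [0.5, 2.0, -1.5]))   # -1.0
print(net_input([1, 0, 1], [0.5, 99.0, -1.5]))  # -1.0
```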

Activation of Outputs

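• Threshold activation

– If $\mathrm{net}_j$ > threshold for unit $j$, then $S_j = 1$

– If $\mathrm{net}_j$ ≤ threshold for unit $j$, then $S_j = 0$

• Often represent the threshold as another weight $W_{\mathrm{bias}\,j}$

– Negative weight from a unit $S_{\mathrm{bias}}$ which is always active (always 1)

– Makes it easier to learn the threshold

[Diagram: inputs $S_1 \ldots S_n$ and the bias unit $S_{\mathrm{bias}}$ connected to $S_j$ by weights $W_{1j} \ldots W_{nj}$ and $W_{\mathrm{bias}\,j}$. Plot: $S_j$ as a step function of $\mathrm{net}_j$, stepping from 0 to 1 at the threshold.]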
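Folding the threshold into a bias weight does not change the decision; it only moves the comparison point to zero. The sketch below shows the two forms side by side, assuming the bias weight is the negative of the threshold; the function names are illustrative.

```python
def activate_with_threshold(net, threshold):
    """Output 1 when the net input exceeds the unit's threshold, else 0."""
    return 1 if net > threshold else 0

def activate_with_bias(net, w_bias):
    """Same decision with the threshold folded into a bias weight.

    The bias unit is always active (value 1), so it adds w_bias to the
    net input and the comparison is then against zero.
    """
    s_bias = 1
    return 1 if net + w_bias * s_bias > 0 else 0

# A threshold of 1.5 corresponds to a bias weight of -1.5.
for net in (-2.0, 0.0, 1.5, 1.75, 3.0):
    assert activate_with_threshold(net, 1.5) == activate_with_bias(net, -1.5)
```

Because the bias is just another weight, the same learning rule that adjusts the other weights can adjust the threshold too.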

Problem Format

• Neural networks expect input activations in the range 0 to 1

• May need to normalize data to get this

• Example: orc

– Hit points between 1 and 100: divide hit points by 100

– Have weapon: 1 if yes, 0 if no

– Player strength between 10 and 20: subtract 10 and divide by 10

Hit points = 50, Have weapon = yes, Player strength = 17.5 → (0.5, 1.0, 0.75)
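The three normalization rules for the orc can be written as one small function. This is a sketch with an illustrative name, `normalize_orc_inputs`, and it assumes the raw values stay inside the ranges given above.

```python
def normalize_orc_inputs(hit_points, has_weapon, player_strength):
    """Map raw orc inputs to activations in the range 0 to 1."""
    return (
        hit_points / 100,              # hit points 1..100 -> 0.01..1.0
        1.0 if has_weapon else 0.0,    # weapon: yes -> 1, no -> 0
        (player_strength - 10) / 10,   # player strength 10..20 -> 0..1
    )

print(normalize_orc_inputs(50, True, 17.5))  # (0.5, 1.0, 0.75)
```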

Orc Example

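Inputs: Hit points = 0.5, Have weapon = 1, Player strength = 0.75, Bias = 1

Weights:

                   Run   Attack
Hit points          -3      4
Have weapon         -2      5
Player strength      4     -3
Bias                -2     -3

• $\mathrm{net}_{run} = (-1.5) + (-2) + 3 + (-2) = -2.5$, so $S_{run} = 0$

• $\mathrm{net}_{attack} = 2 + 5 + (-2.25) + (-3) = 1.75$, so $S_{attack} = 1$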
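The orc example can be checked in a few lines. In this sketch the weights are inferred from the products in the two sums above (for example, -1.5 = 0.5 × -3), and the threshold is assumed to be folded into the bias weight so that the comparison is against zero.

```python
# Weights per output, in the order:
# (hit points, have weapon, player strength, bias).
WEIGHTS = {
    "run":    (-3.0, -2.0, 4.0, -2.0),
    "attack": (4.0, 5.0, -3.0, -3.0),
}

def orc_outputs(hit_points, have_weapon, player_strength):
    """Return {action: (net input, activation)} for normalized inputs."""
    inputs = (hit_points, have_weapon, player_strength, 1.0)  # bias unit is always 1
    result = {}
    for action, weights in WEIGHTS.items():
        net = sum(s * w for s, w in zip(inputs, weights))
        result[action] = (net, 1 if net > 0 else 0)
    return result

print(orc_outputs(0.5, 1, 0.75))
# {'run': (-2.5, 0), 'attack': (1.75, 1)}
```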

Perceptron Learning

• Key question: Where do the weights come from?

– Too complex to hand code

– Must learn from examples

• Sample training set

– Input values normalized between 0 and 1

           Inputs                                      Desired Outputs
Example    Hit Points   Has Weapon   Player Strength   Attack   Run
1          0.6          1            0.8               1        0
2          0.7          0            0.4               1        0
3          0.4          0            0.3               0        1
4          0.8          0            0.7               0        1
5          0.3          1            0.4               1        0
6          0.2          1            0.6               0        1

Perceptron Learning

• Given the following:

– $S_{ip}$: state of input $i$ for example $p$

– $S_{jp}$: actual state of output $j$ for example $p$

– $t_{jp}$: desired state of output $j$ for example $p$

• How should the weight $W_{ij}$ be changed to decrease the error between $S_{jp}$ and $t_{jp}$?

[Diagram: input unit $S_{ip}$ connected to output unit $S_{jp}$ by weight $W_{ij}$]

Perceptron Learning

Key ideas:

• If $S_{jp}$ is correct ($S_{jp} = t_{jp}$), then make no change to $W_{ij}$

– Not broken, so don’t fix!

• If $S_{jp} = 0$ and $t_{jp} = 1$, then $\mathrm{net}_{jp}$ is too low

– Increase $W_{ij}$

• If $S_{jp} = 1$ and $t_{jp} = 0$, then $\mathrm{net}_{jp}$ is too high

– Decrease $W_{ij}$

• If input value $S_{ip} = 0$, then $W_{ij}$ had no effect on $\mathrm{net}_{jp}$

– Change $W_{ij}$ in proportion to $S_{ip}$, so weights from inactive inputs are left alone

Perceptron Learning

• Perceptron learning rule:

$W_{ij} \leftarrow W_{ij} + \eta \, S_{ip} \, (t_{jp} - S_{jp})$

• $\eta$ = step size

– Should be small (usually around 0.1)
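The rule is a single line of arithmetic. The sketch below, with the illustrative name `update_weight`, shows how it covers each case: a correct output or an inactive input leaves the weight alone, an output that is too low raises it, and one that is too high lowers it.

```python
def update_weight(w_ij, s_ip, t_jp, s_jp, eta=0.1):
    """Perceptron rule: W_ij <- W_ij + eta * S_ip * (t_jp - S_jp)."""
    return w_ij + eta * s_ip * (t_jp - s_jp)

print(update_weight(0.3, s_ip=1, t_jp=1, s_jp=1))  # output correct: 0.3
print(update_weight(0.3, s_ip=0, t_jp=1, s_jp=0))  # input inactive: 0.3
print(update_weight(0.3, s_ip=1, t_jp=1, s_jp=0))  # output too low: weight rises
print(update_weight(0.3, s_ip=1, t_jp=0, s_jp=1))  # output too high: weight falls
```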

Perceptron Learning

• Learning involves cycling through the training set many times

– Often thousands of cycles

• Start with a randomly determined set of weights

– Usually between -1 and 1

• While error is left for some example p:

    For example p = 1 to number of examples
        For all inputs i
            Assign input i value S_ip
        For all outputs j
            Compute actual output value S_jp
        For all weights W_ij
            Apply learning rule to update weight
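The loop above might look like this in Python, trained on the six-example orc training set from the earlier slide. This is a sketch under a few assumptions: the threshold is handled as a bias weight on an always-active input, the random seed is fixed so runs are repeatable, and `max_epochs` is an added safeguard in case a training set cannot be learned by a single layer of weights.

```python
import random

# (hit points, has weapon, player strength) -> desired (attack, run)
EXAMPLES = [
    ((0.6, 1, 0.8), (1, 0)),
    ((0.7, 0, 0.4), (1, 0)),
    ((0.4, 0, 0.3), (0, 1)),
    ((0.8, 0, 0.7), (0, 1)),
    ((0.3, 1, 0.4), (1, 0)),
    ((0.2, 1, 0.6), (0, 1)),
]

def predict(weights, inputs):
    """Threshold outputs; the last weight in each row is the bias weight."""
    s = list(inputs) + [1.0]  # bias unit is always active
    return tuple(1 if sum(si * w for si, w in zip(s, row)) > 0 else 0
                 for row in weights)

def train(examples, eta=0.1, max_epochs=10_000, seed=0):
    """Cycle through the examples until every output is correct."""
    rng = random.Random(seed)
    n_in = len(examples[0][0]) + 1   # inputs plus bias
    n_out = len(examples[0][1])
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, targets in examples:
            s = list(inputs) + [1.0]
            outputs = predict(weights, inputs)
            for j, (t, out) in enumerate(zip(targets, outputs)):
                if t != out:
                    errors += 1
                for i, si in enumerate(s):
                    weights[j][i] += eta * si * (t - out)  # learning rule
        if errors == 0:   # no error left for any example
            break
    return weights

weights = train(EXAMPLES)
for inputs, targets in EXAMPLES:
    print(inputs, predict(weights, inputs), targets)
```

Each output unit is trained independently of the others, which is why the update only needs that unit's own target and actual value.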

Limits of Perceptron Learning

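• Perceptron learning is proven to correctly learn any behavior that can be represented with a single layer of weights

– Problem: Not all behaviors can be represented this way

– Example: exclusive-or, A XOR B

A   B   bias   A XOR B
1   1   1      0
1   0   1      1
0   1   1      1
0   0   1      0

[Diagram: inputs A, B, and bias connected to a single output unit by weights $W_A$, $W_B$, and $W_{bias}$]

With the threshold folded into the bias weight, the four rows require:

$W_A + W_B + W_{bias} \le 0$
$W_A + W_{bias} > 0$
$W_B + W_{bias} > 0$
$W_{bias} \le 0$

Contradiction: adding the two middle inequalities gives $W_A + W_B + 2 W_{bias} > 0$, and subtracting the first leaves $W_{bias} > 0$, which contradicts the last.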
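A brute-force search illustrates the same point. It is not a proof (the algebra is), and the grid of candidate weights is an arbitrary choice, but it shows a single layer of weights representing AND while failing on exclusive-or.

```python
import itertools

XOR = {(1, 1): 0, (1, 0): 1, (0, 1): 1, (0, 0): 0}
AND = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 0}

def find_weights(truth_table, grid):
    """Search a grid of (W_A, W_B, W_bias) for weights that match the table."""
    for w_a, w_b, w_bias in itertools.product(grid, repeat=3):
        if all((1 if a * w_a + b * w_b + w_bias > 0 else 0) == out
               for (a, b), out in truth_table.items()):
            return (w_a, w_b, w_bias)
    return None

GRID = [x / 4 for x in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25

print(find_weights(AND, GRID))  # some weight triple that works
print(find_weights(XOR, GRID))  # None
```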

Limits of Perceptron Learning

• Networks with at least two layers of weights are proven able to represent any behavior defined by some logical proposition

• Problem: perceptron learning does not work

– No “desired value” for intermediate units

– Learning rule is undefined without known values for the units connected by a weight

Back Propagation Learning

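• Most successful algorithm for multilayered neural networks

• Based on gradient descent

– Derivative of the error with respect to a change in weight

• $E_p = \sum_k (S_{kp} - t_{kp})^2$, summed over all outputs $k$

• $\Delta W = \eta \, \partial E_p / \partial W$, applied as $W \leftarrow W - \Delta W$ to step against the gradient

• Requires an activation function with a continuous derivative

– Threshold function's derivative is undefined (infinite) at the threshold and zero elsewhere

– Often use the sigmoidal activation function $S_j = 1 / (1 + e^{-\mathrm{net}_j})$

  • More biologically plausible anyway

[Plot: sigmoid curve of $S_j$ against $\mathrm{net}_j$]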
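The sigmoid is convenient for gradient descent because its derivative can be written using its own output, S(1 - S). A small sketch with illustrative function names:

```python
import math

def sigmoid(net):
    """Sigmoidal activation: 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    """Derivative of the sigmoid in terms of its own output S: S * (1 - S)."""
    s = sigmoid(net)
    return s * (1.0 - s)

print(sigmoid(0))             # 0.5
print(sigmoid_derivative(0))  # 0.25
```

The slope is largest at a net input of 0 and shrinks toward zero as the unit saturates in either direction.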

Back Propagation Learning

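Equations (each $\Delta W$ is subtracted from its weight; the constant factor 2 from differentiating the square is absorbed into $\eta$):

• Weights $W_{jk}$ between intermediate and output units:

$\Delta W_{jk} = \eta \, (S_{kp} - t_{kp}) \, S_{kp} (1 - S_{kp}) \, S_{jp}$

• Weights $W_{ij}$ between input and intermediate units:

$\Delta W_{ij} = \eta \left[ \sum_k (S_{kp} - t_{kp}) \, S_{kp} (1 - S_{kp}) \, W_{jk} \right] S_{jp} (1 - S_{jp}) \, S_{ip}$

– Summed over $k$ because $W_{ij}$ affects all outputs (through $S_{jp}$)

[Diagram: input unit $S_{ip}$ → weight $W_{ij}$ → intermediate unit $S_{jp}$ → weight $W_{jk}$ → output unit $S_{kp}$]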
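One back propagation update for a network with a single intermediate layer can be sketched as follows. It uses the chain-rule form of the equations, in which the sum over outputs k covers the output error terms and W_jk, and S_jp(1 - S_jp) and S_ip multiply that sum. Each ΔW is subtracted from its weight. Bias units are left out to keep the sketch short, and the layout of the weight lists is an assumption of this example.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(s_in, w_ij, w_jk):
    """w_ij[i][j]: input i -> intermediate j; w_jk[j][k]: intermediate j -> output k."""
    s_hid = [sigmoid(sum(s_in[i] * w_ij[i][j] for i in range(len(s_in))))
             for j in range(len(w_ij[0]))]
    s_out = [sigmoid(sum(s_hid[j] * w_jk[j][k] for j in range(len(s_hid))))
             for k in range(len(w_jk[0]))]
    return s_hid, s_out

def error(s_in, target, w_ij, w_jk):
    """E_p = sum over outputs k of (S_kp - t_kp)^2."""
    _, s_out = forward(s_in, w_ij, w_jk)
    return sum((s - t) ** 2 for s, t in zip(s_out, target))

def backprop_step(s_in, target, w_ij, w_jk, eta=0.1):
    """Apply one update in place for a single example."""
    s_hid, s_out = forward(s_in, w_ij, w_jk)
    # Output error term: (S_kp - t_kp) * S_kp * (1 - S_kp)
    d_out = [(s - t) * s * (1 - s) for s, t in zip(s_out, target)]
    # Intermediate error term, computed before any weight is changed.
    d_hid = [sum(d_out[k] * w_jk[j][k] for k in range(len(d_out)))
             * s_hid[j] * (1 - s_hid[j])
             for j in range(len(s_hid))]
    for j in range(len(s_hid)):
        for k in range(len(d_out)):
            w_jk[j][k] -= eta * d_out[k] * s_hid[j]
    for i in range(len(s_in)):
        for j in range(len(s_hid)):
            w_ij[i][j] -= eta * d_hid[j] * s_in[i]

# Example: one step should reduce the error on this example.
w_ij = [[0.1, -0.2], [0.3, 0.4]]
w_jk = [[0.5], [-0.5]]
x, t = [0.5, 1.0], [1.0]
before = error(x, t, w_ij, w_jk)
backprop_step(x, t, w_ij, w_jk)
print(before, error(x, t, w_ij, w_jk))
```

Training repeats this step over all examples for many cycles, just as in perceptron learning.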

Neural Networks and Games

• Perceptron learning is often sufficient for game development

– Only fails for nonlinear behaviors like XOR

– Types of problem where multiple “good” things together are not good

  • High NPC hit points → attack

  • Low player strength → attack

  • High NPC hit points and low player strength → run?

– Rare in practice

• Works well even if results are not perfect

– Nonlinear aspects to behavior

– Occasional bad data in training set

– Still creates weights that work most of the time

  • Advantage of most neural network algorithms


What is being learned?

• Relative importance of different inputs on player actions

• Thresholds at which different values should affect behavior

– Extremely hard to create a good set of rules by hand if there are a large number of inputs and possible actions

[Diagram: orc with inputs “Hit points”, “Have weapon”, and “Player strength”, asking:]

“Are my hit points or player strength more important in deciding when to attack?”

“How low should I let my hit points get before running away?”


• Can also use learning to create a variety of characters

– Provides a variety of experiences for players

– Makes characters more realistic

• Train each with a slightly different set of examples

– Much like how we learn from different experiences

           Inputs                                      Desired Outputs
Example    Hit Points   Has Weapon   Player Strength   Attack   Run
1          0.4          1            0.9               1        0
2          0.6          0            0.5               1        0
3          0.2          0            0.5               0        1
4          0.4          0            0.9               0        1
5          0.3          1            0.4               1        0
6          0.1          1            0.8               0        1

“I don’t know the meaning of fear (or many other words)!”

Example: Dirt Track Racing

• Neural network used to train cars to take corners

– Inputs: track conditions (degree of mud, gravel, snow, etc.)

– Outputs: location at which to begin steering, angle, speed, etc.

– Supervision: positive reinforcement if the car made it through the turn quickly without crashing
